CN107203511B - Network text named entity identification method based on neural network probability disambiguation - Google Patents


Info

Publication number
CN107203511B
CN107203511B (application CN201710390409.2A)
Authority
CN
China
Prior art keywords
neural network
word
named entity
network
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710390409.2A
Other languages
Chinese (zh)
Other versions
CN107203511A (en)
Inventor
周勇
刘兵
韩兆宇
王重秋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Mining and Technology CUMT
Original Assignee
China University of Mining and Technology CUMT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Mining and Technology CUMT filed Critical China University of Mining and Technology CUMT
Priority to CN201710390409.2A priority Critical patent/CN107203511B/en
Priority to RU2019117529A priority patent/RU2722571C1/en
Priority to AU2017416649A priority patent/AU2017416649A1/en
Priority to PCT/CN2017/089135 priority patent/WO2018218705A1/en
Priority to CA3039280A priority patent/CA3039280C/en
Publication of CN107203511A publication Critical patent/CN107203511A/en
Application granted granted Critical
Publication of CN107203511B publication Critical patent/CN107203511B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)
  • Character Discrimination (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a network text named entity recognition method based on neural network probability disambiguation. The method segments an unlabeled corpus into words, extracts word vectors with Word2Vec, converts a sample corpus into word feature matrixes and windows them, constructs a deep neural network for training, and adds a softmax function to the output layer of the neural network for normalization, yielding a probability matrix of the named entity category for each word. The probability matrix is then re-windowed and disambiguated with a conditional random field model to obtain the final named entity labels. For the network words and new words characteristic of web text, the invention provides a word vector incremental learning method that does not change the neural network structure, and it adopts probability disambiguation to handle the irregular grammatical structures and frequent wrongly written characters of web text. The method therefore achieves higher accuracy in the task of web text named entity recognition.

Description

Network text named entity identification method based on neural network probability disambiguation
Technical Field
The invention relates to the processing and analysis of web text, in particular to a method for recognizing named entities in web text based on neural network probability disambiguation.
Background
The network has brought the speed and scale of information acquisition and transmission to unprecedented levels, realized global information sharing and interaction, and become an indispensable infrastructure of the information society. Modern communication and propagation technology has greatly improved the speed and breadth of information dissemination. The accompanying problem, however, is that the sheer volume of unrefined information makes it difficult to obtain the most needed information quickly and accurately from this ocean of data. Analyzing the named entities that internet users care about, such as people, places, and organizations, from massive web text provides important supporting information for upper-layer applications such as online marketing and group sentiment analysis. This makes named entity recognition for web text a core technology in network data processing and analysis.
Methods for named entity recognition research fall largely into two categories: rule-based methods and statistics-based methods. With the continuous refinement of machine learning theory and the great improvement of computing performance, statistics-based methods have become increasingly favored.
At present, the statistical models mainly applied to named entity recognition include hidden Markov models, decision trees, maximum entropy models, support vector machines, conditional random fields, and artificial neural networks. Named entity recognition with artificial neural networks can obtain better results than conditional random field models, maximum entropy models, and other models, but practical systems still mainly adopt conditional random field and maximum entropy models: for example, patent CN201310182978.X provides a named entity recognition method and device for microblog text that uses a conditional random field combined with a named entity library, and patent CN200710098635.X provides a named entity recognition method that models word features with a maximum entropy model. The reason artificial neural networks are difficult to deploy is that, in the field of named entity recognition, they must convert words into vectors in a word vector space; no corresponding vector can be obtained for a new word, which prevents large-scale practical application.
Given this situation, named entity recognition for web text mainly faces the following problems. First, because web text contains a large number of network words, new words, and wrongly written characters, it is impossible to train a word vector space covering all words with which to train a neural network. Second, phenomena such as arbitrary language forms, irregular grammatical structures, and frequent wrongly written characters reduce the accuracy of named entity recognition on web text.
Disclosure of Invention
The purpose of the invention is as follows: in order to overcome the defects in the prior art, the invention provides a network text named entity recognition method based on neural network probability disambiguation, which can incrementally extract word features without retraining the neural network and, at the same time, probabilistically disambiguate the recognition results.
The technical scheme is as follows: in order to achieve the purpose, the invention adopts the technical scheme that:
a network text named entity recognition method based on neural network probability disambiguation is characterized in that unlabeled corpora are participled, Word vectors are extracted through Word2Vec, sample corpora are converted into Word feature matrixes and windowed, a deep neural network is constructed for training, a softmax function is added into an output layer of the neural network for normalization processing, and a probability matrix of a named entity category corresponding to each Word is obtained. And (4) re-windowing the probability matrix, and disambiguating by using a conditional random field model to obtain the final named entity label.
The method specifically comprises the following steps:
Step 1: obtain an unlabeled corpus with a web crawler, obtain a sample corpus annotated with named entities from a corpus repository, and segment the unlabeled corpus into words with a natural language tool.
Step 2: train a word vector space on the segmented unlabeled corpus and sample corpus with the Word2Vec tool.
Step 3: convert the text in the sample corpus into word vectors representing word features according to the trained Word2Vec model, window the word vectors, and take the two-dimensional matrix of window w multiplied by word vector length d as the input of the neural network. Convert the labels in the sample corpus into one-hot form as the output of the neural network. Normalize the output layer of the neural network with a softmax function so that the classification result of the neural network is the probability that a word belongs to a non-named entity or each class of named entity; adjust the structure, depth, number of nodes, step size, activation function, and initial value parameters of the neural network, and select an activation function to train the neural network.
Step 4: re-window the prediction matrix output by the neural network, take the context prediction information of the word to be labeled as correlation points for the actual classification of that word in the conditional random field model, calculate the expected value of each edge with the EM (expectation-maximization) algorithm on the training corpus, and train the corresponding conditional random field model.
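As an illustrative sketch only (not part of the claimed method), the re-windowing of the neural network's probability matrix can be expressed in Python with NumPy: each word's observation for the conditional random field concatenates the predicted class distributions over a window. The function name, window size, and the seven-class inventory are assumptions for illustration.

```python
import numpy as np

def window_probs(probs, w=5):
    """Re-window the neural network's probability matrix.

    probs: (n_words, n_classes) softmax output, one row per word.
    Returns (n_words, w * n_classes): each word's observation is the
    concatenated predicted distributions of itself and its w // 2
    neighbours on either side, zero-padded at the boundaries.
    """
    half = w // 2
    pad = np.zeros((half, probs.shape[1]))
    padded = np.vstack([pad, probs, pad])
    return np.stack([padded[i:i + w].ravel() for i in range(len(probs))])
```

These windowed features would then serve as the per-word inputs when training the conditional random field model described above.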
Step 5: during recognition, first convert the text to be recognized into word vectors representing word features according to the trained Word2Vec model; if the Word2Vec model does not contain a corresponding trained word, convert that word into a word vector by incremental learning, obtaining the new word vector and then restoring the original word vector space. Window the word vectors and take the two-dimensional matrix of window w multiplied by word vector length d as the input of the neural network. Then re-window the prediction matrix obtained from the neural network and place it into the trained conditional random field model for disambiguation, obtaining the final named entity labels in the text to be recognized.
Preferably: the parameters of the Word2Vec tool are as follows: word vector length 200, 25 iterations, initial step size 0.025, minimum step size 0.0001, and the CBOW model is selected.
Preferably: the parameters of the neural network are as follows: 2 hidden layers, 150 hidden nodes per layer, step size 0.01, batch size 40, and the sigmoid function as the activation function.
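A minimal NumPy sketch of the forward pass under these parameters follows; the seven output classes (a non-entity class plus the six entity tags) and the weight initialization are assumptions, and the training procedure (step size 0.01, batch size 40) is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # numerically stable
    return e / e.sum(axis=1, keepdims=True)

# Shapes follow the stated parameters: window w = 5, vector length
# d = 200, 2 hidden layers of 150 sigmoid nodes, 7 output classes.
w, d, hidden, n_classes = 5, 200, 150, 7
W1 = rng.normal(0, 0.1, (w * d, hidden))
W2 = rng.normal(0, 0.1, (hidden, hidden))
W3 = rng.normal(0, 0.1, (hidden, n_classes))

def forward(x):
    """x: (batch, w, d) windowed word features -> class probabilities."""
    h = x.reshape(len(x), -1)
    h = sigmoid(h @ W1)        # hidden layer 1
    h = sigmoid(h @ W2)        # hidden layer 2
    return softmax(h @ W3)     # softmax-normalized output layer
```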
Preferably, the method for converting the tags in the sample corpus into one-hot form is as follows: the "/o", "/n", and "/p" tags in the sample corpus are correspondingly converted into the named entity tags "/Org-B", "/Org-I", "/Per-B", "/Per-I", "/Loc-B", and "/Loc-I", and then converted into one-hot form.
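One possible sketch of the one-hot conversion in Python follows; the leading slashes are dropped for simplicity, and the inclusion of a plain "O" class for non-entity words is an assumption:

```python
import numpy as np

# Tag inventory from the conversion above (slashes dropped); the "O"
# non-entity class is an assumption.
TAGS = ["O", "Org-B", "Org-I", "Per-B", "Per-I", "Loc-B", "Loc-I"]

def to_one_hot(labels):
    """Map a sequence of tags to an (n, len(TAGS)) one-hot matrix."""
    idx = {t: i for i, t in enumerate(TAGS)}
    out = np.zeros((len(labels), len(TAGS)))
    for row, tag in zip(out, labels):
        row[idx[tag]] = 1.0
    return out
```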
Preferably: the window size for word vector windowing is 5.
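The windowing of word vectors with this window size can be sketched in NumPy as follows; the zero-padding at sentence boundaries and the function name are assumptions for illustration:

```python
import numpy as np

def window_features(vecs, w=5):
    """Window an (n_words, d) word-vector matrix.

    Returns (n_words, w, d): for each word, the w x d input matrix
    formed from the word and its w // 2 neighbours on either side,
    zero-padded at the boundaries.
    """
    half = w // 2
    pad = np.zeros((half, vecs.shape[1]))
    padded = np.vstack([pad, vecs, pad])
    return np.stack([padded[i:i + w] for i in range(len(vecs))])
```

Each w x d matrix produced here is one input to the neural network described above.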
Preferably: when training the neural network, one tenth of the words are withheld from the sample data and do not participate in training; they serve as the evaluation standard for the neural network.
Compared with the prior art, the invention has the following beneficial effects:
Word features can be extracted incrementally without retraining the neural network, and the probability model is used to disambiguate the neural network's predictions, so the method has better practicability and accuracy in named entity recognition on web text. In the web text named entity recognition task, the invention provides a word vector incremental learning method that does not change the neural network structure, addressing the network words and new words characteristic of web text, and adopts probability disambiguation to handle its irregular grammatical structures and frequent wrongly written characters. The method therefore achieves higher accuracy in the task of web text named entity recognition.
Drawings
FIG. 1 is a flow diagram of training for network text named entity recognition based on neural network probability disambiguation in accordance with the present invention.
Fig. 2 is a flow chart for converting words into word features according to the present invention.
FIG. 3 is a schematic diagram of text processing and neural network architecture in accordance with the present invention.
Detailed Description
The present invention is further illustrated by the following description in conjunction with the accompanying drawings and specific embodiments. It is to be understood that these examples are given solely for the purpose of illustration and are not intended to limit the scope of the invention; various equivalent modifications that occur to those skilled in the art upon reading the present invention fall within the limits of the appended claims.
A network text named entity recognition method based on neural network probability disambiguation: an unlabeled corpus is segmented into words, word vectors are extracted with Word2Vec, the sample corpus is converted into word feature matrixes and windowed, a deep neural network is constructed and trained, and a softmax function is added to the output layer of the neural network for normalization, yielding a probability matrix of the named entity category for each word. The probability matrix is then re-windowed and disambiguated with a conditional random field model to obtain the final named entity labels.
The method specifically comprises the following steps:
Step 1: crawl unlabeled web text with a web crawler, download a corpus annotated with named entities from a corpus repository as the sample corpus, and segment the unlabeled corpus into words with a natural language tool.
Step 2: train a word vector space on the segmented unlabeled corpus and sample corpus with the Word2Vec tool.
Step 3: convert the text in the sample corpus into word vectors representing word features according to the trained Word2Vec model, and take the word vectors as the input of the neural network. Convert the labels in the sample corpus into one-hot form as the output of the neural network. In a text processing task, one named entity can span several words, so to guarantee the completeness of the recognized named entity, the labels are annotated in the IOB scheme.
The word vectors are windowed before being input into the neural network: when a word is classified, the word and the feature information of its fixed-length context are taken together as the input, so the input of the neural network is not a word feature vector of length d but a two-dimensional matrix of window w multiplied by word feature length d.
Normalize the output layer of the neural network with a softmax function so that the classification result of the neural network is the probability that a word belongs to a non-named entity or each class of named entity. Adjust the structure, depth, number of nodes, step size, activation function, and initial value parameters of the neural network, and select an activation function to train the neural network.
Step 4: re-window the prediction matrix output by the neural network, take the context prediction information of the word to be labeled as correlation points for the actual classification of that word in the conditional random field model, calculate the expected value of each edge with the EM (expectation-maximization) algorithm on the training corpus, and train the corresponding conditional random field model.
Step 5: during recognition, first convert the text to be recognized into word vectors representing word features according to the trained Word2Vec model; if the Word2Vec model does not contain a corresponding trained word, convert that word into a word vector by incremental learning, obtaining the new word vector and then restoring the original word vector space:
(1) Match the word to be converted in the trained word vector space.
(2) If the word to be converted can be matched in the word vector space, convert it directly into the corresponding word vector.
(3) If the Word2Vec model does not contain the corresponding word, first back up the word vector space to prevent the word-space offset produced by incremental learning from reducing the precision of the neural network model; then load the Word2Vec model, put the sentence containing the unmatched word into the model for incremental training, and obtain the word vector of that word; finally restore the model from the backed-up word vector space.
Window the word vectors and take the two-dimensional matrix of window w multiplied by word vector length d as the input of the neural network. Then re-window the prediction matrix obtained from the neural network and place it into the trained conditional random field model for disambiguation, obtaining the final named entity labels in the text to be recognized.
Examples of the invention
In this embodiment, unlabeled web text was crawled from Sogou News, and a corpus annotated with named entities was downloaded from a corpus repository as the sample corpus. The crawled web text was segmented into words with a natural language tool, and a word vector space was trained on the segmented corpus and the sample corpus through a Word2Vec model using the gensim package in Python, with the following parameters: word vector length 200, 25 iterations, initial step size 0.025, minimum step size 0.0001, and the CBOW model.
The text of the sample corpus was converted into word vectors representing word features according to the trained Word2Vec model; if the Word2Vec model did not contain a corresponding trained word, the word was converted into a word vector by incremental learning, word vector acquisition, and word vector space restoration. These word vectors serve as the features of each word.
The window size was set to 5: when classifying the named entity category of the current word, the word features of the current word and of the two words before and after it are taken as the input of the neural network, so each input is a batch of windowed word feature matrixes. One tenth of the words were withheld from the sample data and did not participate in training, serving as the evaluation standard for the neural network. The output layer of the neural network was normalized with a softmax function so that the classification result is the probability that a word belongs to a non-named entity or each class of named entity; at this stage, the class with the maximum probability was provisionally taken as the final classification result. The structure, depth, number of nodes, step size, activation function, and initial value parameters of the neural network were adjusted to obtain good accuracy. The final parameters are: 2 hidden layers, 150 hidden nodes per layer, step size 0.01, batch size 40, and the sigmoid activation function, which produced a good classification effect. The accuracy reached 99.83%, and the F values for the most representative categories of person, place, and organization names reached 93.4%, 84.2%, and 80.4%, respectively.
The step of taking the maximum probability of the prediction matrix output by the neural network as the final classification result was then removed. Instead, the probability matrix was directly re-windowed, the context prediction information of the word to be labeled was taken as correlation points for the actual classification of that word in the conditional random field model, the expected value of each edge of the conditional random field was calculated with the EM (expectation-maximization) algorithm on the training corpus, and the corresponding conditional random field model was trained. After disambiguation with the conditional random field, the F values for person, place, and organization names increased to 94.8%, 85.0%, and 82.0%, respectively.
As the above embodiment shows, compared with conventional supervised named entity recognition methods, the text named entity recognition method based on neural network probability disambiguation provided by the invention uses a word vector conversion method that can incrementally extract word features without producing word vector space offset, so the neural network can be applied to web text containing many new words and wrongly written characters. Moreover, the probability matrix output by the neural network is re-windowed and a conditional random field model is adopted for context disambiguation, which better handles the frequent wrongly written characters and irregular grammar of web text.
The above description covers only the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and adaptations without departing from the principles of the invention, and these are intended to be within the scope of the invention.

Claims (6)

1. A network text named entity identification method based on neural network probability disambiguation is characterized in that: segmenting words of the unlabeled corpus, extracting Word vectors by using Word2Vec, converting the sample corpus into Word feature matrixes, windowing, constructing a deep neural network for training, adding a softmax function into an output layer of the neural network for normalization processing, and obtaining a probability matrix of the named entity category corresponding to each Word; the probability matrix is re-windowed, and disambiguation is performed by using a conditional random field model to obtain a final named entity label, which comprises the following steps:
step 1, obtaining a non-tag corpus through a webpage crawler, obtaining a sample corpus labeled with a named entity from a corpus, and segmenting the non-tag corpus by using a natural language tool;
step 2, training Word vector space of the unlabeled corpus and the sample corpus which are well participled through a Word2Vec tool;
step 3, converting the text in the sample corpus into Word vectors representing Word characteristics according to the trained Word2Vec model, windowing the Word vectors, and taking a two-dimensional matrix obtained by multiplying a window w by the length d of the Word vectors as the input of a neural network; converting the labels in the sample corpus into a one-hot form to be used as the output of the neural network; the output layer of the neural network is normalized by adopting a softmax function, so that the classification result of the neural network is the probability that the vocabulary belongs to the non-named entities and various named entities, the structure, the depth, the number of nodes, the step length, the activation function and the initial value parameter in the neural network are adjusted, and the activation function is selected to train the neural network;
step 4, the prediction matrix output by the neural network is windowed again, context prediction information of the words to be labeled is used as correlation points of actual classification of the words to be labeled in the conditional random field model, expected values of all sides are calculated by using an EM algorithm according to training corpora, and a corresponding conditional random field model is trained;
step 5, during recognition, firstly converting a text to be recognized into Word vectors representing Word characteristics according to a trained Word2Vec model; if the Word2Vec model does not contain a corresponding word, converting that word into a Word vector by a method of incremental learning, obtaining the Word vector and restoring the Word vector space; windowing the Word vectors, and taking a two-dimensional matrix of a window w multiplied by the length d of the Word vectors as the input of a neural network; and then re-windowing the prediction matrix obtained by the neural network and placing it into the trained conditional random field model for disambiguation, obtaining the final named entity label in the text to be recognized.
2. The network text named entity recognition method based on neural network probability disambiguation as claimed in claim 1, characterized in that: the parameters of the Word2Vec tool are as follows: selecting the word vector length 200, performing 25 times of iteration, performing initial step size 0.025 and minimum step size 0.0001, and selecting a CBOW model.
3. The network text named entity recognition method based on neural network probability disambiguation as claimed in claim 1, characterized in that: the parameters of the neural network are as follows: hiding the layer 2, the number of hidden nodes is 150, the step size is 0.01, the batch size is selected 40, and the sigmoid function is used as the activation function.
4. The network text named entity recognition method based on neural network probability disambiguation as claimed in claim 1, wherein the method for converting the tags in the sample corpus into one-hot format comprises converting the "/o", "/n", "/p" tags in the sample corpus into the named entity tags "/Org-B", "/Org-I", "/Per-B", "/Per-I", "/Loc-B", "/Loc-I", and converting them into one-hot format.
5. The network text named entity recognition method based on neural network probability disambiguation as claimed in claim 1, characterized in that: the window size for word vector windowing is 5.
6. The network text named entity recognition method based on neural network probability disambiguation as claimed in claim 1, characterized in that: when the neural network is trained, one tenth of words are extracted from the sample data and do not participate in the training of the neural network, and the words are used as the measuring standard of the neural network.
CN201710390409.2A 2017-05-27 2017-05-27 Network text named entity identification method based on neural network probability disambiguation Active CN107203511B (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
CN201710390409.2A CN107203511B (en) 2017-05-27 2017-05-27 Network text named entity identification method based on neural network probability disambiguation
RU2019117529A RU2722571C1 (en) 2017-05-27 2017-06-20 Method of recognizing named entities in network text based on elimination of probability ambiguity in neural network
AU2017416649A AU2017416649A1 (en) 2017-05-27 2017-06-20 Method for recognizing network text named entity based on neural network probability disambiguation
PCT/CN2017/089135 WO2018218705A1 (en) 2017-05-27 2017-06-20 Method for recognizing network text named entity based on neural network probability disambiguation
CA3039280A CA3039280C (en) 2017-05-27 2017-06-20 Method for recognizing network text named entity based on neural network probability disambiguation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710390409.2A CN107203511B (en) 2017-05-27 2017-05-27 Network text named entity identification method based on neural network probability disambiguation

Publications (2)

Publication Number Publication Date
CN107203511A CN107203511A (en) 2017-09-26
CN107203511B true CN107203511B (en) 2020-07-17

Family

ID=59905476

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710390409.2A Active CN107203511B (en) 2017-05-27 2017-05-27 Network text named entity identification method based on neural network probability disambiguation

Country Status (5)

Country Link
CN (1) CN107203511B (en)
AU (1) AU2017416649A1 (en)
CA (1) CA3039280C (en)
RU (1) RU2722571C1 (en)
WO (1) WO2018218705A1 (en)

Families Citing this family (63)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107203511B (en) * 2017-05-27 2020-07-17 中国矿业大学 Network text named entity identification method based on neural network probability disambiguation
CN107665252B (en) * 2017-09-27 2020-08-25 深圳证券信息有限公司 Method and device for creating knowledge graph
CN107832289A (en) * 2017-10-12 2018-03-23 北京知道未来信息技术有限公司 A kind of name entity recognition method based on LSTM CNN
CN107908614A (en) * 2017-10-12 2018-04-13 北京知道未来信息技术有限公司 A kind of name entity recognition method based on Bi LSTM
CN107885721A (en) * 2017-10-12 2018-04-06 北京知道未来信息技术有限公司 A kind of name entity recognition method based on LSTM
CN107967251A (en) * 2017-10-12 2018-04-27 北京知道未来信息技术有限公司 A kind of name entity recognition method based on Bi-LSTM-CNN
CN107797989A (en) * 2017-10-16 2018-03-13 平安科技(深圳)有限公司 Enterprise name recognition methods, electronic equipment and computer-readable recording medium
CN107943788B (en) * 2017-11-17 2021-04-06 平安科技(深圳)有限公司 Enterprise abbreviation generation method and device and storage medium
CN110019648B (en) * 2017-12-05 2021-02-02 深圳市腾讯计算机系统有限公司 Method and device for training data and storage medium
CN108121702B (en) * 2017-12-26 2020-11-24 浙江讯飞智能科技有限公司 Method and system for evaluating and reading mathematical subjective questions
CN108052504B (en) * 2017-12-26 2020-11-20 浙江讯飞智能科技有限公司 Structure analysis method and system for mathematical subjective question answer results
CN108280062A (en) * 2018-01-19 2018-07-13 北京邮电大学 Entity and entity-relationship recognition method and device based on deep learning
CN108563626B (en) * 2018-01-22 2022-01-25 北京颐圣智能科技有限公司 Medical text named entity recognition method and device
CN108388559B (en) * 2018-02-26 2021-11-19 中译语通科技股份有限公司 Named entity identification method and system under geographic space application and computer program
CN108763192B (en) * 2018-04-18 2022-04-19 达而观信息科技(上海)有限公司 Entity relation extraction method and device for text processing
CN108805196B (en) * 2018-06-05 2022-02-18 西安交通大学 Automatic incremental learning method for image recognition
RU2699687C1 (en) * 2018-06-18 2019-09-09 Общество с ограниченной ответственностью "Аби Продакшн" Detecting text fields using neural networks
CN109062983A (en) * 2018-07-02 2018-12-21 北京妙医佳信息技术有限公司 Named entity recognition method and system for medical health knowledge graph
CN109241520B (en) * 2018-07-18 2023-05-23 五邑大学 Sentence trunk analysis method and system based on multi-layer error feedback neural network for word segmentation and named entity recognition
CN109255119B (en) * 2018-07-18 2023-04-25 五邑大学 Sentence trunk analysis method and system of multi-task deep neural network based on word segmentation and named entity recognition
CN109299458B (en) * 2018-09-12 2023-03-28 广州多益网络股份有限公司 Entity identification method, device, equipment and storage medium
CN109446514A (en) * 2018-09-18 2019-03-08 平安科技(深圳)有限公司 Construction method, device and computer equipment for news attribute recognition model
CN109657238B (en) * 2018-12-10 2023-10-13 宁波深擎信息科技有限公司 Knowledge graph-based context identification completion method, system, terminal and medium
CN109710927B (en) * 2018-12-12 2022-12-20 东软集团股份有限公司 Named entity identification method and device, readable storage medium and electronic equipment
CN109670177A (en) * 2018-12-20 2019-04-23 翼健(上海)信息科技有限公司 LSTM-based control method and device for medical semantic normalization
CN109858025B (en) * 2019-01-07 2023-06-13 鼎富智能科技有限公司 Word segmentation method and system for address standardized corpus
CN109767817B (en) * 2019-01-16 2023-05-30 南通大学 Drug potential adverse reaction discovery method based on neural network language model
CN111563380A (en) * 2019-01-25 2020-08-21 浙江大学 Named entity identification method and device
CN109800437B (en) * 2019-01-31 2023-11-14 北京工业大学 Named entity recognition method based on feature fusion
CN109992629B (en) * 2019-02-28 2021-08-06 中国科学院计算技术研究所 Neural network relation extraction method and system fusing entity type constraints
CN109858041B (en) * 2019-03-07 2023-02-17 北京百分点科技集团股份有限公司 Named entity recognition method combining semi-supervised learning with user-defined dictionary
CN109933801B (en) * 2019-03-25 2022-03-29 北京理工大学 Bidirectional LSTM named entity identification method based on predicted position attention
CN111858838A (en) * 2019-04-04 2020-10-30 拉扎斯网络科技(上海)有限公司 Menu calibration method and device, electronic equipment and nonvolatile storage medium
CN110083778A (en) * 2019-04-08 2019-08-02 清华大学 Graph convolutional neural network construction method and device for learning disentangled representations
CN110245242B (en) * 2019-06-20 2022-01-18 北京百度网讯科技有限公司 Medical knowledge graph construction method and device and terminal
CN110298043B (en) * 2019-07-03 2023-04-07 吉林大学 Vehicle named entity identification method and system
CN110750992B (en) * 2019-10-09 2023-07-04 吉林大学 Named entity recognition method, named entity recognition device, electronic equipment and named entity recognition medium
CN110781646B (en) * 2019-10-15 2023-08-22 泰康保险集团股份有限公司 Name standardization method, device, medium and electronic equipment
CN111008271B (en) * 2019-11-20 2022-06-24 佰聆数据股份有限公司 Neural network-based key information extraction method and system
CN110993081B (en) * 2019-12-03 2023-08-11 济南大学 Doctor online recommendation method and system
CN111091003B (en) * 2019-12-05 2023-10-10 电子科技大学广东电子信息工程研究院 Parallel extraction method based on knowledge graph query
CN111209748B (en) * 2019-12-16 2023-10-24 合肥讯飞数码科技有限公司 Error word recognition method, related device and readable storage medium
CN113139382A (en) * 2020-01-20 2021-07-20 北京国双科技有限公司 Named entity identification method and device
CN111368545B (en) * 2020-02-28 2024-04-30 北京明略软件系统有限公司 Named entity recognition method and device based on multitask learning
CN111477320B (en) * 2020-03-11 2023-05-30 北京大学第三医院(北京大学第三临床医学院) Treatment effect prediction model construction system, treatment effect prediction system and terminal
CN111523323B (en) * 2020-04-26 2022-08-12 梁华智能科技(上海)有限公司 Disambiguation processing method and system for Chinese word segmentation
CN111581957B (en) * 2020-05-06 2022-04-12 浙江大学 Nested entity detection method based on pyramid hierarchical network
CN111476022B (en) * 2020-05-15 2023-07-07 湖南工商大学 Character embedding and mixed LSTM entity identification method, system and medium for entity characteristics
CN111859937A (en) * 2020-07-20 2020-10-30 上海汽车集团股份有限公司 Entity identification method and device
RU2760637C1 (en) * 2020-08-31 2021-11-29 Публичное Акционерное Общество "Сбербанк России" (Пао Сбербанк) Method and system for retrieving named entities
CN112101041B (en) * 2020-09-08 2022-02-15 平安科技(深圳)有限公司 Entity relationship extraction method, device, equipment and medium based on semantic similarity
CN112765983A (en) * 2020-12-14 2021-05-07 四川长虹电器股份有限公司 Entity disambiguation method based on neural network combined with knowledge description
CN112487816B (en) * 2020-12-14 2024-02-13 安徽大学 Named entity identification method based on network classification
CN112905742B (en) * 2021-02-20 2022-07-29 厦门吉比特网络技术股份有限公司 Method and device for recognizing new vocabulary based on semantic model neural network
CN113343690B (en) * 2021-06-22 2024-03-12 北京语言大学 Text readability automatic evaluation method and device
CN114218924A (en) * 2021-07-27 2022-03-22 广东电力信息科技有限公司 Text intention and entity combined identification method based on BERT model
CN113849597B (en) * 2021-08-31 2024-04-30 艾迪恩(山东)科技有限公司 Illegal advertisement word detection method based on named entity recognition
CN114036948B (en) * 2021-10-26 2024-05-31 天津大学 Named entity identification method based on uncertainty quantification
CN114048749B (en) * 2021-11-19 2024-02-02 北京第一因科技有限公司 Chinese named entity recognition method suitable for multiple fields
CN114510943B (en) * 2022-02-18 2024-05-28 北京大学 Incremental named entity recognition method based on pseudo sample replay
WO2023204724A1 (en) * 2022-04-20 2023-10-26 Общество С Ограниченной Ответственностью "Дентонс Юроп" (Ооо "Дентонс Юроп") Method for analyzing a legal document
CN115587594B (en) * 2022-09-20 2023-06-30 广东财经大学 Unstructured text data extraction model training method and system for network security
CN115905456B (en) * 2023-01-06 2023-06-02 浪潮电子信息产业股份有限公司 Data identification method, system, equipment and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103455581A (en) * 2013-08-26 2013-12-18 北京理工大学 Mass short message information filtering method based on semantic extension
CN105740349A (en) * 2016-01-25 2016-07-06 重庆邮电大学 Sentiment classification method combining Doc2vec with convolutional neural network
CN105868184A (en) * 2016-05-10 2016-08-17 大连理工大学 Chinese name recognition method based on recurrent neural network
CN106202032A (en) * 2016-06-24 2016-12-07 广州数说故事信息科技有限公司 Sentiment analysis method and system for microblog short texts

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7502971B2 (en) * 2005-10-12 2009-03-10 Hewlett-Packard Development Company, L.P. Determining a recurrent problem of a computer resource using signatures
US8583416B2 (en) * 2007-12-27 2013-11-12 Fluential, Llc Robust information extraction from utterances
RU2399959C2 (en) * 2008-10-29 2010-09-20 Закрытое акционерное общество "Авикомп Сервисез" Method for automatic text processing in natural language through semantic indexation, method for automatic processing collection of texts in natural language through semantic indexation and computer readable media
US8239349B2 (en) * 2010-10-07 2012-08-07 Hewlett-Packard Development Company, L.P. Extracting data
CN105404632B (en) * 2014-09-15 2020-07-31 深港产学研基地 System and method for carrying out serialized annotation on biomedical text based on deep neural network
CN104809176B (en) * 2015-04-13 2018-08-07 中央民族大学 Tibetan language entity relation extraction method
CN106202044A (en) * 2016-07-07 2016-12-07 武汉理工大学 Entity relation extraction method based on deep neural network
CN107203511B (en) * 2017-05-27 2020-07-17 中国矿业大学 Network text named entity identification method based on neural network probability disambiguation

Also Published As

Publication number Publication date
WO2018218705A1 (en) 2018-12-06
CA3039280C (en) 2021-07-20
CN107203511A (en) 2017-09-26
CA3039280A1 (en) 2018-12-06
AU2017416649A1 (en) 2019-05-02
RU2722571C1 (en) 2020-06-01

Similar Documents

Publication Publication Date Title
CN107203511B (en) Network text named entity identification method based on neural network probability disambiguation
CN109597997B (en) Comment entity and aspect-level emotion classification method and device and model training thereof
CN108920460B (en) Training method of multi-task deep learning model for multi-type entity recognition
CN110059188B (en) Chinese emotion analysis method based on bidirectional time convolution network
CN106980683B (en) Blog text abstract generating method based on deep learning
CN107085581B (en) Short text classification method and device
CN113239186B (en) Graph convolution network relation extraction method based on multi-dependency relation representation mechanism
CN109753660B (en) LSTM-based winning bid web page named entity extraction method
CN110765775A (en) Self-adaptive method for named entity recognition field fusing semantics and label differences
CN113255320A (en) Entity relation extraction method and device based on syntax tree and graph attention machine mechanism
CN110046356B (en) Label-embedded microblog text emotion multi-label classification method
CN106682089A (en) RNNs-based method for automatic safety checking of short message
CN112069312B (en) Text classification method based on entity recognition and electronic device
CN110968725B (en) Image content description information generation method, electronic device and storage medium
CN112561718A Case microblog evaluation object emotion tendency analysis method based on BiLSTM weight sharing
CN112434514B (en) Multi-granularity multi-channel neural network based semantic matching method and device and computer equipment
CN111159405B (en) Irony detection method based on background knowledge
Wang et al. Mongolian named entity recognition with bidirectional recurrent neural networks
CN115309915A (en) Knowledge graph construction method, device, equipment and storage medium
CN113204975A (en) Sensitive character wind identification method based on remote supervision
Shelke et al. A novel approach for named entity recognition on Hindi language using residual bilstm network
CN115186670B (en) Method and system for identifying domain named entities based on active learning
CN116644148A (en) Keyword recognition method and device, electronic equipment and storage medium
CN115796141A (en) Text data enhancement method and device, electronic equipment and storage medium
CN115577111A (en) Text classification method based on self-attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant