CA3039280C - Method for recognizing network text named entity based on neural network probability disambiguation - Google Patents


Info

Publication number
CA3039280C
CA3039280C (application CA3039280A)
Authority
CA
Canada
Prior art keywords
neural network
word
named entity
word vector
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CA3039280A
Other languages
French (fr)
Other versions
CA3039280A1 (en)
Inventor
Yong Zhou
Bing Liu
Zhaoyu HAN
Zhongqiu WANG
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Mining and Technology CUMT
Original Assignee
China University of Mining and Technology CUMT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Mining and Technology CUMT filed Critical China University of Mining and Technology CUMT
Publication of CA3039280A1 publication Critical patent/CA3039280A1/en
Application granted granted Critical
Publication of CA3039280C publication Critical patent/CA3039280C/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods


Abstract

A method for recognizing network text named entities based on neural network probability disambiguation, comprising: carrying out word segmentation on an unlabeled corpus and using Word2Vec to extract word vectors; converting a sample corpus into a word feature matrix and windowing it; building a deep neural network for training, and adding a softmax function to the output layer of the neural network for normalization, so as to obtain a probability matrix of the named entity category corresponding to each word; and re-windowing the probability matrix and using a conditional random field model to carry out disambiguation, so as to obtain the final named entity annotation. A probability disambiguation method is used to deal with the problems of non-standard grammatical structures and many wrongly written characters in network text.

Description

TITLE OF THE INVENTION
METHOD FOR RECOGNIZING NETWORK TEXT NAMED ENTITY BASED ON
NEURAL NETWORK PROBABILITY DISAMBIGUATION
TECHNICAL FIELD
The present invention relates to processing and analysis of network text, particularly to a method for recognizing network text named entities based on neural network probability disambiguation.
BACKGROUND ART
Networks have driven the speed and scale of information collection and dissemination to an unprecedented level, made global information sharing and interaction a reality, and become an indispensable infrastructure of the information society. Modern communication and dissemination techniques have greatly improved the speed and breadth of information dissemination. However, there are accompanying problems and "side effects": people are often overwhelmed by the flood of information, and it is very hard to obtain the precise information needed quickly and accurately from the vast sea of information. Analyzing and extracting the named entities that concern Internet users, such as people, places, and organizations, from a mass of network text is a prerequisite for providing important support to various higher-level applications such as online marketing and group emotion analysis. Accordingly, network text named entity recognition has become a core technique in network data processing and analysis.
Two kinds of methods for named entity recognition are considered in the research: rule-based methods and statistics-based methods. As machine learning theory has matured and computing performance has improved greatly, statistics-based methods are increasingly favored.
At present, statistical models and methods applied in named entity recognition mainly include:
hidden Markov models, decision trees, maximum entropy models, support vector machines, conditional random fields, and artificial neural networks. Artificial neural networks can achieve better results in named entity recognition than conditional random fields, maximum entropy models, and other models, but conditional random field and maximum entropy models are still the dominant practical models. For example, Patent Document No. CN201310182978.X proposes a named entity recognition method and apparatus for microblog text based on a conditional random field and a named entity library. Patent Document No. CN200710098635.X proposes a named entity recognition method that utilizes word features and models with a maximum entropy model. Artificial neural networks are difficult to use in practice because, in the field of named entity recognition, they require converting words into vectors in a word vector space.
Consequently, artificial neural networks cannot be applied in large-scale practical applications, because they are unable to obtain corresponding vectors for new words.
Owing to the present situation described above, named entity recognition for network text mainly faces the following problems: firstly, because network text contains many network words, new words, and wrongly written or mispronounced characters, it is infeasible to train a word vector space that contains all words with which to train a neural network;
secondly, the accuracy of named entity recognition for network text is degraded by phenomena common in network text, such as arbitrary language forms, non-standard grammatical structures, and wrongly written or mispronounced characters.
SUMMARY OF THE INVENTION
The object of the invention is to overcome the drawbacks in the prior art. The present invention provides a network text named entity recognition method based on neural network probability disambiguation, which extracts word features incrementally without retraining the neural network, and performs recognition with the aid of probability disambiguation. The method trains a neural network to obtain a prediction probability matrix over the named entity categories of each word, and performs disambiguation on the prediction matrix outputted from the neural network in a probability model, thereby improving the accuracy and precision of network text named entity recognition.
In order to attain the object described above, the technical scheme employed by the present invention is as follows.
The network text named entity recognition method is based on neural network probability disambiguation, performing word segmentation on an untagged corpus, utilizing Word2Vec to extract a word vector, converting sample corpora into a word feature matrix and windowing, building a deep neural network for training, adding a softmax function into an output layer of the neural network, and performing normalization, to acquire a probability matrix of the named entity category corresponding to each word; re-windowing the probability matrix, and utilizing a conditional random field model for disambiguation to obtain a final named entity tag.
Specifically, the method comprises the following steps:
step 1: acquiring an untagged corpus by means of a web crawler, acquiring sample corpora with named entity tags from a corpus base, and performing word segmentation on the untagged corpus by a natural language tool;
step 2: performing word vector space training on the segmented untagged corpus and the sample corpora by a Word2Vec tool;
step 3: converting the text in the sample corpora into a word vector representing word features according to the trained Word2Vec model, windowing the word vector, and taking a two-dimensional matrix composed by multiplying the window w by the length d of the word vector as an input to a neural network; converting the tags in the sample corpora into a one-hot form and taking them as outputs of the neural network; performing normalization on an output layer of the neural network with a softmax function, so that a categorization result produced by the neural network becomes a probability of whether the word belongs to an unnamed entity or a named entity; adjusting the structure, depth, number of nodes, step length, activation function, and initial value parameters of the neural network, and selecting an activation function to train the neural network;
step 4: re-windowing a prediction matrix outputted from the neural network, taking context prediction information of the word to be tagged as a point of correlation with the actual category of the word to be tagged in a conditional random field model, utilizing an EM algorithm to calculate expected values at all edges according to training corpora, and training a corresponding conditional random field model;
step 5: in the recognition process, first, converting the text to be recognized into a word vector that represents word features according to the trained Word2Vec model, and, if the Word2Vec model doesn't contain a corresponding training word, converting the word into a word vector by means of incremental learning, word vector acquisition, and word vector space backtracking, etc., windowing the word vector, and taking a two-dimensional matrix composed by multiplying the window w by the length d of the word vector as an input to the neural network;
then, re-windowing a prediction matrix obtained from the neural network, performing disambiguation on the prediction matrix in the trained conditional random field model, and obtaining a final named entity tag of the text to be recognized.
Preferably, the parameters of the Word2Vec tool are as follows: length of word vector: 200, number of iterations: 25, initial step length: 0.025, minimum step length: 0.0001; a CBOW model is selected.
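Assuming the gensim implementation of Word2Vec (the package named later in the example section), those preferred parameters map roughly onto its constructor as follows; `sentences` stands for the segmented corpus and is hypothetical, and the parameter names are gensim's, not the patent's:

```python
from gensim.models import Word2Vec

# Sketch under assumptions: `sentences` is the segmented corpus,
# a list of token lists, e.g. [["word1", "word2"], ...].
model = Word2Vec(
    sentences,
    vector_size=200,   # length of word vector: 200
    epochs=25,         # number of iterations: 25
    alpha=0.025,       # initial step length: 0.025
    min_alpha=0.0001,  # minimum step length: 0.0001
    sg=0,              # 0 selects the CBOW model
)
```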
Preferably, the parameters of the neural network are as follows: number of hidden layers: 2, number of hidden nodes: 150, step length: 0.01, batchSize: 40, activation function: sigmoid function.
Preferably, the tags in the sample corpora are converted into a one-hot form with the following method: converting the tags "/o", "/n", and "/p" in the sample corpora into named entity tags "/Org-B", "/Org-I", "/Per-B", "/Per-I", "/Loc-B", and "/Loc-I" correspondingly, and then converting the named entity tags into the one-hot form.
Preferably, the window size for windowing the word vector is 5.
Preferably, in neural network training, one-tenth of the words are extracted from the sample data and excluded from the neural network training, and are instead used as evaluation criteria for the neural network.
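A minimal sketch of this held-out evaluation split, assuming a simple random partition (the patent does not specify how the tenth is chosen; all names below are hypothetical):

```python
import random

def split_holdout(samples, fraction=0.1, seed=42):
    """Reserve a fraction of the samples for evaluation only.

    Returns (train, heldout); the held-out tenth never enters
    training and serves as the evaluation criterion for the network.
    """
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    cut = max(1, int(len(samples) * fraction))
    heldout = [samples[i] for i in indices[:cut]]
    train = [samples[i] for i in indices[cut:]]
    return train, heldout

train, heldout = split_holdout(list(range(100)))
```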
Compared with the prior art, the present invention provides the following beneficial effects:
word vectors may be extracted incrementally without retraining the neural network, prediction may be carried out with the neural network, and disambiguation may be performed with a probability model, so that the method achieves better practicability, accuracy, and precision in named entity recognition of network text. For the task of named entity recognition of network text, the present invention provides an incremental word vector learning method that does not change the structure of the neural network, addressing the fact that network words and new words exist, and employs a probability disambiguation method to deal with the problems that network texts are non-standard in grammatical structure and contain many wrongly written or mispronounced characters. Thus, the method provided in the present invention attains high accuracy in network text named entity recognition tasks.
BRIEF DESCRIPTION OF DRAWINGS
Fig. 1 is a flow chart of training a network text named entity recognition device based on neural network probability disambiguation according to the present invention;
Fig. 2 is a flow chart of converting a word into word features according to the present invention;
Fig. 3 is a schematic diagram of the text processing and neural network architecture according to the present invention.
DETAILED DESCRIPTION OF THE EMBODIMENTS
Hereunder the present invention will be further detailed in embodiments, with reference to the accompanying drawings. It should be appreciated that those embodiments are provided only for describing the present invention, and shall not be deemed as constituting any limitation to the scope of the present invention. After reading the present invention, modifications to the present invention in various equivalent forms made by those skilled in the art shall be deemed as falling into the protected scope as defined by the attached claims in this application.
A network text named entity recognition method based on neural network probability disambiguation, performing word segmentation on an untagged corpus, utilizing Word2Vec to extract a word vector, converting sample corpora into a word feature matrix and windowing, building a deep neural network for training, adding a softmax function into an output layer of the neural network, and performing normalization, to acquire a probability matrix of named entity category corresponding to each word; re-windowing the probability matrix, and utilizing a conditional random field model for disambiguation to obtain a final named entity tag.
Specifically, the method comprises the following steps:
step 1: Acquiring untagged network text by means of a web crawler, downloading corpora with named entity tags as sample corpora from a corpus base, and performing word segmentation on the untagged corpus with a natural language tool;
step 2: Performing word vector space training on the segmented untagged corpus and the sample corpora with a Word2Vec tool;
step 3: Converting the text in the sample corpora to a word vector that represents word features according to a trained Word2Vec model, and taking the word vector as an input to a neural network; converting the tags in the sample corpora into a one-hot form and taking them as outputs of the neural network. Because a named entity may be divided into several words in a text processing task, the tagging is performed in an IOB pattern, in order to ensure that the recognized named entity remains complete.
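The tag-to-one-hot conversion can be sketched as follows; the tag list combines the entity tags named in the description with a hypothetical "/O" tag for non-entity words:

```python
# IOB entity tags from the description; "/O" for non-entity words
# is an assumption, as is the ordering of the list.
TAGS = ["/O", "/Org-B", "/Org-I", "/Per-B", "/Per-I", "/Loc-B", "/Loc-I"]

def one_hot(tag):
    """Convert a named entity tag into its one-hot output vector."""
    vec = [0] * len(TAGS)
    vec[TAGS.index(tag)] = 1
    return vec

# A personal name split across two words, followed by a non-entity word.
encoded = [one_hot(t) for t in ["/Per-B", "/Per-I", "/O"]]
```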
Which named entity category a word belongs to should not be judged merely on the basis of the word itself, but should be further judged according to the context information of the word.
Therefore, a concept of "window" is introduced in the building of the neural network, i.e., in judging a word, both the word and the feature information of a fixed-length context around it are taken as inputs to the neural network; thus, the input to the neural network is no longer a word feature vector of length d, but a two-dimensional matrix composed by multiplying the window w by the length d of the word feature vector.
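The windowing step can be sketched with NumPy as follows; zero-padding at sentence boundaries is an assumption, since the patent does not specify boundary handling:

```python
import numpy as np

def window_features(word_vectors, w=5):
    """Stack each word's vector with its context into a w x d matrix.

    word_vectors: (n, d) array, one row per word in the sentence.
    Returns an (n, w, d) array; entry i holds the vectors of word i
    and the (w - 1) / 2 words on each side, zero-padded at the edges.
    """
    n, d = word_vectors.shape
    half = w // 2
    padded = np.vstack([np.zeros((half, d)), word_vectors, np.zeros((half, d))])
    return np.stack([padded[i:i + w] for i in range(n)])

# 7 words with d = 200 features each, window w = 5.
X = window_features(np.ones((7, 200)), w=5)
```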
An output layer of the neural network is normalized with a softmax function, so that a categorization result produced by the neural network becomes a probability of whether the word belongs to an unnamed entity or a named entity. The structure, depth, number of nodes, step length, activation function, and initial value parameters of the neural network are adjusted, and an activation function is selected to train the neural network.
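The softmax normalization of the output layer can be sketched as follows; the seven-category score vector is a hypothetical example:

```python
import numpy as np

def softmax(logits):
    """Normalize raw output-layer scores into category probabilities."""
    z = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Raw output-layer scores for one word over 7 hypothetical categories.
probs = softmax(np.array([2.0, 0.1, -1.0, 0.5, 0.0, -0.5, 0.3]))
```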
step 4: Re-windowing a prediction matrix outputted from the neural network, taking context prediction information of the word to be tagged as a point of correlation with the actual category of the word to be tagged in a conditional random field model, utilizing an EM algorithm to calculate expected values at all edges according to training corpora, and training a corresponding conditional random field model;
step 5: In the recognition process, first, converting the text to be recognized into a word vector that represents word features according to the trained Word2Vec model, and, if the Word2Vec model doesn't contain a corresponding training word, converting the word into a word vector by means of incremental learning, word vector acquisition, and word vector space backtracking, etc.
(1) matching the word to be converted in a trained word vector space;
(2) converting the word to be converted directly to a corresponding word vector, if the word is matched in the word vector space;
(3) if the Word2Vec model doesn't contain a corresponding word, backing up the word vector space, to prevent degradation of the accuracy of the neural network model caused by deviation of the word space created in incremental learning; loading the Word2Vec model, acquiring a sentence in which the mismatched word occurs, inputting the sentence into the Word2Vec model and performing incremental training, acquiring the word vector of the word, and utilizing the backup word vector space to perform backtracking of the model;
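The backup-and-backtrack logic of step (3) can be sketched independently of any particular Word2Vec implementation; `VectorSpace` below is a hypothetical stand-in for the trained model, not the patent's code:

```python
import copy

class VectorSpace:
    """Hypothetical stand-in for a trained word vector model."""
    def __init__(self):
        self.vectors = {"apple": [0.1, 0.2]}

    def __contains__(self, word):
        return word in self.vectors

    def increment_train(self, sentence):
        # Placeholder for incremental training on the sentence that
        # contains the unseen word; real training would also shift
        # vectors of existing words (the "deviation" the backup avoids).
        for word in sentence:
            self.vectors.setdefault(word, [0.0, 0.0])

def vector_for(model, word, sentence):
    """Fetch a word vector, learning unseen words incrementally.

    The whole space is backed up first, the new vector is extracted,
    and the model is then rolled back, so repeated lookups cannot
    drift the shared word vector space.
    """
    if word in model:
        return model.vectors[word]
    backup = copy.deepcopy(model.vectors)  # back up the word vector space
    model.increment_train(sentence)        # incremental training
    vec = model.vectors[word]              # acquire the new word vector
    model.vectors = backup                 # backtrack the model
    return vec

m = VectorSpace()
v = vector_for(m, "pear", ["apple", "pear"])
```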
windowing the word vector, and taking a two-dimensional matrix composed by multiplying the window w by the length d of word vector as an input to the neural network;
then, re-windowing a prediction matrix obtained from the neural network, performing disambiguation on the prediction matrix in the trained conditional random field model, and obtaining a final named entity tag of the text to be recognized.
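The disambiguation over the probability matrix can be sketched with Viterbi decoding; the two-tag emission and transition scores below are hypothetical stand-ins for the trained conditional random field:

```python
import numpy as np

def viterbi(emission, transition):
    """Choose the most probable tag sequence.

    emission: (n, k) per-word category probabilities from the network.
    transition: (k, k) tag-to-tag compatibility scores (stand-in for
    the trained CRF). Returns the best tag index for each word.
    """
    n, k = emission.shape
    score = np.log(emission[0])
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + np.log(transition) + np.log(emission[t])[None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Two tags (0 = non-entity, 1 = entity); transitions discourage
# flip-flopping, which is how context resolves ambiguous words.
emission = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
transition = np.array([[0.8, 0.2], [0.2, 0.8]])
tags = viterbi(emission, transition)
```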
EXAMPLE
Network text is acquired by means of a web crawler from the Sogou News website (http://news.sogou.com/), corpora with named entity tags are downloaded from the Datatang corpus base (http://www.datatang.com/) as sample corpora, and word segmentation is performed on the acquired network text with a natural language tool. Word vector space training is performed on the segmented corpus and sample corpora with the gensim package in Python by the Word2Vec model, utilizing the following parameters: length of word vector: 200, number of iterations: 25, initial step length: 0.025, and minimum step length: 0.0001; a CBOW model is selected.
The text in the sample corpora is converted into a word vector that represents word features according to the trained Word2Vec model; if the Word2Vec model doesn't contain a corresponding training word, the word is converted into a word vector by means of incremental learning, word vector acquisition, and word vector space backtracking, as the features of the word. The tags "/o", "/n", and "/p" in the sample corpora acquired from Datatang are converted into named entity tags "/Org-B", "/Org-I", "/Per-B", "/Per-I", "/Loc-B", and "/Loc-I" correspondingly, and then the named entity tags are converted into the one-hot form as outputs of the neural network.
The window size is set to 5, i.e., in considering the named entity category of the current word, the word features of the word, the two words before it, and the two words after it are used as inputs to the neural network; the input to the neural network is thus a batchSize*1000 matrix (window 5 multiplied by word vector length 200). One-tenth of the words are extracted from the sample data and excluded from the neural network training, and are instead used as evaluation criteria for the neural network. The output layer of the neural network is normalized with a softmax function, so that a categorization result produced by the neural network becomes a probability of whether the word belongs to an unnamed entity or a named entity; the maximum probability value is temporarily taken as the final categorization result. The parameters of the neural network, such as structure, depth, number of nodes, step length, activation function, and initial values, are adjusted to ensure that the neural network attains high accuracy; the final parameters are as follows: number of hidden layers: 2, number of hidden nodes: 150, step length: 0.01, batchSize: 40, activation function: sigmoid. Thus, a good categorization effect is attained: the accuracy may be as high as 99.83%, and the F values of the most representative personal names, place names, and organization names may be 93.4%, 84.2%, and 80.4% respectively.
The step of taking the maximum probability value of the prediction matrix outputted from the neural network as the final categorization result is then removed; instead, the probability matrix is re-windowed directly, the context prediction information of the word to be tagged is used as a point of correlation with the actual category of the word to be tagged in a conditional random field model, an EM algorithm is used to calculate expected values at all edges of the conditional random field according to the training corpora, and a corresponding conditional random field model is trained. After disambiguation with the conditional random field, the F values of personal names, place names, and organization names are improved to 94.8%, 85.0%, and 82.0% respectively.
It is seen from the embodiment described above: compared with the conventional supervised named entity recognition method, the text named entity recognition method based on neural network probability disambiguation provided in the present invention employs a word vector conversion method that can be used to extract word features incrementally without causing deviation of the word vector space; thus, the neural network can be applied to network text that contains a lot of new words and wrongly written or mispronounced characters.
Moreover, in the present invention, the probability matrix outputted from the neural network is re-windowed, and context disambiguation is performed with a conditional random field model, so as to deal with the phenomenon that the network text involves a lot of wrongly written or mispronounced characters and non-standard grammatical structures successfully.
While the present invention is described above in some preferred embodiments, it should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and those improvements and modifications should be deemed as falling in the scope of protection of the present invention.

Claims (7)

1. A method for recognizing network text named entity based on neural network probability disambiguation, comprising: performing word segmentation on an untagged corpus, utilizing Word2Vec to extract a word vector, converting sample corpora into a word feature matrix, windowing, building a deep neural network for training, adding a softmax function into an output layer of the neural network, and performing normalization, to acquire a probability matrix of named entity category corresponding to each word;
re-windowing the probability matrix, and utilizing a conditional random field model for disambiguation to obtain a final named entity tag.
2. The method for recognizing network text named entity based on neural network probability disambiguation according to claim 1, comprising the following steps:
step 1: acquiring the untagged corpus by means of a web crawler, acquiring sample corpora with named entity tags from a corpus base, and performing word segmentation on the untagged corpus with a natural language tool;
step 2: performing word vector space training on the segmented untagged corpus and the sample corpora by the Word2Vec tool;
step 3: converting the text in the sample corpora into the word vector representing word features according to the trained Word2Vec model, windowing the word vector, and taking a two-dimensional matrix composed by multiplying the window w by the length d of the word vector as an input to the neural network; converting the tags in the sample corpora into a one-hot form and taking them as outputs of the neural network; performing normalization on an output layer of the neural network with the softmax function, so that a categorization result produced by the neural network becomes a probability of whether the word belongs to an unnamed entity or a named entity, adjusting the structure, depth, number of nodes, step length, activation function, and initial value parameters in the neural network, and selecting an activation function to train the neural network;
step 4: re-windowing a prediction matrix outputted from the neural network, taking context prediction information of the word to be tagged as a point of correlation with an actual category of the word to be tagged in the conditional random field model, utilizing an expectation-maximization (EM) algorithm to calculate expected values at all sides according to training corpora, and training a corresponding conditional random field model;
step 5: in the recognition process, first, converting the text to be recognized into the word vector that represents word features according to the trained Word2Vec model, and, if the Word2Vec model doesn't contain a corresponding word, converting the word into the word vector by means of incremental learning, word vector acquisition, and word vector space backtracking, windowing the word vector, and taking the two-dimensional matrix composed by multiplying the window w by the length d of the word vector as an input to the neural network; then, re-windowing the prediction matrix obtained from the neural network, performing disambiguation on the prediction matrix in the trained conditional random field model, and obtaining the final named entity tag of the text to be recognized.
3. The method for recognizing network text named entity based on neural network probability disambiguation according to claim 1, wherein, the parameters of the Word2Vec tool are as follows: length of word vector: 200, number of iterations: 25, initial step length: 0.025, minimum step length: 0.0001, and a continuous bag-of-words (CBOW) model is selected.
4. The method for recognizing network text named entity based on neural network probability disambiguation according to claim 1, wherein, the parameters of the neural network are as follows: number of hidden layers: 2, number of hidden nodes: 150, step length: 0.01, batch size: 40, activation function: sigmoid function.
5. The method for recognizing network text named entity based on neural network probability disambiguation according to claim 1, wherein, the tags in the sample corpora are converted into a one-hot form with the following method: converting the tags "/o", "/n", and "/p" in the sample corpora into named entity tags "/Org-B", "/Org-I", "/Per-B", "/Per-I", "/Loc-B", and "/Loc-I" correspondingly, and then converting the named entity tags into the one-hot form.
6. The method for recognizing network text named entity based on neural network probability disambiguation according to claim 1, wherein, the window size for windowing the word vector is 5.
7. The method for recognizing network text named entity based on neural network probability disambiguation according to claim 1, wherein, in neural network training, one-tenth of the words are extracted from the sample data and excluded from the neural network training, but are used as evaluation criteria for the neural network.

Date Recue/Date Received 2020-09-10
CA3039280A 2017-05-27 2017-06-20 Method for recognizing network text named entity based on neural network probability disambiguation Active CA3039280C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201710390409.2A CN107203511B (en) 2017-05-27 2017-05-27 Network text named entity identification method based on neural network probability disambiguation
CN201710390409.2 2017-05-27
PCT/CN2017/089135 WO2018218705A1 (en) 2017-05-27 2017-06-20 Method for recognizing network text named entity based on neural network probability disambiguation

Publications (2)

Publication Number Publication Date
CA3039280A1 CA3039280A1 (en) 2018-12-06
CA3039280C true CA3039280C (en) 2021-07-20

Family

ID=59905476

Family Applications (1)

Application Number Title Priority Date Filing Date
CA3039280A Active CA3039280C (en) 2017-05-27 2017-06-20 Method for recognizing network text named entity based on neural network probability disambiguation

Country Status (5)

Country Link
CN (1) CN107203511B (en)
AU (1) AU2017416649A1 (en)
CA (1) CA3039280C (en)
RU (1) RU2722571C1 (en)
WO (1) WO2018218705A1 (en)

Families Citing this family (62)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107203511B (en) * 2017-05-27 2020-07-17 中国矿业大学 Network text named entity identification method based on neural network probability disambiguation
CN107665252B (en) * 2017-09-27 2020-08-25 深圳证券信息有限公司 Method and device for creating knowledge graph
CN107885721A (en) * 2017-10-12 2018-04-06 北京知道未来信息技术有限公司 A kind of name entity recognition method based on LSTM
CN107832289A (en) * 2017-10-12 2018-03-23 北京知道未来信息技术有限公司 A kind of name entity recognition method based on LSTM CNN
CN107908614A (en) * 2017-10-12 2018-04-13 北京知道未来信息技术有限公司 A kind of name entity recognition method based on Bi LSTM
CN107967251A (en) * 2017-10-12 2018-04-27 北京知道未来信息技术有限公司 A kind of name entity recognition method based on Bi-LSTM-CNN
CN107797989A (en) * 2017-10-16 2018-03-13 平安科技(深圳)有限公司 Enterprise name recognition methods, electronic equipment and computer-readable recording medium
CN107943788B (en) * 2017-11-17 2021-04-06 平安科技(深圳)有限公司 Enterprise abbreviation generation method and device and storage medium
CN110019648B (en) * 2017-12-05 2021-02-02 深圳市腾讯计算机系统有限公司 Method and device for training data and storage medium
CN108052504B (en) * 2017-12-26 2020-11-20 浙江讯飞智能科技有限公司 Structure analysis method and system for mathematic subjective question answer result
CN108121702B (en) * 2017-12-26 2020-11-24 浙江讯飞智能科技有限公司 Method and system for evaluating and reading mathematical subjective questions
CN108280062A (en) * 2018-01-19 2018-07-13 北京邮电大学 Entity based on deep learning and entity-relationship recognition method and device
CN108563626B (en) * 2018-01-22 2022-01-25 北京颐圣智能科技有限公司 Medical text named entity recognition method and device
CN108388559B (en) * 2018-02-26 2021-11-19 中译语通科技股份有限公司 Named entity recognition method, system and computer program for geospatial applications
CN108763192B (en) * 2018-04-18 2022-04-19 达而观信息科技(上海)有限公司 Entity relation extraction method and device for text processing
CN108805196B (en) * 2018-06-05 2022-02-18 西安交通大学 Automatic incremental learning method for image recognition
RU2699687C1 (en) * 2018-06-18 2019-09-09 Общество с ограниченной ответственностью "Аби Продакшн" Detecting text fields using neural networks
CN109062983A (en) * 2018-07-02 2018-12-21 北京妙医佳信息技术有限公司 Name entity recognition method and system for medical health knowledge mapping
CN109241520B (en) * 2018-07-18 2023-05-23 五邑大学 Sentence trunk analysis method and system based on multi-layer error feedback neural network for word segmentation and named entity recognition
CN109255119B (en) * 2018-07-18 2023-04-25 五邑大学 Sentence trunk analysis method and system of multi-task deep neural network based on word segmentation and named entity recognition
CN109299458B (en) * 2018-09-12 2023-03-28 广州多益网络股份有限公司 Entity identification method, device, equipment and storage medium
CN109446514A (en) * 2018-09-18 2019-03-08 平安科技(深圳)有限公司 Construction method, device and the computer equipment of news property identification model
CN109657238B (en) * 2018-12-10 2023-10-13 宁波深擎信息科技有限公司 Knowledge graph-based context identification completion method, system, terminal and medium
CN109710927B (en) * 2018-12-12 2022-12-20 东软集团股份有限公司 Named entity identification method and device, readable storage medium and electronic equipment
CN109670177A (en) * 2018-12-20 2019-04-23 翼健(上海)信息科技有限公司 LSTM-based control method and control device for medical semantic normalization
CN109858025B (en) * 2019-01-07 2023-06-13 鼎富智能科技有限公司 Word segmentation method and system for address standardized corpus
CN109767817B (en) * 2019-01-16 2023-05-30 南通大学 Drug potential adverse reaction discovery method based on neural network language model
CN111563380A (en) * 2019-01-25 2020-08-21 浙江大学 Named entity identification method and device
CN109800437B (en) * 2019-01-31 2023-11-14 北京工业大学 Named entity recognition method based on feature fusion
CN109992629B (en) * 2019-02-28 2021-08-06 中国科学院计算技术研究所 Neural network relation extraction method and system fusing entity type constraints
CN109858041B (en) * 2019-03-07 2023-02-17 北京百分点科技集团股份有限公司 Named entity recognition method combining semi-supervised learning with user-defined dictionary
CN109933801B (en) * 2019-03-25 2022-03-29 北京理工大学 Bidirectional LSTM named entity identification method based on predicted position attention
CN111858838A (en) * 2019-04-04 2020-10-30 拉扎斯网络科技(上海)有限公司 Menu calibration method and device, electronic equipment and nonvolatile storage medium
CN110083778A (en) * 2019-04-08 2019-08-02 清华大学 The figure convolutional neural networks construction method and device of study separation characterization
CN110245242B (en) * 2019-06-20 2022-01-18 北京百度网讯科技有限公司 Medical knowledge graph construction method and device and terminal
CN110298043B (en) * 2019-07-03 2023-04-07 吉林大学 Vehicle named entity identification method and system
CN110750992B (en) * 2019-10-09 2023-07-04 吉林大学 Named entity recognition method and device, electronic equipment and storage medium
CN110781646B (en) * 2019-10-15 2023-08-22 泰康保险集团股份有限公司 Name standardization method, device, medium and electronic equipment
CN111008271B (en) * 2019-11-20 2022-06-24 佰聆数据股份有限公司 Neural network-based key information extraction method and system
CN110993081B (en) * 2019-12-03 2023-08-11 济南大学 Doctor online recommendation method and system
CN111091003B (en) * 2019-12-05 2023-10-10 电子科技大学广东电子信息工程研究院 Parallel extraction method based on knowledge graph query
CN111209748B (en) * 2019-12-16 2023-10-24 合肥讯飞数码科技有限公司 Error word recognition method, related device and readable storage medium
CN113139382A (en) * 2020-01-20 2021-07-20 北京国双科技有限公司 Named entity identification method and device
CN111368545B (en) * 2020-02-28 2024-04-30 北京明略软件系统有限公司 Named entity recognition method and device based on multitask learning
CN111477320B (en) * 2020-03-11 2023-05-30 北京大学第三医院(北京大学第三临床医学院) Treatment effect prediction model construction system, treatment effect prediction system and terminal
CN111523323B (en) * 2020-04-26 2022-08-12 梁华智能科技(上海)有限公司 Disambiguation processing method and system for Chinese word segmentation
CN111581957B (en) * 2020-05-06 2022-04-12 浙江大学 Nested entity detection method based on pyramid hierarchical network
CN111476022B (en) * 2020-05-15 2023-07-07 湖南工商大学 Character embedding and mixed LSTM entity identification method, system and medium for entity characteristics
CN111859937A (en) * 2020-07-20 2020-10-30 上海汽车集团股份有限公司 Entity identification method and device
CN112199953A (en) * 2020-08-24 2021-01-08 广州九四智能科技有限公司 Method and device for extracting information in telephone conversation and computer equipment
RU2760637C1 (en) * 2020-08-31 2021-11-29 Публичное Акционерное Общество "Сбербанк России" (Пао Сбербанк) Method and system for retrieving named entities
CN112101041B (en) * 2020-09-08 2022-02-15 平安科技(深圳)有限公司 Entity relationship extraction method, device, equipment and medium based on semantic similarity
CN112487816B (en) * 2020-12-14 2024-02-13 安徽大学 Named entity identification method based on network classification
CN112765983A (en) * 2020-12-14 2021-05-07 四川长虹电器股份有限公司 Entity disambiguation method based on neural network combined with knowledge description
CN112905742B (en) * 2021-02-20 2022-07-29 厦门吉比特网络技术股份有限公司 Method and device for recognizing new vocabulary based on semantic model neural network
CN113343690B (en) * 2021-06-22 2024-03-12 北京语言大学 Text readability automatic evaluation method and device
CN113849597B (en) * 2021-08-31 2024-04-30 艾迪恩(山东)科技有限公司 Illegal advertisement word detection method based on named entity recognition
CN114048749B (en) * 2021-11-19 2024-02-02 北京第一因科技有限公司 Chinese named entity recognition method suitable for multiple fields
CN114510943B (en) * 2022-02-18 2024-05-28 北京大学 Incremental named entity recognition method based on pseudo sample replay
WO2023204724A1 (en) * 2022-04-20 2023-10-26 Общество С Ограниченной Ответственностью "Дентонс Юроп" (Ооо "Дентонс Юроп") Method for analyzing a legal document
CN115587594B (en) * 2022-09-20 2023-06-30 广东财经大学 Unstructured text data extraction model training method and system for network security
CN115905456B (en) * 2023-01-06 2023-06-02 浪潮电子信息产业股份有限公司 Data identification method, system, equipment and computer readable storage medium

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7502971B2 (en) * 2005-10-12 2009-03-10 Hewlett-Packard Development Company, L.P. Determining a recurrent problem of a computer resource using signatures
US8583416B2 (en) * 2007-12-27 2013-11-12 Fluential, Llc Robust information extraction from utterances
RU2399959C2 (en) * 2008-10-29 2010-09-20 Закрытое акционерное общество "Авикомп Сервисез" Method for automatic text processing in natural language through semantic indexation, method for automatic processing collection of texts in natural language through semantic indexation and computer readable media
US8239349B2 (en) * 2010-10-07 2012-08-07 Hewlett-Packard Development Company, L.P. Extracting data
CN103455581B (en) * 2013-08-26 2016-05-04 北京理工大学 Massive short text information filtering method based on semantic extension
CN105404632B (en) * 2014-09-15 2020-07-31 深港产学研基地 System and method for carrying out serialized annotation on biomedical text based on deep neural network
CN104809176B (en) * 2015-04-13 2018-08-07 中央民族大学 Tibetan language entity relation extraction method
CN105740349B (en) * 2016-01-25 2019-03-08 重庆邮电大学 Sentiment classification method combining Doc2vec and convolutional neural networks
CN105868184B (en) * 2016-05-10 2018-06-08 大连理工大学 Chinese personal name recognition method based on recurrent neural networks
CN106202032B (en) * 2016-06-24 2018-08-28 广州数说故事信息科技有限公司 Sentiment analysis method and system for microblog short texts
CN106202044A (en) * 2016-07-07 2016-12-07 武汉理工大学 Entity relation extraction method based on deep neural networks
CN107203511B (en) * 2017-05-27 2020-07-17 中国矿业大学 Network text named entity identification method based on neural network probability disambiguation

Also Published As

Publication number Publication date
AU2017416649A1 (en) 2019-05-02
CA3039280A1 (en) 2018-12-06
CN107203511A (en) 2017-09-26
CN107203511B (en) 2020-07-17
RU2722571C1 (en) 2020-06-01
WO2018218705A1 (en) 2018-12-06

Similar Documents

Publication Publication Date Title
CA3039280C (en) Method for recognizing network text named entity based on neural network probability disambiguation
CN109493977B (en) Text data processing method and device, electronic equipment and computer readable medium
CN110472003B (en) Social network text emotion fine-grained classification method based on graph convolution network
CN110796160A (en) Text classification method, device and storage medium
CN115309915B (en) Knowledge graph construction method, device, equipment and storage medium
CN108763192B (en) Entity relation extraction method and device for text processing
CN112528654A (en) Natural language processing method and device and electronic equipment
Du et al. A convolutional attentional neural network for sentiment classification
Han et al. CNN-BiLSTM-CRF model for term extraction in Chinese corpus
CN111241273A (en) Text data classification method and device, electronic equipment and computer readable medium
CN111159405B (en) Irony detection method based on background knowledge
Rajani Shree et al. POS tagger model for Kannada text with CRF++ and deep learning approaches
CN112818124A (en) Entity relationship extraction method based on attention neural network
CN110198291B (en) Webpage backdoor detection method, device, terminal and storage medium
Li et al. A recurrent neural network language model based on word embedding
CN114936274A (en) Model training method, dialogue generating device, dialogue training equipment and storage medium
CN113886530A (en) Semantic phrase extraction method and related device
Garrido et al. Information extraction on weather forecasts with semantic technologies
CN113704472A (en) Hate and offensive statement identification method and system based on topic memory network
Praveena et al. Chunking based malayalam paraphrase identification using unfolding recursive autoencoders
Li et al. Using big data from the web to train chinese traffic word representation model in vector space
Mercan et al. Abstractive text summarization for resumes with cutting edge NLP transformers and LSTM
Chopra et al. Sequence Labeling using Conditional Random Fields
Wang et al. Realization of Chinese word segmentation based on deep learning method
Prajapati et al. Empirical Analysis of Humor Detection Using Deep Learning and Machine Learning on Kaggle Corpus

Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20190403