CN107203511B - Network text named entity identification method based on neural network probability disambiguation - Google Patents
Network text named entity identification method based on neural network probability disambiguation
- Publication number: CN107203511B
- Application number: CN201710390409.2A
- Authority
- CN
- China
- Prior art keywords
- neural network
- word
- named entity
- network
- words
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a network text named entity recognition method based on neural network probability disambiguation. The method comprises: segmenting an unlabeled corpus into words, extracting word vectors with Word2Vec, converting a sample corpus into word feature matrixes and windowing them, constructing a deep neural network for training, and adding a softmax function to the output layer of the neural network for normalization, so as to obtain a probability matrix of the named entity category corresponding to each word; the probability matrix is then re-windowed and disambiguated with a conditional random field model to obtain the final named entity labels. According to the characteristics of network words and new words, the invention provides a word vector incremental learning method that does not require changing the neural network structure, and adopts a probability disambiguation method to address the non-standard grammatical structures and frequent wrongly written characters in network text. The method of the invention therefore achieves higher accuracy in the task of recognizing named entities in network text.
Description
Technical Field
The invention relates to the processing and analysis of web text, and in particular to a method for recognizing named entities in web text based on neural network probability disambiguation.
Background
The network has raised the speed and scale of information acquisition and transmission to unprecedented levels, realized global information sharing and interaction, and become an indispensable infrastructure of the information society. Modern communication technology greatly improves the speed and breadth of information propagation. However, an accompanying problem and "side effect" is that the sheer volume of unrefined information sometimes makes it difficult to obtain the most needed information quickly and accurately from such an ocean of data. Analyzing the named entities that internet users care about, such as people, places and organizations, from massive network text provides important supporting information for various upper-layer applications such as online marketing and group sentiment analysis. This makes named entity recognition for web text an important core technology in network data processing and analysis.
Methods for named entity recognition fall largely into two categories: rule-based methods and statistics-based methods. With the continuous improvement of machine learning theory and the great increase in computing performance, statistics-based methods have become more favored.
At present, the statistical models mainly applied to named entity recognition include: hidden Markov models, decision trees, maximum entropy models, support vector machines, conditional random fields, and artificial neural networks. Named entity recognition with artificial neural networks can obtain better results than conditional random field models, maximum entropy models and other models, but practical systems still mainly adopt conditional random fields and maximum entropy models. For example, patent CN201310182978.X provides a named entity recognition method and device for microblog text that uses a conditional random field combined with a named entity library, and patent CN200710098635.X provides a named entity recognition method that models word features with a maximum entropy model. The reason artificial neural networks are difficult to apply is that, in the field of named entity recognition, they must convert words into vectors in a word vector space; corresponding vectors cannot be obtained for new words, so large-scale practical application has not been achieved.
Given the above situation, named entity recognition for web text mainly faces the following problems. First, because web text contains many network words, new words and wrongly written characters, it is impossible to train a word vector space containing all the words needed to train a neural network. Second, phenomena such as arbitrary language forms, irregular grammatical structures and frequent wrongly written characters reduce the accuracy of named entity recognition on web text.
Disclosure of Invention
The purpose of the invention is as follows: in order to overcome the defects in the prior art, the invention provides a network text named entity recognition method based on neural network probability disambiguation, which can extract word features incrementally without retraining the neural network while performing probability disambiguation during recognition.
The technical scheme is as follows: in order to achieve the above purpose, the invention adopts the following technical scheme.
A network text named entity recognition method based on neural network probability disambiguation: unlabeled corpora are segmented into words, word vectors are extracted with Word2Vec, the sample corpus is converted into word feature matrixes and windowed, a deep neural network is constructed and trained, and a softmax function is added to the output layer of the neural network for normalization, yielding a probability matrix of the named entity category corresponding to each word. The probability matrix is then re-windowed and disambiguated with a conditional random field model to obtain the final named entity labels.
The method specifically comprises the following steps:
Step 1, obtaining an unlabeled corpus through a web crawler, obtaining a sample corpus annotated with named entities from a corpus repository, and segmenting the unlabeled corpus with a natural language tool.
Step 2, training a word vector space on the segmented unlabeled corpus and sample corpus with the Word2Vec tool.
Step 3, converting the text in the sample corpus into word vectors representing word features according to the trained Word2Vec model, windowing the word vectors, and taking the two-dimensional matrix of window size w multiplied by word vector length d as the input of the neural network. The labels in the sample corpus are converted into one-hot form as the output of the neural network. The output layer of the neural network is normalized with a softmax function so that the classification result is the probability that the word belongs to a non-named entity or to each named entity category; the structure, depth, number of nodes, step size, activation function and initial value parameters of the neural network are adjusted, and an activation function is selected, to train the neural network.
Step 4, re-windowing the prediction matrix output by the neural network, taking the context prediction information of the word to be labeled as the correlation points of its actual classification in the conditional random field model, calculating the expected value of each edge with the EM (expectation-maximization) algorithm on the training corpus, and training the corresponding conditional random field model.
Step 5, during recognition, first converting the text to be recognized into word vectors representing word features according to the trained Word2Vec model; if the Word2Vec model does not contain a corresponding trained word, the word is converted into a word vector by incremental learning, obtaining the word vector and then backtracking the word vector space. The word vectors are windowed, and the two-dimensional matrix of window size w multiplied by word vector length d is taken as the input of the neural network. The prediction matrix obtained from the neural network is then windowed again and put into the trained conditional random field model for disambiguation, yielding the final named entity labels in the text to be recognized.
Preferably: the parameters of the Word2Vec tool are: word vector length 200, 25 iterations, initial step size 0.025, minimum step size 0.0001, and the CBOW model.
Preferably: the parameters of the neural network are: 2 hidden layers, 150 hidden nodes per layer, step size 0.01, batch size 40, and the sigmoid function as the activation function.
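A minimal numpy sketch of the forward pass of such a network (the layer sizes follow the preferred parameters above; the 7 output classes and the random weight initialization are illustrative assumptions, not specified by the patent):

```python
import numpy as np

rng = np.random.default_rng(0)

w, d, n_classes = 5, 200, 7            # window 5, vector length 200, e.g. O + 6 entity tags
W1 = rng.normal(0, 0.1, (w * d, 150))  # hidden layer 1: 150 nodes
W2 = rng.normal(0, 0.1, (150, 150))    # hidden layer 2: 150 nodes
W3 = rng.normal(0, 0.1, (150, n_classes))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def forward(batch):
    """batch: (batch_size, w, d) windowed word-feature matrices."""
    x = batch.reshape(batch.shape[0], -1)      # flatten each w x d window
    h1 = sigmoid(x @ W1)
    h2 = sigmoid(h1 @ W2)
    return softmax(h2 @ W3)                    # per-word class probabilities

probs = forward(rng.normal(size=(40, w, d)))   # batch size 40
print(probs.shape)                             # (40, 7), each row sums to 1
```

The softmax output layer is what makes each row a probability distribution over the non-named-entity class and the named entity categories, which the later disambiguation step consumes as a probability matrix.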
Preferably, the method for converting the tags in the sample corpus into one-hot form is: the "/o", "/n", "/p" tags in the sample corpus are correspondingly converted into the named entity tags "/Org-B", "/Org-I", "/Per-B", "/Per-I", "/Loc-B", "/Loc-I", and then converted into one-hot form.
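The tag conversion can be sketched as follows (the "O" class for non-named-entity words is an assumed seventh class; the patent names only the six entity tags):

```python
import numpy as np

# IOB tag inventory: "O" for non-entities (assumed) plus the six entity tags.
TAGS = ["O", "Org-B", "Org-I", "Per-B", "Per-I", "Loc-B", "Loc-I"]
INDEX = {t: i for i, t in enumerate(TAGS)}

def to_iob(entity_words, prefix):
    """First word of a multi-word entity gets -B, the rest get -I."""
    return [prefix + ("-B" if i == 0 else "-I") for i in range(len(entity_words))]

def one_hot(tag):
    v = np.zeros(len(TAGS))
    v[INDEX[tag]] = 1.0
    return v

labels = to_iob(["中国", "矿业", "大学"], "Org")   # ['Org-B', 'Org-I', 'Org-I']
Y = np.stack([one_hot(t) for t in labels])        # (3, 7) one-hot output matrix
print(labels, Y.shape)
```

The B/I split is what preserves the completeness of multi-word entities, as the IOB annotation in step 3 requires.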
Preferably: the window size for word vector windowing is 5.
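Windowing the word vectors with w = 5 can be sketched like this (zero-padding at sentence boundaries is an assumption — the patent does not specify boundary handling):

```python
import numpy as np

def window(vectors, w=5):
    """vectors: (n, d) word vectors of one sentence.
    Returns (n, w, d): for each word, its own vector plus the
    (w - 1) / 2 vectors on each side, zero-padded at the edges."""
    n, d = vectors.shape
    half = w // 2
    padded = np.vstack([np.zeros((half, d)), vectors, np.zeros((half, d))])
    return np.stack([padded[i:i + w] for i in range(n)])

sent = np.random.default_rng(1).normal(size=(8, 200))  # 8 words, d = 200
X = window(sent)            # each sample is a 5 x 200 two-dimensional matrix
print(X.shape)              # (8, 5, 200)
```

Each word thus yields the w × d two-dimensional matrix that steps 3 and 5 feed into the neural network.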
Preferably: when training the neural network, one tenth of the words are extracted from the sample data and withheld from training, to serve as the evaluation standard of the neural network.
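The one-tenth hold-out can be sketched as a simple random split (the random shuffling and fixed seed are assumptions; the patent only states the proportion):

```python
import random

def split_holdout(samples, fraction=0.1, seed=42):
    """Withhold `fraction` of the samples from training for evaluation."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = int(len(samples) * fraction)
    held = [samples[i] for i in idx[:cut]]
    train = [samples[i] for i in idx[cut:]]
    return train, held

train, held = split_holdout(list(range(1000)))
print(len(train), len(held))   # 900 100
```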
Compared with the prior art, the invention has the following beneficial effects:
Word features can be extracted incrementally without retraining the neural network, and the neural network predictions are disambiguated with a probability model, so the method has better practicability and accuracy in named entity recognition on web text. For the named entity recognition task on web text, the invention provides a word vector incremental learning method that does not require changing the neural network structure, designed for the characteristics of existing network words and new words, and adopts a probability disambiguation method to address the non-standard grammatical structures and frequent wrongly written characters in web text. The method of the invention therefore achieves higher accuracy in the task of recognizing named entities in web text.
Drawings
FIG. 1 is a flow diagram of training for network text named entity recognition based on neural network probability disambiguation according to the present invention.
Fig. 2 is a flow chart for converting words into word features according to the present invention.
FIG. 3 is a schematic diagram of text processing and neural network architecture in accordance with the present invention.
Detailed Description
The present invention is further illustrated by the following description in conjunction with the accompanying drawings and the specific embodiments, it is to be understood that these examples are given solely for the purpose of illustration and are not intended as a definition of the limits of the invention, since various equivalent modifications will occur to those skilled in the art upon reading the present invention and fall within the limits of the appended claims.
A network text named entity recognition method based on neural network probability disambiguation: unlabeled corpora are segmented into words, word vectors are extracted with Word2Vec, the sample corpus is converted into word feature matrixes and windowed, a deep neural network is constructed and trained, and a softmax function is added to the output layer of the neural network for normalization, yielding a probability matrix of the named entity category corresponding to each word. The probability matrix is then re-windowed and disambiguated with a conditional random field model to obtain the final named entity labels.
The method specifically comprises the following steps:
Step 1, obtaining an unlabeled corpus through a web crawler, obtaining a sample corpus annotated with named entities from a corpus repository, and segmenting the unlabeled corpus with a natural language tool.
Step 2, training a word vector space on the segmented unlabeled corpus and sample corpus with the Word2Vec tool.
Step 3, converting the text in the sample corpus into word vectors representing word features according to the trained Word2Vec model, as the input of the neural network. The labels in the sample corpus are converted into one-hot form as the output of the neural network. In a text processing task, one named entity may be split across several words, so to ensure the completeness of the recognized named entities, the labels are annotated in IOB form.
The word vectors are windowed so that, when a word is classified, the feature information of the word and of its fixed-length context is used as the input of the neural network; the input of the neural network is therefore not a vector of word feature length d but a two-dimensional matrix of window size w multiplied by word feature length d.
The output layer of the neural network is normalized with a softmax function so that the classification result is the probability that the word belongs to a non-named entity or to each named entity category. The structure, depth, number of nodes, step size, activation function and initial value parameters of the neural network are adjusted, and an activation function is selected, to train the neural network.
Step 4, re-windowing the prediction matrix output by the neural network, taking the context prediction information of the word to be labeled as the correlation points of its actual classification in the conditional random field model, calculating the expected value of each edge with the EM (expectation-maximization) algorithm on the training corpus, and training the corresponding conditional random field model.
Step 5, during recognition, first converting the text to be recognized into word vectors representing word features according to the trained Word2Vec model; if the Word2Vec model does not contain a corresponding trained word, the word is converted into a word vector by incremental learning, obtaining the word vector and then backtracking the word vector space, as follows:
(1) Match the word to be converted in the trained word vector space.
(2) If the word to be converted is matched in the word vector space, it is directly converted into the corresponding word vector.
(3) If the Word2Vec model does not contain the corresponding word, the word vector space is first backed up, to prevent the word-space offset produced by incremental learning from reducing the precision of the neural network model; the Word2Vec model is loaded, the sentence containing the unmatched word is put into the Word2Vec model for incremental training, and the word vector of that word is obtained; the model is then backtracked using the backed-up word vector space.
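The back-up-and-backtrack idea in step (3) can be sketched with a plain vector table (the dict representation and the `incremental_train` stub are illustrative assumptions standing in for the Word2Vec model; keeping only the new word's vector after the restore is one reading of the backtracking step):

```python
import numpy as np

rng = np.random.default_rng(2)

# Trained word vector space: word -> 200-dim vector.
space = {"徐州": rng.normal(size=200), "大学": rng.normal(size=200)}

def incremental_train(space, sentence):
    """Stand-in for putting the sentence into Word2Vec for incremental
    training: existing vectors may drift, new words gain vectors."""
    for word in sentence:
        drift = rng.normal(scale=0.01, size=200)
        space[word] = space.get(word, rng.normal(size=200)) + drift

def vector_with_backtrack(space, word, sentence):
    if word in space:
        return space[word]
    backup = {w: v.copy() for w, v in space.items()}  # back up the old space
    incremental_train(space, sentence)                # learn the new word
    vec = space[word].copy()
    space.clear()
    space.update(backup)                              # backtrack: restore old vectors
    space[word] = vec                                 # keep only the new word's vector
    return vec

old = space["徐州"].copy()
v = vector_with_backtrack(space, "矿大", ["矿大", "位于", "徐州"])
print(np.allclose(space["徐州"], old))   # True: existing vectors did not shift
```

Restoring the backed-up vectors is what prevents the word-space offset from degrading a neural network trained against the original space.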
The word vectors are windowed, and the two-dimensional matrix of window size w multiplied by word vector length d is taken as the input of the neural network. The prediction matrix obtained from the neural network is then windowed again and put into the trained conditional random field model for disambiguation, yielding the final named entity labels in the text to be recognized.
Examples of the invention
Web text is crawled from the Sogou News website as the unlabeled corpus, a corpus annotated with named entities is downloaded from a corpus repository as the sample corpus, and the crawled web text is segmented with a natural language tool. A word vector space is trained on the segmented corpus and the sample corpus through the Word2Vec model of the gensim package in Python, with the following parameters: word vector length 200, 25 iterations, initial step size 0.025, minimum step size 0.0001, and the CBOW model.
The text of the sample corpus is converted into word vectors representing word features according to the trained Word2Vec model; if the Word2Vec model does not contain a corresponding trained word, the word is converted into a word vector by the method of incremental learning, word vector acquisition and word vector backtracking. The word vector serves as the feature of each word.
The window size is set to 5; that is, when the named entity category of the current word is considered, the word features of the current word and of the two words on each side are taken as the input of the neural network, so each input of the neural network is a 1000-dimensional vector (5 × 200). One tenth of the words are extracted from the sample data and withheld from training as the evaluation standard of the neural network. The output layer of the neural network is normalized with a softmax function so that the classification result is the probability that the word belongs to a non-named entity or to each named entity category; for now, the maximum probability is taken as the final classification result. The structure, depth, number of nodes, step size, activation function, initial values and other parameters of the neural network are adjusted so that the neural network obtains good accuracy. The final parameters are: 2 hidden layers, 150 hidden nodes per layer, step size 0.01, batch size 40; the sigmoid activation function produces a good classification effect. The accuracy reaches 99.83%, and the F-values of the most representative person, place and organization names reach 93.4%, 84.2% and 80.4% respectively.
The step of taking the maximum probability of the prediction matrix output by the neural network as the final classification result is removed; instead, the probability matrix is directly re-windowed, the context prediction information of the word to be labeled is taken as the correlation points of its actual classification in the conditional random field model, the expected value of each edge of the conditional random field is calculated with the EM (expectation-maximization) algorithm on the training corpus, and the corresponding conditional random field model is trained. After disambiguation with the conditional random field, the F-values for person, place and organization names rise to 94.8%, 85.0% and 82.0% respectively.
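Why disambiguation over the probability matrix beats per-word argmax can be illustrated with a Viterbi decode over the per-word class probabilities and a transition matrix (the tag set and the hand-set transition scores below are illustrative assumptions; in the patent the dependencies come from the trained conditional random field):

```python
import numpy as np

def viterbi(emission, transition):
    """emission: (n, k) per-word class probabilities from the neural network.
    transition: (k, k) score of moving from class i to class j.
    Returns the most probable label sequence (log-space dynamic program)."""
    n, k = emission.shape
    log_e = np.log(emission + 1e-12)
    log_t = np.log(transition + 1e-12)
    score = log_e[0].copy()
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        total = score[:, None] + log_t + log_e[t][None, :]
        back[t] = total.argmax(axis=0)   # best previous class for each current class
        score = total.max(axis=0)
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

TAGS = ["O", "Per-B", "Per-I"]
# "Per-I" may not directly follow "O": near-zero transition probability.
T = np.array([[0.6, 0.4, 0.0],
              [0.3, 0.1, 0.6],
              [0.4, 0.1, 0.5]])
# Noisy network output: the second word's argmax is the illegal "Per-I".
E = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.35, 0.45],
              [0.1, 0.1, 0.8]])
print([TAGS[i] for i in viterbi(E, T)])   # → ['O', 'Per-B', 'Per-I']
```

Per-word argmax would emit the illegal sequence O, Per-I, Per-I; decoding against the transition structure corrects the second word to Per-B, which is the kind of context disambiguation that raises the F-values above.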
As can be seen from the above embodiment, compared with conventional supervised named entity recognition methods, the text named entity recognition method based on neural network probability disambiguation provided by the invention uses a word vector conversion method that can extract word features incrementally without producing word vector space offset, so the neural network can be applied to web texts with many new words and wrongly written characters. Moreover, the probability matrix output by the neural network is windowed again and context disambiguation is performed with a conditional random field model, which better handles the frequent wrongly written characters and irregular grammar in web text.
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.
Claims (6)
1. A network text named entity identification method based on neural network probability disambiguation, characterized in that: the unlabeled corpus is segmented into words, word vectors are extracted with Word2Vec, the sample corpus is converted into word feature matrixes and windowed, a deep neural network is constructed and trained, a softmax function is added to the output layer of the neural network for normalization, and a probability matrix of the named entity category corresponding to each word is obtained; the probability matrix is re-windowed and disambiguated with a conditional random field model to obtain the final named entity labels, comprising the following steps:
step 1, obtaining an unlabeled corpus through a web crawler, obtaining a sample corpus annotated with named entities from a corpus repository, and segmenting the unlabeled corpus with a natural language tool;
step 2, training a word vector space on the segmented unlabeled corpus and sample corpus with the Word2Vec tool;
step 3, converting the text in the sample corpus into word vectors representing word features according to the trained Word2Vec model, windowing the word vectors, and taking the two-dimensional matrix of window size w multiplied by word vector length d as the input of the neural network; converting the labels in the sample corpus into one-hot form as the output of the neural network; normalizing the output layer of the neural network with a softmax function so that the classification result is the probability that the word belongs to a non-named entity or to each named entity category, adjusting the structure, depth, number of nodes, step size, activation function and initial value parameters of the neural network, and selecting an activation function, to train the neural network;
step 4, re-windowing the prediction matrix output by the neural network, taking the context prediction information of the word to be labeled as the correlation points of its actual classification in the conditional random field model, calculating the expected value of each edge with the EM algorithm on the training corpus, and training the corresponding conditional random field model;
step 5, during recognition, first converting the text to be recognized into word vectors representing word features according to the trained Word2Vec model; if the Word2Vec model does not contain a corresponding word, converting the word into a word vector by incremental learning, obtaining the word vector and backtracking the word vector space; windowing the word vectors and taking the two-dimensional matrix of window size w multiplied by word vector length d as the input of the neural network; then windowing the prediction matrix obtained from the neural network again and putting it into the trained conditional random field model for disambiguation, yielding the final named entity labels in the text to be recognized.
2. The network text named entity recognition method based on neural network probability disambiguation as claimed in claim 1, characterized in that the parameters of the Word2Vec tool are: word vector length 200, 25 iterations, initial step size 0.025, minimum step size 0.0001, and the CBOW model.
3. The network text named entity recognition method based on neural network probability disambiguation as claimed in claim 1, characterized in that the parameters of the neural network are: 2 hidden layers, 150 hidden nodes per layer, step size 0.01, batch size 40, and the sigmoid function as the activation function.
4. The network text named entity recognition method based on neural network probability disambiguation as claimed in claim 1, wherein the method for converting the tags in the sample corpus into one-hot form comprises converting the "/o", "/n", "/p" tags in the sample corpus into the named entity tags "/Org-B", "/Org-I", "/Per-B", "/Per-I", "/Loc-B", "/Loc-I", and then converting them into one-hot form.
5. The network text named entity recognition method based on neural network probability disambiguation as claimed in claim 1, characterized in that: the window size for word vector windowing is 5.
6. The network text named entity recognition method based on neural network probability disambiguation as claimed in claim 1, characterized in that: when training the neural network, one tenth of the words are extracted from the sample data and withheld from training, to serve as the evaluation standard of the neural network.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710390409.2A CN107203511B (en) | 2017-05-27 | 2017-05-27 | Network text named entity identification method based on neural network probability disambiguation |
RU2019117529A RU2722571C1 (en) | 2017-05-27 | 2017-06-20 | Method of recognizing named entities in network text based on elimination of probability ambiguity in neural network |
AU2017416649A AU2017416649A1 (en) | 2017-05-27 | 2017-06-20 | Method for recognizing network text named entity based on neural network probability disambiguation |
PCT/CN2017/089135 WO2018218705A1 (en) | 2017-05-27 | 2017-06-20 | Method for recognizing network text named entity based on neural network probability disambiguation |
CA3039280A CA3039280C (en) | 2017-05-27 | 2017-06-20 | Method for recognizing network text named entity based on neural network probability disambiguation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710390409.2A CN107203511B (en) | 2017-05-27 | 2017-05-27 | Network text named entity identification method based on neural network probability disambiguation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107203511A CN107203511A (en) | 2017-09-26 |
CN107203511B true CN107203511B (en) | 2020-07-17 |
Family
ID=59905476
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710390409.2A Active CN107203511B (en) | 2017-05-27 | 2017-05-27 | Network text named entity identification method based on neural network probability disambiguation |
Country Status (5)
Country | Link |
---|---|
CN (1) | CN107203511B (en) |
AU (1) | AU2017416649A1 (en) |
CA (1) | CA3039280C (en) |
RU (1) | RU2722571C1 (en) |
WO (1) | WO2018218705A1 (en) |
Families Citing this family (63)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107203511B (en) * | 2017-05-27 | 2020-07-17 | 中国矿业大学 | Network text named entity identification method based on neural network probability disambiguation |
CN107665252B (en) * | 2017-09-27 | 2020-08-25 | 深圳证券信息有限公司 | Method and device for creating knowledge graph |
CN107832289A (en) * | 2017-10-12 | 2018-03-23 | 北京知道未来信息技术有限公司 | Named entity recognition method based on LSTM-CNN |
CN107908614A (en) * | 2017-10-12 | 2018-04-13 | 北京知道未来信息技术有限公司 | Named entity recognition method based on Bi-LSTM |
CN107885721A (en) * | 2017-10-12 | 2018-04-06 | 北京知道未来信息技术有限公司 | Named entity recognition method based on LSTM |
CN107967251A (en) * | 2017-10-12 | 2018-04-27 | 北京知道未来信息技术有限公司 | Named entity recognition method based on Bi-LSTM-CNN |
CN107797989A (en) * | 2017-10-16 | 2018-03-13 | 平安科技(深圳)有限公司 | Enterprise name recognition methods, electronic equipment and computer-readable recording medium |
CN107943788B (en) * | 2017-11-17 | 2021-04-06 | 平安科技(深圳)有限公司 | Enterprise abbreviation generation method and device and storage medium |
CN110019648B (en) * | 2017-12-05 | 2021-02-02 | 深圳市腾讯计算机系统有限公司 | Method and device for training data and storage medium |
CN108121702B (en) * | 2017-12-26 | 2020-11-24 | 浙江讯飞智能科技有限公司 | Method and system for grading mathematical subjective questions |
CN108052504B (en) * | 2017-12-26 | 2020-11-20 | 浙江讯飞智能科技有限公司 | Structural analysis method and system for answers to mathematical subjective questions |
CN108280062A (en) * | 2018-01-19 | 2018-07-13 | 北京邮电大学 | Entity based on deep learning and entity-relationship recognition method and device |
CN108563626B (en) * | 2018-01-22 | 2022-01-25 | 北京颐圣智能科技有限公司 | Medical text named entity recognition method and device |
CN108388559B (en) * | 2018-02-26 | 2021-11-19 | 中译语通科技股份有限公司 | Named entity identification method and system under geographic space application and computer program |
CN108763192B (en) * | 2018-04-18 | 2022-04-19 | 达而观信息科技(上海)有限公司 | Entity relation extraction method and device for text processing |
CN108805196B (en) * | 2018-06-05 | 2022-02-18 | 西安交通大学 | Automatic incremental learning method for image recognition |
RU2699687C1 (en) * | 2018-06-18 | 2019-09-09 | Общество с ограниченной ответственностью "Аби Продакшн" | Detecting text fields using neural networks |
CN109062983A (en) * | 2018-07-02 | 2018-12-21 | 北京妙医佳信息技术有限公司 | Name entity recognition method and system for medical health knowledge mapping |
CN109241520B (en) * | 2018-07-18 | 2023-05-23 | 五邑大学 | Sentence trunk analysis method and system based on a multi-layer error-feedback neural network for word segmentation and named entity recognition |
CN109255119B (en) * | 2018-07-18 | 2023-04-25 | 五邑大学 | Sentence trunk analysis method and system using a multi-task deep neural network based on word segmentation and named entity recognition |
CN109299458B (en) * | 2018-09-12 | 2023-03-28 | 广州多益网络股份有限公司 | Entity identification method, device, equipment and storage medium |
CN109446514A (en) * | 2018-09-18 | 2019-03-08 | 平安科技(深圳)有限公司 | Construction method, device and the computer equipment of news property identification model |
CN109657238B (en) * | 2018-12-10 | 2023-10-13 | 宁波深擎信息科技有限公司 | Knowledge graph-based context identification completion method, system, terminal and medium |
CN109710927B (en) * | 2018-12-12 | 2022-12-20 | 东软集团股份有限公司 | Named entity identification method and device, readable storage medium and electronic equipment |
CN109670177A (en) * | 2018-12-20 | 2019-04-23 | 翼健(上海)信息科技有限公司 | LSTM-based control method and control device for medical semantic normalization |
CN109858025B (en) * | 2019-01-07 | 2023-06-13 | 鼎富智能科技有限公司 | Word segmentation method and system for address standardized corpus |
CN109767817B (en) * | 2019-01-16 | 2023-05-30 | 南通大学 | Drug potential adverse reaction discovery method based on neural network language model |
CN111563380A (en) * | 2019-01-25 | 2020-08-21 | 浙江大学 | Named entity identification method and device |
CN109800437B (en) * | 2019-01-31 | 2023-11-14 | 北京工业大学 | Named entity recognition method based on feature fusion |
CN109992629B (en) * | 2019-02-28 | 2021-08-06 | 中国科学院计算技术研究所 | Neural network relation extraction method and system fusing entity type constraints |
CN109858041B (en) * | 2019-03-07 | 2023-02-17 | 北京百分点科技集团股份有限公司 | Named entity recognition method combining semi-supervised learning with user-defined dictionary |
CN109933801B (en) * | 2019-03-25 | 2022-03-29 | 北京理工大学 | Bidirectional LSTM named entity identification method based on predicted position attention |
CN111858838A (en) * | 2019-04-04 | 2020-10-30 | 拉扎斯网络科技(上海)有限公司 | Menu calibration method and device, electronic equipment and nonvolatile storage medium |
CN110083778A (en) * | 2019-04-08 | 2019-08-02 | 清华大学 | The figure convolutional neural networks construction method and device of study separation characterization |
CN110245242B (en) * | 2019-06-20 | 2022-01-18 | 北京百度网讯科技有限公司 | Medical knowledge graph construction method and device and terminal |
CN110298043B (en) * | 2019-07-03 | 2023-04-07 | 吉林大学 | Vehicle named entity identification method and system |
CN110750992B (en) * | 2019-10-09 | 2023-07-04 | 吉林大学 | Named entity recognition method and apparatus, electronic device and medium |
CN110781646B (en) * | 2019-10-15 | 2023-08-22 | 泰康保险集团股份有限公司 | Name standardization method, device, medium and electronic equipment |
CN111008271B (en) * | 2019-11-20 | 2022-06-24 | 佰聆数据股份有限公司 | Neural network-based key information extraction method and system |
CN110993081B (en) * | 2019-12-03 | 2023-08-11 | 济南大学 | Doctor online recommendation method and system |
CN111091003B (en) * | 2019-12-05 | 2023-10-10 | 电子科技大学广东电子信息工程研究院 | Parallel extraction method based on knowledge graph query |
CN111209748B (en) * | 2019-12-16 | 2023-10-24 | 合肥讯飞数码科技有限公司 | Error word recognition method, related device and readable storage medium |
CN113139382A (en) * | 2020-01-20 | 2021-07-20 | 北京国双科技有限公司 | Named entity identification method and device |
CN111368545B (en) * | 2020-02-28 | 2024-04-30 | 北京明略软件系统有限公司 | Named entity recognition method and device based on multitask learning |
CN111477320B (en) * | 2020-03-11 | 2023-05-30 | 北京大学第三医院(北京大学第三临床医学院) | Treatment effect prediction model construction system, treatment effect prediction system and terminal |
CN111523323B (en) * | 2020-04-26 | 2022-08-12 | 梁华智能科技(上海)有限公司 | Disambiguation processing method and system for Chinese word segmentation |
CN111581957B (en) * | 2020-05-06 | 2022-04-12 | 浙江大学 | Nested entity detection method based on pyramid hierarchical network |
CN111476022B (en) * | 2020-05-15 | 2023-07-07 | 湖南工商大学 | Character embedding and mixed LSTM entity identification method, system and medium for entity characteristics |
CN111859937A (en) * | 2020-07-20 | 2020-10-30 | 上海汽车集团股份有限公司 | Entity identification method and device |
RU2760637C1 (en) * | 2020-08-31 | 2021-11-29 | Sberbank of Russia PJSC (PAO Sberbank) | Method and system for retrieving named entities |
CN112101041B (en) * | 2020-09-08 | 2022-02-15 | 平安科技(深圳)有限公司 | Entity relationship extraction method, device, equipment and medium based on semantic similarity |
CN112765983A (en) * | 2020-12-14 | 2021-05-07 | 四川长虹电器股份有限公司 | Entity disambiguation method based on neural network combined with knowledge description |
CN112487816B (en) * | 2020-12-14 | 2024-02-13 | 安徽大学 | Named entity identification method based on network classification |
CN112905742B (en) * | 2021-02-20 | 2022-07-29 | 厦门吉比特网络技术股份有限公司 | Method and device for recognizing new vocabulary based on semantic model neural network |
CN113343690B (en) * | 2021-06-22 | 2024-03-12 | 北京语言大学 | Text readability automatic evaluation method and device |
CN114218924A (en) * | 2021-07-27 | 2022-03-22 | 广东电力信息科技有限公司 | Text intention and entity combined identification method based on BERT model |
CN113849597B (en) * | 2021-08-31 | 2024-04-30 | 艾迪恩(山东)科技有限公司 | Illegal advertisement word detection method based on named entity recognition |
CN114036948B (en) * | 2021-10-26 | 2024-05-31 | 天津大学 | Named entity identification method based on uncertainty quantification |
CN114048749B (en) * | 2021-11-19 | 2024-02-02 | 北京第一因科技有限公司 | Chinese named entity recognition method suitable for multiple fields |
CN114510943B (en) * | 2022-02-18 | 2024-05-28 | 北京大学 | Incremental named entity recognition method based on pseudo sample replay |
WO2023204724A1 (en) * | 2022-04-20 | 2023-10-26 | Dentons Europe LLC (OOO "Dentons Europe") | Method for analyzing a legal document |
CN115587594B (en) * | 2022-09-20 | 2023-06-30 | 广东财经大学 | Unstructured text data extraction model training method and system for network security |
CN115905456B (en) * | 2023-01-06 | 2023-06-02 | 浪潮电子信息产业股份有限公司 | Data identification method, system, equipment and computer readable storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103455581A (en) * | 2013-08-26 | 2013-12-18 | 北京理工大学 | Mass short message information filtering method based on semantic extension |
CN105740349A (en) * | 2016-01-25 | 2016-07-06 | 重庆邮电大学 | Sentiment classification method combining Doc2vec with a convolutional neural network |
CN105868184A (en) * | 2016-05-10 | 2016-08-17 | 大连理工大学 | Chinese name recognition method based on recurrent neural network |
CN106202032A (en) * | 2016-06-24 | 2016-12-07 | 广州数说故事信息科技有限公司 | Sentiment analysis method and system for microblog short texts |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7502971B2 (en) * | 2005-10-12 | 2009-03-10 | Hewlett-Packard Development Company, L.P. | Determining a recurrent problem of a computer resource using signatures |
US8583416B2 (en) * | 2007-12-27 | 2013-11-12 | Fluential, Llc | Robust information extraction from utterances |
RU2399959C2 (en) * | 2008-10-29 | 2010-09-20 | Avicomp Services CJSC | Method for automatic natural-language text processing via semantic indexing, method for automatic processing of natural-language text collections via semantic indexing, and computer-readable media |
US8239349B2 (en) * | 2010-10-07 | 2012-08-07 | Hewlett-Packard Development Company, L.P. | Extracting data |
CN105404632B (en) * | 2014-09-15 | 2020-07-31 | 深港产学研基地 | System and method for carrying out serialized annotation on biomedical text based on deep neural network |
CN104809176B (en) * | 2015-04-13 | 2018-08-07 | 中央民族大学 | Tibetan language entity relation extraction method |
CN106202044A (en) * | 2016-07-07 | 2016-12-07 | 武汉理工大学 | Entity relation extraction method based on a deep neural network |
CN107203511B (en) * | 2017-05-27 | 2020-07-17 | 中国矿业大学 | Network text named entity identification method based on neural network probability disambiguation |
2017
- 2017-05-27 CN CN201710390409.2A patent/CN107203511B/en active Active
- 2017-06-20 CA CA3039280A patent/CA3039280C/en active Active
- 2017-06-20 AU AU2017416649A patent/AU2017416649A1/en not_active Abandoned
- 2017-06-20 RU RU2019117529A patent/RU2722571C1/en active
- 2017-06-20 WO PCT/CN2017/089135 patent/WO2018218705A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2018218705A1 (en) | 2018-12-06 |
CA3039280C (en) | 2021-07-20 |
CN107203511A (en) | 2017-09-26 |
CA3039280A1 (en) | 2018-12-06 |
AU2017416649A1 (en) | 2019-05-02 |
RU2722571C1 (en) | 2020-06-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107203511B (en) | Network text named entity identification method based on neural network probability disambiguation | |
CN109597997B (en) | Comment entity and aspect-level emotion classification method and device and model training thereof | |
CN108920460B (en) | Training method of multi-task deep learning model for multi-type entity recognition | |
CN110059188B (en) | Chinese emotion analysis method based on bidirectional time convolution network | |
CN106980683B (en) | Blog text abstract generating method based on deep learning | |
CN107085581B (en) | Short text classification method and device | |
CN113239186B (en) | Graph convolution network relation extraction method based on multi-dependency relation representation mechanism | |
CN109753660B (en) | LSTM-based winning bid web page named entity extraction method | |
CN110765775A (en) | Self-adaptive method for named entity recognition field fusing semantics and label differences | |
CN113255320A (en) | Entity relation extraction method and device based on syntax tree and graph attention machine mechanism | |
CN110046356B (en) | Label-embedded microblog text emotion multi-label classification method | |
CN106682089A (en) | RNNs-based method for automatic safety checking of short message | |
CN112069312B (en) | Text classification method based on entity recognition and electronic device | |
CN110968725B (en) | Image content description information generation method, electronic device and storage medium | |
CN112561718A (en) | Case microblog evaluation object emotion tendency analysis method based on BiLSTM weight sharing | |
CN112434514B (en) | Multi-granularity multi-channel neural network based semantic matching method and device and computer equipment | |
CN111159405B (en) | Irony detection method based on background knowledge | |
Wang et al. | Mongolian named entity recognition with bidirectional recurrent neural networks | |
CN115309915A (en) | Knowledge graph construction method, device, equipment and storage medium | |
CN113204975A (en) | Sensitive character risk identification method based on distant supervision | |
Shelke et al. | A novel approach for named entity recognition on Hindi language using residual bilstm network | |
CN115186670B (en) | Method and system for identifying domain named entities based on active learning | |
CN116644148A (en) | Keyword recognition method and device, electronic equipment and storage medium | |
CN115796141A (en) | Text data enhancement method and device, electronic equipment and storage medium | |
CN115577111A (en) | Text classification method based on self-attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||