CN108170678A - Text entity extraction method and system - Google Patents

Text entity extraction method and system

Info

Publication number
CN108170678A
CN108170678A (application CN201711450896.3A)
Authority
CN
China
Prior art keywords
word
preset
entity
entities
original text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711450896.3A
Other languages
Chinese (zh)
Inventor
晋彤
张中弦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Yun Run Great Data Services Co Ltd
Original Assignee
Guangzhou Yun Run Great Data Services Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Yun Run Great Data Services Co Ltd filed Critical Guangzhou Yun Run Great Data Services Co Ltd
Priority to CN201711450896.3A priority Critical patent/CN108170678A/en
Publication of CN108170678A publication Critical patent/CN108170678A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The invention discloses a text entity extraction method and system. The text entity extraction method includes: acquiring original text; according to a preset entity dictionary, searching the original text for entity words not recorded in the entity dictionary to form a test corpus; training a preset two-layer neural network extraction model according to the test corpus; and, according to the preset two-layer neural network extraction model and the test corpus, predicting new entity words and updating the new entity words into the preset entity dictionary. The method can identify words not covered by the dictionary as well as new network words, improving the accuracy and efficiency of text entity extraction.

Description

Text entity extraction method and system
Technical field
The present invention relates to the field of natural language processing, and in particular to a text entity extraction method and system.
Background technology
With the continuous development of science and technology, and of information technology in particular, interpersonal communication has evolved from simple face-to-face exchange to increasingly using text as the carrier of information. The most obvious examples are digital libraries and web page text. Effective management of these language resources undoubtedly provides great convenience for users acquiring information. However, with the development of network communication, the amount of text available online is expanding rapidly, arguably at an exponential rate. Classifying these texts by hand, as before, is not only time-consuming and laborious, but its accuracy can no longer be guaranteed. Text entity extraction methods based on natural language processing have therefore come into being.
At present, text entity extraction methods make it possible to classify massive amounts of text, and they are also an important foundation for application fields such as information extraction, question answering systems, syntactic analysis, machine translation, and metadata annotation for the Semantic Web. Existing text entity extraction methods rely mainly on a dictionary: words in the text are matched against the dictionary to obtain the entities it records, typically person names, place names, organization names, proper nouns and the like. Because these methods depend so heavily on the dictionary, words that the dictionary does not cover, including new network words, cannot be recognized, which reduces the accuracy and efficiency of text entity extraction.
Summary of the invention
The object of the present invention is to provide a text entity extraction method and system that can identify words not covered by the dictionary as well as new network words, improving the accuracy and efficiency of text entity extraction.
To solve the above technical problem, an embodiment of the present invention provides a text entity extraction method, including:
Acquiring original text;
According to a preset entity dictionary, searching the original text for entity words not recorded in the entity dictionary to form a test corpus;
Training a preset two-layer neural network extraction model according to the test corpus;
According to the preset two-layer neural network extraction model and the test corpus, predicting new entity words and updating the new entity words into the preset entity dictionary.
Preferably, the text entity extraction method further includes:
Establishing an entity word classification model according to an SVM composite kernel combining a convolution tree kernel and an entity feature kernel;
Performing classification annotation on the new entity words according to the entity word classification model;
Verifying the new entity words according to a preset loss function.
Preferably, training the preset two-layer neural network extraction model according to the test corpus specifically includes:
Establishing the preset two-layer neural network extraction model according to the Skip-gram algorithm and the Bag-of-words algorithm;
Generating joint word vectors according to the test corpus, the attribute parameters of the Skip-gram algorithm and the attribute parameters of the Bag-of-words algorithm;
Training the preset two-layer neural network extraction model according to the joint word vectors.
Preferably, the text entity extraction method further includes:
Performing noise reduction on the original text;
Performing word segmentation on the noise-reduced original text according to a preset segmentation model.
Preferably, performing word segmentation on the noise-reduced original text according to the preset segmentation model specifically includes:
Establishing the preset segmentation model according to the MMseg segmentation algorithm and a CRF discrimination algorithm;
Performing discriminant analysis on ambiguous words in the noise-reduced original text according to the CRF discrimination algorithm of the preset segmentation model;
Segmenting the noise-reduced original text according to the MMseg segmentation algorithm of the preset segmentation model.
Preferably, searching the original text for entity words not recorded in the entity dictionary according to the preset entity dictionary to form the test corpus specifically includes:
Identifying, according to the preset entity dictionary, the primary entity words in the original text that are recorded in the entity dictionary;
Performing syntactic analysis, context analysis and probability statistics on the original text according to the primary entity words, obtaining the entity words not recorded in the entity dictionary, and forming the test corpus.
Preferably, the preset two-layer neural network extraction model is:
where X_n is the joint word vector, y_n is the predicted new entity word, N is the size of the test corpus, C is the parameter of the softmax function, and A is the pre-trained word vector matrix.
Preferably, the entity word classification model is:
where λ is a weight coefficient, 0 < λ < 1; E1 and E2 are two new entity words; SFT is the shortest-path enclosed tree; CTK is the convolution tree kernel; Equal is the entity feature kernel; E1·Ci is the i-th category feature of entity word E1 and E2·Ci is the i-th category feature of entity word E2; when E1 belongs to the i-th category, E1·Ci is 1, otherwise 0; when E1·Ci and E2·Ci are both 1, Equal is 1, otherwise 0; m is the number of categories.
An embodiment of the present invention further provides a text entity extraction system, including:
A text collection module, configured to acquire original text;
A test corpus generation module, configured to search the original text for entity words not recorded in the entity dictionary according to a preset entity dictionary to form a test corpus;
A model training module, configured to train a preset two-layer neural network extraction model according to the test corpus;
An entity word prediction module, configured to predict new entity words according to the preset two-layer neural network extraction model and the test corpus and to update the new entity words into the preset entity dictionary.
Preferably, the text entity extraction system further includes:
A classification model establishment module, configured to establish an entity word classification model according to an SVM composite kernel combining a convolution tree kernel and an entity feature kernel;
A classification annotation module, configured to perform classification annotation on the new entity words according to the entity word classification model;
An entity word verification module, configured to verify the new entity words according to a preset loss function.
Compared with the prior art, the text entity extraction method provided by the embodiment of the present invention has the following beneficial effects. The method includes acquiring original text; according to a preset entity dictionary, searching the original text for entity words not recorded in the entity dictionary to form a test corpus; training a preset two-layer neural network extraction model according to the test corpus; and, according to the preset two-layer neural network extraction model and the test corpus, predicting new entity words and updating them into the preset entity dictionary. The method can identify words not covered by the dictionary as well as new network words, improving the accuracy and efficiency of text entity extraction. The embodiment of the present invention also provides a text entity extraction system.
Description of the drawings
Fig. 1 is a flowchart of a text entity extraction method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of a text entity extraction system provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Referring to Fig. 1, which is a flowchart of a text entity extraction method provided by an embodiment of the present invention, the text entity extraction method includes:
S1: Acquiring original text;
S2: According to a preset entity dictionary, searching the original text for entity words not recorded in the entity dictionary to form a test corpus;
S3: Training a preset two-layer neural network extraction model according to the test corpus;
S4: According to the preset two-layer neural network extraction model and the test corpus, predicting new entity words and updating the new entity words into the preset entity dictionary.
For example, taking the analysis of a news website as an example, the text published on the platform is crawled and then preprocessed to form the test corpus. The test corpus is input into the preset two-layer neural network extraction model, which learns text features from the test corpus automatically: the first hidden layer extracts features of each word, the second hidden layer extracts features from the word window, treating them as a series of local and global structures, and the parameters of the preset two-layer neural network extraction model are trained by the backpropagation algorithm. With the trained preset two-layer neural network extraction model, words that the preset entity dictionary does not cover, as well as new network words, can be identified, improving the accuracy and efficiency of text entity extraction. At the same time, the entity dictionary can be expanded automatically, ensuring the accuracy and comprehensiveness of big data analysis based on this text entity extraction method.
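As a purely illustrative aid, not the exact model disclosed in this application, the following sketch shows how a two-layer network of this kind, with a word-level first hidden layer and a window-level second hidden layer trained by backpropagation, could be set up in PyTorch. All dimensions, the toy data and the two-tag label set are assumptions.

    # Illustrative sketch (not the patented model): a two-layer window-based
    # tagger that learns word-level features in the first hidden layer and
    # window-level features in the second, trained by backpropagation.
    import torch
    import torch.nn as nn

    VOCAB, EMB, H1, H2, WINDOW, N_TAGS = 5000, 100, 64, 64, 5, 2  # assumed sizes

    class TwoLayerExtractor(nn.Module):
        def __init__(self):
            super().__init__()
            self.emb = nn.Embedding(VOCAB, EMB)             # stands in for the pre-trained matrix A
            self.word_layer = nn.Linear(EMB, H1)            # first hidden layer: per-word features
            self.window_layer = nn.Linear(H1 * WINDOW, H2)  # second hidden layer: window features
            self.out = nn.Linear(H2, N_TAGS)                # scores: entity vs. non-entity

        def forward(self, windows):                         # windows: (batch, WINDOW) word ids
            w = torch.tanh(self.word_layer(self.emb(windows)))  # (batch, WINDOW, H1)
            h = torch.tanh(self.window_layer(w.flatten(1)))     # (batch, H2)
            return self.out(h)                              # logits; softmax applied in the loss

    model = TwoLayerExtractor()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()                         # cross-entropy over softmax outputs

    # Toy training step on random data, just to show the backpropagation loop.
    x = torch.randint(0, VOCAB, (32, WINDOW))
    y = torch.randint(0, N_TAGS, (32,))
    for _ in range(5):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()                                     # backpropagation
        optimizer.step()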
In an optional embodiment, the text entity extraction method further includes:
Establishing an entity word classification model according to an SVM composite kernel combining a convolution tree kernel and an entity feature kernel;
Performing classification annotation on the new entity words according to the entity word classification model;
Verifying the new entity words according to a preset loss function.
In this embodiment, the new entity words extracted by the preset two-layer neural network extraction model are verified by constructing the preset loss function, which avoids over-fitting.
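The application does not spell out the form of the preset loss function. One common choice that matches its stated purpose of verifying predicted new entity words while guarding against over-fitting is a cross-entropy term plus an L2 penalty; the short sketch below assumes exactly that and is not taken from the application.

    # Hedged sketch: a regularized verification loss (cross-entropy + L2 penalty).
    # The actual preset loss function is not specified in the patent text.
    import torch
    import torch.nn.functional as F

    def verification_loss(logits, labels, params, l2=1e-4):
        ce = F.cross_entropy(logits, labels)               # fit term
        reg = sum((p ** 2).sum() for p in params)          # L2 term against over-fitting
        return ce + l2 * reg

    # A predicted new entity word would be accepted only if its loss stays
    # below a chosen threshold on held-out context windows (an assumption).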
In an optional embodiment, S3, training the preset two-layer neural network extraction model according to the test corpus, specifically includes:
Establishing the preset two-layer neural network extraction model according to the Skip-gram algorithm and the Bag-of-words algorithm;
Generating joint word vectors according to the test corpus, the attribute parameters of the Skip-gram algorithm and the attribute parameters of the Bag-of-words algorithm;
Training the preset two-layer neural network extraction model according to the joint word vectors.
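One straightforward way to realize joint word vectors from the Skip-gram and Bag-of-words (CBOW) algorithms is to train both variants on the segmented test corpus and concatenate the resulting vectors. The sketch below does this with gensim; the concatenation step, the toy corpus and the vector sizes are assumptions rather than details from the application.

    # Sketch: joint word vectors by concatenating Skip-gram and CBOW embeddings
    # trained on the (already segmented) test corpus.
    import numpy as np
    from gensim.models import Word2Vec

    corpus = [["文本", "实体", "抽取"], ["网络", "新词", "识别"]]  # toy segmented sentences

    sg_model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=1)    # Skip-gram
    cbow_model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=0)  # Bag-of-words / CBOW

    def joint_vector(word):
        """Concatenate the two embeddings into one joint word vector X_n."""
        return np.concatenate([sg_model.wv[word], cbow_model.wv[word]])

    x = joint_vector("实体")   # 200-dimensional joint vector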
In an optional embodiment, the text entity extraction method further includes:
Performing noise reduction on the original text;
Performing word segmentation on the noise-reduced original text according to a preset segmentation model.
In an optional embodiment, performing word segmentation on the noise-reduced original text according to the preset segmentation model specifically includes:
Establishing the preset segmentation model according to the MMseg segmentation algorithm and a CRF discrimination algorithm;
Performing discriminant analysis on ambiguous words in the noise-reduced original text according to the CRF discrimination algorithm of the preset segmentation model;
Segmenting the noise-reduced original text according to the MMseg segmentation algorithm of the preset segmentation model.
In this embodiment, by establishing a preset segmentation model that combines the MMseg segmentation algorithm with a CRF discrimination algorithm, the ambiguity that arises during text segmentation can be resolved, the training time of the preset segmentation model is reduced, and segmentation speed is improved.
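The sketch below is a simplified stand-in for such a segmentation model: dictionary-based forward maximum matching (the core idea behind MMseg) combined with a character-level CRF labeller, via sklearn-crfsuite, that can be consulted for ambiguous spans. The toy dictionary, features and training data are assumptions, and the real MMseg algorithm additionally applies chunk-ambiguity resolution rules not shown here.

    # Simplified stand-in for the preset segmentation model: dictionary-based
    # maximum matching plus a CRF (B/I/E/S labelling) for ambiguous spans.
    import sklearn_crfsuite

    DICT = {"文本", "实体", "抽取", "文本实体"}        # toy word dictionary
    MAX_LEN = 4

    def max_match(sentence):
        """Greedy forward maximum matching over the dictionary."""
        words, i = [], 0
        while i < len(sentence):
            for l in range(min(MAX_LEN, len(sentence) - i), 0, -1):
                if l == 1 or sentence[i:i + l] in DICT:
                    words.append(sentence[i:i + l])
                    i += l
                    break
        return words

    def char_features(sentence, i):
        return {"char": sentence[i],
                "prev": sentence[i - 1] if i > 0 else "<s>",
                "next": sentence[i + 1] if i < len(sentence) - 1 else "</s>"}

    # Tiny training set of (characters, BIES labels) for the CRF disambiguator.
    train_sents = ["文本实体抽取"]
    train_labels = [["B", "E", "B", "E", "B", "E"]]
    X = [[char_features(s, i) for i in range(len(s))] for s in train_sents]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
    crf.fit(X, train_labels)

    sentence = "文本实体抽取"
    print(max_match(sentence))                       # dictionary segmentation
    print(crf.predict([[char_features(sentence, i) for i in range(len(sentence))]]))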
In an optional embodiment, searching the original text for entity words not recorded in the entity dictionary according to the preset entity dictionary to form the test corpus specifically includes:
Identifying, according to the preset entity dictionary, the primary entity words in the original text that are recorded in the entity dictionary;
Performing syntactic analysis, context analysis and probability statistics on the original text according to the primary entity words, obtaining the entity words not recorded in the entity dictionary, and forming the test corpus.
For example, starting from the primary entity word "mansion", the new entity word "xx mansion" can be extracted through syntactic analysis, context analysis and probability statistics.
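The probability-statistics step can be pictured as counting how often particular left neighbours attach to a known primary entity word in the segmented text and promoting frequent combinations as candidate unrecorded entity words. The sketch below shows one such frequency-based expansion; the threshold and the toy sentences are assumptions, not details of the embodiment.

    # Sketch: expand a seed entity word ("大厦", i.e. "mansion") into candidate
    # unrecorded entity words ("xx大厦") by counting frequent left-neighbour
    # modifiers in the segmented text. Frequency thresholding is an assumption.
    from collections import Counter

    segmented_sentences = [
        ["我", "在", "环球", "大厦", "上班"],
        ["环球", "大厦", "附近", "有", "地铁"],
        ["这", "栋", "大厦", "很", "高"],
    ]
    entity_dictionary = {"大厦"}
    seed = "大厦"

    counts = Counter()
    for sent in segmented_sentences:
        for i, w in enumerate(sent):
            if w == seed and i > 0:
                counts[sent[i - 1] + w] += 1          # join left neighbour with the seed

    candidates = [c for c, n in counts.items()
                  if n >= 2 and c not in entity_dictionary]
    print(candidates)                                  # ['环球大厦']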
In an optional embodiment, the preset two-layer neural network extraction model is:
where X_n is the joint word vector, y_n is the predicted new entity word, N is the size of the test corpus, C is the parameter of the softmax function, and A is the pre-trained word vector matrix.
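The formula itself is published only as an image, so the sketch below is merely one plausible reading of the listed symbols: a joint word vector X_n is scored against every vocabulary word through the pre-trained word-vector matrix A and the softmax parameters C. Both the dimensions and this particular parameterization are assumptions and may differ from the patented formula.

    # Purely illustrative: softmax scoring of a joint word vector X_n using a
    # parameter matrix C and a pre-trained word-vector matrix A.
    import numpy as np

    rng = np.random.default_rng(0)
    vocab_size, emb_dim, joint_dim = 1000, 100, 200     # assumed dimensions
    A = rng.normal(size=(vocab_size, emb_dim))          # pre-trained word-vector matrix
    C = rng.normal(size=(emb_dim, joint_dim))           # softmax parameters
    x_n = rng.normal(size=joint_dim)                    # joint word vector X_n

    scores = A @ (C @ x_n)                              # one score per vocabulary word
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                                # softmax distribution over y_n
    y_n = int(np.argmax(probs))                         # predicted new entity word index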
In an optional embodiment, the entity word classification model is:
where λ is a weight coefficient, 0 < λ < 1; E1 and E2 are two new entity words; SFT is the shortest-path enclosed tree; CTK is the convolution tree kernel; Equal is the entity feature kernel; E1·Ci is the i-th category feature of entity word E1 and E2·Ci is the i-th category feature of entity word E2; when E1 belongs to the i-th category, E1·Ci is 1, otherwise 0; when E1·Ci and E2·Ci are both 1, Equal is 1, otherwise 0; m is the number of categories.
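A common way to realize such a composite kernel is the convex combination K = λ*K_tree + (1 - λ)*K_feature used with an SVM on a precomputed Gram matrix. The sketch below assumes that combination, spells out the Equal feature kernel, and replaces the convolution tree kernel over shortest-path enclosed trees with a trivial placeholder, since a full CTK implementation is beyond a short example; λ = 0.6 and the toy data are assumptions.

    # Hedged sketch of a composite-kernel SVM: K = lam*K_tree + (1-lam)*K_feature.
    import numpy as np
    from sklearn.svm import SVC

    def equal_kernel(c1, c2):
        """Entity feature kernel: counts categories both entity words share."""
        return sum(1 for a, b in zip(c1, c2) if a == 1 and b == 1)

    def tree_kernel(t1, t2):
        """Placeholder for the convolution tree kernel CTK(SFT1, SFT2)."""
        return float(t1 == t2)          # stand-in similarity of the two parse trees

    samples = [                         # (category indicator vector, parse-tree id), label
        (([1, 0, 0], "treeA"), 0),
        (([1, 1, 0], "treeA"), 0),
        (([0, 0, 1], "treeB"), 1),
        (([0, 1, 1], "treeB"), 1),
    ]
    lam = 0.6

    def gram(xs, ys):
        return np.array([[lam * tree_kernel(a[1], b[1]) + (1 - lam) * equal_kernel(a[0], b[0])
                          for b in ys] for a in xs])

    X = [s[0] for s in samples]
    y = [s[1] for s in samples]
    clf = SVC(kernel="precomputed")
    clf.fit(gram(X, X), y)              # train on the precomputed composite Gram matrix
    print(clf.predict(gram([([0, 0, 1], "treeB")], X)))   # classify a new entity pair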
Referring to Fig. 2, which is a schematic diagram of a text entity extraction system provided by an embodiment of the present invention, the text entity extraction system includes:
A text collection module 1, configured to acquire original text;
A test corpus generation module 2, configured to search the original text for entity words not recorded in the entity dictionary according to a preset entity dictionary to form a test corpus;
A model training module 3, configured to train a preset two-layer neural network extraction model according to the test corpus;
An entity word prediction module 4, configured to predict new entity words according to the preset two-layer neural network extraction model and the test corpus and to update the new entity words into the preset entity dictionary.
For example, taking the analysis of a news website as an example, the text published on the platform is crawled and then preprocessed to form the test corpus. The test corpus is input into the preset two-layer neural network extraction model, which learns text features from the test corpus automatically: the first hidden layer extracts features of each word, the second hidden layer extracts features from the word window, treating them as a series of local and global structures, and the parameters of the preset two-layer neural network extraction model are trained by the backpropagation algorithm. With the trained preset two-layer neural network extraction model, words that the preset entity dictionary does not cover, as well as new network words, can be identified, improving the accuracy and efficiency of text entity extraction. At the same time, the entity dictionary can be expanded automatically, ensuring the accuracy and comprehensiveness of big data analysis based on this text entity extraction method.
In an optional embodiment, the text entity extraction system further includes:
A classification model establishment module, configured to establish an entity word classification model according to an SVM composite kernel combining a convolution tree kernel and an entity feature kernel;
A classification annotation module, configured to perform classification annotation on the new entity words according to the entity word classification model;
An entity word verification module, configured to verify the new entity words according to a preset loss function.
In this embodiment, the new entity words extracted by the preset two-layer neural network extraction model are verified by constructing the preset loss function, which avoids over-fitting.
In an optional embodiment, the model training module includes:
A neural network model establishment module, configured to establish the preset two-layer neural network extraction model according to the Skip-gram algorithm and the Bag-of-words algorithm;
A training word vector generation module, configured to generate joint word vectors according to the test corpus, the attribute parameters of the Skip-gram algorithm and the attribute parameters of the Bag-of-words algorithm;
A neural network model training module, configured to train the preset two-layer neural network extraction model according to the joint word vectors.
In an optional embodiment, the text entity extraction system further includes:
A text noise reduction module, configured to perform noise reduction on the original text;
A text word segmentation module, configured to perform word segmentation on the noise-reduced original text according to a preset segmentation model.
In an optional embodiment, the text word segmentation module includes:
A segmentation model establishment module, configured to establish the preset segmentation model according to the MMseg segmentation algorithm and a CRF discrimination algorithm;
An ambiguity analysis module, configured to perform discriminant analysis on ambiguous words in the noise-reduced original text according to the CRF discrimination algorithm of the preset segmentation model;
A text cutting module, configured to segment the noise-reduced original text according to the MMseg segmentation algorithm of the preset segmentation model.
In this embodiment, by establishing a preset segmentation model that combines the MMseg segmentation algorithm with a CRF discrimination algorithm, the ambiguity that arises during text segmentation can be resolved, the training time of the preset segmentation model is reduced, and segmentation speed is improved.
In an optional embodiment, the test corpus generation module includes:
A primary entity word identification module, configured to identify, according to the preset entity dictionary, the primary entity words in the original text that are recorded in the entity dictionary;
A test corpus analysis module, configured to perform syntactic analysis, context analysis and probability statistics on the original text according to the primary entity words, obtain the entity words not recorded in the entity dictionary, and form the test corpus.
For example, starting from the primary entity word "mansion", the new entity word "xx mansion" can be extracted through syntactic analysis, context analysis and probability statistics.
In an optional embodiment, the preset two-layer neural network extraction model is:
where X_n is the joint word vector, y_n is the predicted new entity word, N is the size of the test corpus, C is the parameter of the softmax function, and A is the pre-trained word vector matrix.
In an optional embodiment, the entity word classification model is:
where λ is a weight coefficient, 0 < λ < 1; E1 and E2 are two new entity words; SFT is the shortest-path enclosed tree; CTK is the convolution tree kernel; Equal is the entity feature kernel; E1·Ci is the i-th category feature of entity word E1 and E2·Ci is the i-th category feature of entity word E2; when E1 belongs to the i-th category, E1·Ci is 1, otherwise 0; when E1·Ci and E2·Ci are both 1, Equal is 1, otherwise 0; m is the number of categories.
Compared with the prior art, the text entity extraction method provided by the embodiment of the present invention has the following beneficial effects. The method includes acquiring original text; according to a preset entity dictionary, searching the original text for entity words not recorded in the entity dictionary to form a test corpus; training a preset two-layer neural network extraction model according to the test corpus; and, according to the preset two-layer neural network extraction model and the test corpus, predicting new entity words and updating them into the preset entity dictionary. The method can identify words not covered by the dictionary as well as new network words, improving the accuracy and efficiency of text entity extraction. The embodiment of the present invention also provides a text entity extraction system.
The above are preferred embodiments of the present invention. It should be noted that a person skilled in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A text entity extraction method, characterized by comprising:
Acquiring original text;
According to a preset entity dictionary, searching the original text for entity words not recorded in the entity dictionary to form a test corpus;
Training a preset two-layer neural network extraction model according to the test corpus;
According to the preset two-layer neural network extraction model and the test corpus, predicting new entity words and updating the new entity words into the preset entity dictionary.
2. The text entity extraction method according to claim 1, characterized by further comprising:
Establishing an entity word classification model according to an SVM composite kernel combining a convolution tree kernel and an entity feature kernel;
Performing classification annotation on the new entity words according to the entity word classification model;
Verifying the new entity words according to a preset loss function.
3. The text entity extraction method according to claim 1, characterized in that training the preset two-layer neural network extraction model according to the test corpus specifically comprises:
Establishing the preset two-layer neural network extraction model according to the Skip-gram algorithm and the Bag-of-words algorithm;
Generating joint word vectors according to the test corpus, the attribute parameters of the Skip-gram algorithm and the attribute parameters of the Bag-of-words algorithm;
Training the preset two-layer neural network extraction model according to the joint word vectors.
4. The text entity extraction method according to claim 1, characterized by further comprising:
Performing noise reduction on the original text;
Performing word segmentation on the noise-reduced original text according to a preset segmentation model.
5. The text entity extraction method according to claim 4, characterized in that performing word segmentation on the noise-reduced original text according to the preset segmentation model specifically comprises:
Establishing the preset segmentation model according to the MMseg segmentation algorithm and a CRF discrimination algorithm;
Performing discriminant analysis on ambiguous words in the noise-reduced original text according to the CRF discrimination algorithm of the preset segmentation model;
Segmenting the noise-reduced original text according to the MMseg segmentation algorithm of the preset segmentation model.
6. The text entity extraction method according to claim 1, characterized in that searching the original text for entity words not recorded in the entity dictionary according to the preset entity dictionary to form the test corpus specifically comprises:
Identifying, according to the preset entity dictionary, the primary entity words in the original text that are recorded in the entity dictionary;
Performing syntactic analysis, context analysis and probability statistics on the original text according to the primary entity words, obtaining the entity words not recorded in the entity dictionary, and forming the test corpus.
7. The text entity extraction method according to claim 3, characterized in that the preset two-layer neural network extraction model is:
where X_n is the joint word vector, y_n is the predicted new entity word, N is the size of the test corpus, C is the parameter of the softmax function, and A is the pre-trained word vector matrix.
8. The text entity extraction method according to claim 2, characterized in that the entity word classification model is:
where λ is a weight coefficient, 0 < λ < 1; E1 and E2 are two new entity words; SFT is the shortest-path enclosed tree; CTK is the convolution tree kernel; Equal is the entity feature kernel; E1·Ci is the i-th category feature of entity word E1 and E2·Ci is the i-th category feature of entity word E2; when E1 belongs to the i-th category, E1·Ci is 1, otherwise 0; when E1·Ci and E2·Ci are both 1, Equal is 1, otherwise 0; m is the number of categories.
9. A text entity extraction system, characterized by comprising:
A text collection module, configured to acquire original text;
A test corpus generation module, configured to search the original text for entity words not recorded in the entity dictionary according to a preset entity dictionary to form a test corpus;
A model training module, configured to train a preset two-layer neural network extraction model according to the test corpus;
An entity word prediction module, configured to predict new entity words according to the preset two-layer neural network extraction model and the test corpus and to update the new entity words into the preset entity dictionary.
10. The text entity extraction system according to claim 9, characterized by further comprising:
A classification model establishment module, configured to establish an entity word classification model according to an SVM composite kernel combining a convolution tree kernel and an entity feature kernel;
A classification annotation module, configured to perform classification annotation on the new entity words according to the entity word classification model;
An entity word verification module, configured to verify the new entity words according to a preset loss function.
CN201711450896.3A 2017-12-27 2017-12-27 Text entity extraction method and system Pending CN108170678A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711450896.3A CN108170678A (en) 2017-12-27 2017-12-27 Text entity extraction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711450896.3A CN108170678A (en) 2017-12-27 2017-12-27 Text entity extraction method and system

Publications (1)

Publication Number Publication Date
CN108170678A true CN108170678A (en) 2018-06-15

Family

ID=62518844

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711450896.3A Pending CN108170678A (en) Text entity extraction method and system

Country Status (1)

Country Link
CN (1) CN108170678A (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544165A (en) * 2012-07-12 2014-01-29 腾讯科技(深圳)有限公司 Neologism mining method and system
CN104361010A (en) * 2014-10-11 2015-02-18 北京中搜网络技术股份有限公司 Automatic classification method for correcting news classification
CN106033462A (en) * 2015-03-19 2016-10-19 科大讯飞股份有限公司 Neologism discovering method and system
US20170147910A1 (en) * 2015-10-02 2017-05-25 Baidu Usa Llc Systems and methods for fast novel visual concept learning from sentence descriptions of images
CN106649250A (en) * 2015-10-29 2017-05-10 北京国双科技有限公司 Method and device for identifying emotional new words
CN105447206A (en) * 2016-01-05 2016-03-30 深圳市中易科技有限责任公司 New comment object identifying method and system based on word2vec algorithm
CN106570179A (en) * 2016-11-10 2017-04-19 中国科学院信息工程研究所 Evaluative text-oriented kernel entity identification method and apparatus
CN107092596A (en) * 2017-04-24 2017-08-25 重庆邮电大学 Text emotion analysis method based on attention CNNs and CCR
CN107301246A (en) * 2017-07-14 2017-10-27 河北工业大学 Chinese Text Categorization based on ultra-deep convolutional neural networks structural model
CN107480128A (en) * 2017-07-17 2017-12-15 广州特道信息科技有限公司 The segmenting method and device of Chinese text
CN107480197A (en) * 2017-07-17 2017-12-15 广州特道信息科技有限公司 Entity word recognition method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LI HUI: "Dai Text Word Segmentation Method and Implementation Combining a Dictionary with Statistics", China Master's Theses Full-text Database, Information Science and Technology *
CHEN PENG: "Research on Chinese Domain Entity Relation Extraction Based on Multi-Kernel Fusion", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110134952A (en) * 2019-04-29 2019-08-16 华南师范大学 A kind of Error Text rejection method for identifying, device and storage medium
CN110134952B (en) * 2019-04-29 2020-03-31 华南师范大学 Error text rejection method, device and storage medium
CN110941697A (en) * 2019-11-12 2020-03-31 清华大学 Method and system for detecting unrecorded terms
CN110941697B (en) * 2019-11-12 2023-08-08 清华大学 Method and system for detecting unrecorded terms
CN111324745A (en) * 2020-02-18 2020-06-23 深圳市一面网络技术有限公司 Word stock generation method and device
CN111611799A (en) * 2020-05-07 2020-09-01 北京智通云联科技有限公司 Dictionary and sequence labeling model based entity attribute extraction method, system and equipment
CN111611799B (en) * 2020-05-07 2023-06-02 北京智通云联科技有限公司 Entity attribute extraction method, system and equipment based on dictionary and sequence labeling model
CN111950283A (en) * 2020-07-31 2020-11-17 合肥工业大学 Chinese word segmentation and named entity recognition system for large-scale medical text mining
CN112487807A (en) * 2020-12-09 2021-03-12 重庆邮电大学 Text relation extraction method based on expansion gate convolution neural network
CN112487807B (en) * 2020-12-09 2023-07-28 重庆邮电大学 Text relation extraction method based on expansion gate convolutional neural network

Similar Documents

Publication Publication Date Title
CN108170678A (en) Text entity extraction method and system
CN108595708A (en) A kind of exception information file classification method of knowledge based collection of illustrative plates
CN108376131A (en) Keyword abstraction method based on seq2seq deep neural network models
CN110909164A (en) Text enhancement semantic classification method and system based on convolutional neural network
CN106951438A (en) A kind of event extraction system and method towards open field
CN110347894A (en) Knowledge mapping processing method, device, computer equipment and storage medium based on crawler
Rios-Alvarado et al. Learning concept hierarchies from textual resources for ontologies construction
CN104809176A (en) Entity relationship extracting method of Zang language
CN106599032A (en) Text event extraction method in combination of sparse coding and structural perceptron
CN107679110A (en) The method and device of knowledge mapping is improved with reference to text classification and picture attribute extraction
CN110457404A (en) Social media account-classification method based on complex heterogeneous network
CN103886020B (en) A kind of real estate information method for fast searching
CN106919652A (en) Short-sighted frequency automatic marking method and system based on multi-source various visual angles transductive learning
Kausar et al. ProSOUL: a framework to identify propaganda from online Urdu content
CN108304373A (en) Construction method, device, storage medium and the electronic device of semantic dictionary
CN105843796A (en) Microblog emotional tendency analysis method and device
Hassan et al. Sentiment analysis from images of natural disasters
Sherkat et al. Vector embedding of wikipedia concepts and entities
CN106503256B (en) A kind of hot information method for digging based on social networks document
CN105869058B (en) A kind of method that multilayer latent variable model user portrait extracts
CN110287341A (en) A kind of data processing method, device and readable storage medium storing program for executing
CN109472022A (en) New word identification method and terminal device based on machine learning
CN109472008A (en) A kind of Text similarity computing method, apparatus and electronic equipment
Amina et al. SCANCPECLENS: A framework for automatic lexicon generation and sentiment analysis of micro blogging data on China Pakistan economic corridor
CN114997288A (en) Design resource association method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned
AD01 Patent right deemed abandoned

Effective date of abandoning: 20220809