CN108170678A - Text entity extraction method and system - Google Patents
Text entity extraction method and system
- Publication number: CN108170678A
- Application number: CN201711450896.3A
- Authority
- CN
- China
- Prior art keywords
- word
- preset
- entity
- entities
- original text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G — PHYSICS
- G06 — COMPUTING; CALCULATING OR COUNTING
- G06F — ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00 — Handling natural language data
- G06F40/20 — Natural language analysis
- G06F40/279 — Recognition of textual entities
- G06F40/289 — Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295 — Named entity recognition
Abstract
The invention discloses a text entity extraction method and system. The text entity extraction method includes: acquiring original text; according to a preset entity dictionary, searching the original text for entity words not included in the entity dictionary to form a test corpus; training a preset two-layer neural network extraction model on the test corpus; and, according to the preset two-layer neural network extraction model and the test corpus, predicting new entity words and updating the new entity words into the preset entity dictionary. The method can identify words and network neologisms not covered by the dictionary, improving the accuracy and efficiency of text entity extraction.
Description
Technical field
The present invention relates to the field of natural language processing, and in particular to a text entity extraction method and system.
Background technology
With the continuous development of science and technology, and of information technology in particular, interpersonal communication has evolved from simple face-to-face exchange toward ever greater use of "text" as the linguistic carrier of information; the most obvious examples are digital libraries and web page text. Unquestionably, effective management of these language resources greatly facilitates users' access to information. With the growth of network communication, however, the amount of text available online has expanded drastically, one might even say exponentially, and classifying these texts by hand as before is not only time-consuming and laborious but also cannot guarantee accuracy. Text entity extraction methods based on natural language processing technology therefore came into being.
At present, text entity extraction methods make the classification of massive text collections feasible, and they are also an important foundation for application fields such as information extraction, question answering systems, syntactic analysis, machine translation, and metadata annotation for the Semantic Web. Existing text entity extraction methods rely primarily on a dictionary: words in the text are matched and recognized against the dictionary, yielding the entities the dictionary contains; the entities in a text usually include person names, place names, organization names, proper nouns, and the like. But because existing text entity extraction methods depend excessively on the dictionary, words and network neologisms the dictionary does not cover cannot be recognized, which reduces the accuracy and efficiency of text entity extraction.
Summary of the invention
The object of the present invention is to provide a text entity extraction method and system that can identify words and network neologisms not covered by a dictionary, improving the accuracy and efficiency of text entity extraction.
To solve the above technical problem, an embodiment of the present invention provides a text entity extraction method, including:
acquiring original text;
according to a preset entity dictionary, searching the original text for entity words not included in the entity dictionary to form a test corpus;
training a preset two-layer neural network extraction model on the test corpus;
according to the preset two-layer neural network extraction model and the test corpus, predicting new entity words and updating the new entity words into the preset entity dictionary.
Preferably, the text entity extraction method further includes:
establishing an entity word classification model according to an SVM composite kernel composed of a convolution tree kernel and an entity-feature kernel;
performing classification annotation on the new entity words according to the entity word classification model;
verifying the new entity words according to a preset loss function.
Preferably, training the preset two-layer neural network extraction model on the test corpus specifically includes:
establishing the preset two-layer neural network extraction model according to the Skip-gram algorithm and the Bag-of-words algorithm;
generating joint word vectors according to the test corpus, the characteristic parameters of the Skip-gram algorithm, and the characteristic parameters of the Bag-of-words algorithm;
training the preset two-layer neural network extraction model on the joint word vectors.
Preferably, the text entity extraction method further includes:
performing noise-reduction processing on the original text;
performing word segmentation on the noise-reduced original text according to a preset word segmentation model.
Preferably, performing word segmentation on the noise-reduced original text according to the preset word segmentation model specifically includes:
establishing the preset word segmentation model according to the MMseg segmentation algorithm and a CRF discrimination algorithm;
performing discriminant analysis on ambiguous words in the noise-reduced original text according to the CRF discrimination algorithm of the preset word segmentation model;
performing cutting on the noise-reduced original text according to the MMseg segmentation algorithm of the preset word segmentation model.
Preferably, searching the original text for entity words not included in the entity dictionary according to the preset entity dictionary to form the test corpus specifically includes:
according to the preset entity dictionary, identifying the primary entity words in the original text that the entity dictionary contains;
according to the primary entity words, performing syntactic analysis, contextual analysis, and probability statistics on the original text to obtain the entity words not included in the entity dictionary, and forming the test corpus.
Preferably, in the preset two-layer neural network extraction model, X_n is the joint word vector, y_n is the predicted new entity word, and N is the size of the test corpus; C is the parameter of the softmax function, and A is the pre-trained word-vector matrix.
Preferably, in the entity word classification model, λ is a weight coefficient with 0 < λ < 1; E1 and E2 are two new entity words; SFT is the shortest-path inclusion tree; CTK is the convolution tree kernel; Equal is the entity-feature kernel; E1.C_i is the i-th category feature of entity word E1 and E2.C_i is the i-th category feature of entity word E2 (when E1 belongs to the i-th category, E1.C_i is 1, and otherwise 0); when E1.C_i and E2.C_i are both 1, the corresponding value of Equal is 1, and otherwise 0; M is the number of categories.
An embodiment of the present invention further provides a text entity extraction system, including:
a text acquisition module for acquiring original text;
a test corpus generation module for searching the original text, according to a preset entity dictionary, for entity words not included in the entity dictionary to form a test corpus;
a model training module for training a preset two-layer neural network extraction model on the test corpus;
an entity word prediction module for predicting new entity words according to the preset two-layer neural network extraction model and the test corpus, and updating the new entity words into the preset entity dictionary.
Preferably, the text entity extraction system further includes:
a classification model establishment module for establishing an entity word classification model according to an SVM composite kernel composed of a convolution tree kernel and an entity-feature kernel;
a classification annotation module for performing classification annotation on the new entity words according to the entity word classification model;
an entity word verification module for verifying the new entity words according to a preset loss function.
Compared with the prior art, the text entity extraction method provided by an embodiment of the present invention has the following advantageous effects. The method includes acquiring original text; searching the original text, according to a preset entity dictionary, for entity words not included in the entity dictionary to form a test corpus; training a preset two-layer neural network extraction model on the test corpus; and, according to the preset two-layer neural network extraction model and the test corpus, predicting new entity words and updating them into the preset entity dictionary. The method can identify words and network neologisms not covered by the dictionary, improving the accuracy and efficiency of text entity extraction. An embodiment of the present invention further provides a text entity extraction system.
Description of the drawings
Fig. 1 is a flowchart of a text entity extraction method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of a text entity extraction system provided by an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention, without creative effort, shall fall within the protection scope of the present invention.
Referring to Fig. 1, a flowchart of a text entity extraction method provided by an embodiment of the present invention, the text entity extraction method includes:
S1: acquiring original text;
S2: according to a preset entity dictionary, searching the original text for entity words not included in the entity dictionary to form a test corpus;
S3: training a preset two-layer neural network extraction model on the test corpus;
S4: according to the preset two-layer neural network extraction model and the test corpus, predicting new entity words and updating the new entity words into the preset entity dictionary.
Taking the analysis of a news website as an example, the text published on the network platform is crawled and then pre-processed to form the test corpus. The test corpus is input into the preset two-layer neural network extraction model, which learns text features from the test corpus automatically: the first hidden layer extracts the features of each word, the second hidden layer extracts features from the word window and treats them as a series of local and global structures, and the parameters of the model are trained by the back-propagation algorithm. With the trained model, words and network neologisms not covered by the preset entity dictionary can be identified, improving the accuracy and efficiency of text entity extraction. At the same time, the entity dictionary can be expanded automatically, ensuring the accuracy and comprehensiveness of big-data analysis based on the text entity extraction method.
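The two-hidden-layer structure just described (per-word features in the first hidden layer, word-window features in the second, a softmax output trained by back-propagation) can be sketched as a forward pass. The layer sizes, the tanh activations, and the random parameters below are illustrative assumptions, not values from the patent:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d, h1, h2, n_labels, window = 50, 64, 32, 4, 3  # assumed sizes

W1 = rng.standard_normal((h1, d))             # first hidden layer: per-word features
W2 = rng.standard_normal((h2, h1 * window))   # second hidden layer: word-window features
C = rng.standard_normal((n_labels, h2))       # softmax output parameters

def predict(word_vectors):
    """word_vectors: list of `window` embeddings covering one word window."""
    local = [np.tanh(W1 @ x) for x in word_vectors]  # local (per-word) structure
    z = np.tanh(W2 @ np.concatenate(local))          # global (window) structure
    return softmax(C @ z)                            # distribution over entity labels

probs = predict([rng.standard_normal(d) for _ in range(window)])
```

In training, the gradient of a loss on `probs` would be back-propagated through `C`, `W2`, and `W1`; only the forward pass is shown here.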
In an optional embodiment, the text entity extraction method further includes:
establishing an entity word classification model according to an SVM composite kernel composed of a convolution tree kernel and an entity-feature kernel;
performing classification annotation on the new entity words according to the entity word classification model;
verifying the new entity words according to a preset loss function.
In this embodiment, the preset loss function is constructed to verify the new entity words extracted by the preset two-layer neural network extraction model, avoiding the problem of over-fitting.
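The patent does not specify the preset loss function; one common construction consistent with the stated goal of avoiding over-fitting is cross-entropy with an L2 penalty on the model parameters, sketched here under that assumption:

```python
import numpy as np

def regularized_loss(probs, true_label, params, lam=1e-3):
    """Cross-entropy on the predicted label distribution plus an L2
    penalty on the parameters; the penalty term is what curbs over-fitting."""
    cross_entropy = -np.log(probs[true_label])
    l2_penalty = lam * sum(np.sum(p ** 2) for p in params)
    return cross_entropy + l2_penalty

probs = np.array([0.7, 0.2, 0.1])  # model output for one candidate entity word
loss = regularized_loss(probs, 0, [np.ones((2, 2))])
```

A candidate new entity word whose loss stays high across the corpus would be rejected rather than added to the dictionary; the threshold and `lam` value are design choices, not taken from the patent.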
In an optional embodiment, S3, training the preset two-layer neural network extraction model on the test corpus, specifically includes:
establishing the preset two-layer neural network extraction model according to the Skip-gram algorithm and the Bag-of-words algorithm;
generating joint word vectors according to the test corpus, the characteristic parameters of the Skip-gram algorithm, and the characteristic parameters of the Bag-of-words algorithm;
training the preset two-layer neural network extraction model on the joint word vectors.
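The patent does not say how the Skip-gram and Bag-of-words characteristic parameters are combined into a joint word vector. The simplest assumption, sketched below with random stand-ins for the two trained embedding tables, is per-word concatenation of the two representations:

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = ["entity", "word", "corpus"]

# Stand-ins for vectors trained by the two algorithms (assumed 50-dim each):
# Skip-gram predicts context from the word; Bag-of-words/CBOW the reverse.
skipgram = {w: rng.standard_normal(50) for w in vocab}
cbow = {w: rng.standard_normal(50) for w in vocab}

# Joint word vector: concatenation of both representations.
joint = {w: np.concatenate([skipgram[w], cbow[w]]) for w in vocab}
```

Concatenation lets the downstream network weight the two views of each word itself; averaging or a learned projection would be equally plausible readings of "joint word vector".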
In an optional embodiment, the text entity extraction method further includes:
performing noise-reduction processing on the original text;
performing word segmentation on the noise-reduced original text according to a preset word segmentation model.
In an optional embodiment, performing word segmentation on the noise-reduced original text according to the preset word segmentation model specifically includes:
establishing the preset word segmentation model according to the MMseg segmentation algorithm and a CRF discrimination algorithm;
performing discriminant analysis on ambiguous words in the noise-reduced original text according to the CRF discrimination algorithm of the preset word segmentation model;
performing cutting on the noise-reduced original text according to the MMseg segmentation algorithm of the preset word segmentation model.
In this embodiment, establishing a preset word segmentation model that combines the MMseg segmentation algorithm with a CRF discrimination algorithm resolves the ambiguity problems that arise during text segmentation, reduces the training time of the preset word segmentation model, and improves the segmentation rate.
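The core rule of the MMseg segmentation algorithm is maximum (longest) matching against a dictionary. The toy forward-maximum-matching segmenter below illustrates only the cutting step; the dictionary and sentence are made up, MMseg's additional tie-breaking rules are omitted, and the CRF pass for ambiguous words is not shown:

```python
def max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching: at each position take the
    longest dictionary word; fall back to a single character."""
    out, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in dictionary or length == 1:
                out.append(text[i:i + length])
                i += length
                break
    return out

dictionary = {"研究", "研究生", "生命", "起源"}
tokens = max_match("研究生命起源", dictionary)  # greedy: 研究生 / 命 / 起源
```

The greedy result 研究生 / 命 / 起源 ("graduate student / life / origin" instead of the intended "research / life / origin") is exactly the kind of segmentation ambiguity the CRF discrimination step is meant to resolve.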
In an optional embodiment, searching the original text for entity words not included in the entity dictionary according to the preset entity dictionary to form the test corpus specifically includes:
according to the preset entity dictionary, identifying the primary entity words in the original text that the entity dictionary contains;
according to the primary entity words, performing syntactic analysis, contextual analysis, and probability statistics on the original text to obtain the entity words not included in the entity dictionary, and forming the test corpus.
For example, from the primary entity word "mansion", syntactic analysis, contextual analysis, and probability statistics can extract "xx mansion".
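A minimal sketch of how a known primary entity word can seed new candidates: scan for modifier-plus-known-entity-word bigrams that are not yet in the dictionary. The capitalization heuristic, the example sentence, and the name "Jinmao" are all made up for illustration; the patent's actual syntactic, contextual, and statistical analysis is far richer:

```python
import re

entity_dict = {"mansion"}  # toy dictionary containing one primary entity word

def candidates(text, entity_dict):
    """Collect capitalized-modifier + known-entity-word bigrams that are
    not yet in the dictionary (a crude stand-in for the syntactic,
    contextual, and statistical analysis described in the patent)."""
    found = set()
    words = re.findall(r"\w+", text)
    for prev, cur in zip(words, words[1:]):
        if cur in entity_dict and prev[0].isupper():
            phrase = f"{prev} {cur}"
            if phrase not in entity_dict:
                found.add(phrase)
    return found

found = candidates("The Jinmao mansion opened near another mansion.", entity_dict)
```

In a real pipeline, each candidate would then be scored by frequency and context statistics before joining the test corpus.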
In an optional embodiment, in the preset two-layer neural network extraction model, X_n is the joint word vector, y_n is the predicted new entity word, and N is the size of the test corpus; C is the parameter of the softmax function, and A is the pre-trained word-vector matrix.
In an optional embodiment, in the entity word classification model, λ is a weight coefficient with 0 < λ < 1; E1 and E2 are two new entity words; SFT is the shortest-path inclusion tree; CTK is the convolution tree kernel; Equal is the entity-feature kernel; E1.C_i is the i-th category feature of entity word E1 and E2.C_i is the i-th category feature of entity word E2 (when E1 belongs to the i-th category, E1.C_i is 1, and otherwise 0); when E1.C_i and E2.C_i are both 1, the corresponding value of Equal is 1, and otherwise 0; M is the number of categories.
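The classification model's composite kernel weighs a convolution tree kernel (CTK, computed over the shortest-path inclusion trees) against the entity-feature kernel Equal, which counts the categories shared by the two entity words. Below is a sketch under the assumption that Equal sums the per-category indicator products over the M categories and that the tree-kernel value is supplied externally (stubbed as a plain number here):

```python
def equal_kernel(c1, c2):
    """Entity-feature kernel: number of categories i where both
    E1.C_i and E2.C_i are 1 (c1, c2 are binary category vectors)."""
    return sum(a * b for a, b in zip(c1, c2))

def composite_kernel(tree_sim, c1, c2, lam=0.6):
    """K = lam * CTK + (1 - lam) * Equal, with weight 0 < lam < 1.
    `tree_sim` stands in for the convolution tree kernel value."""
    return lam * tree_sim + (1 - lam) * equal_kernel(c1, c2)

# Two entity words sharing 2 of 3 categories, with an assumed tree similarity:
k = composite_kernel(tree_sim=0.5, c1=[1, 0, 1], c2=[1, 1, 1])
```

Such a combined kernel could be plugged into any kernel SVM trainer that accepts a user-supplied kernel function; λ trades structural (tree) similarity against category overlap.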
Referring to Fig. 2, a schematic diagram of a text entity extraction system provided by an embodiment of the present invention, the text entity extraction system includes:
a text acquisition module 1 for acquiring original text;
a test corpus generation module 2 for searching the original text, according to a preset entity dictionary, for entity words not included in the entity dictionary to form a test corpus;
a model training module 3 for training a preset two-layer neural network extraction model on the test corpus;
an entity word prediction module 4 for predicting new entity words according to the preset two-layer neural network extraction model and the test corpus, and updating the new entity words into the preset entity dictionary.
In an optional embodiment, the text entity extraction system further includes:
a classification model establishment module for establishing an entity word classification model according to an SVM composite kernel composed of a convolution tree kernel and an entity-feature kernel;
a classification annotation module for performing classification annotation on the new entity words according to the entity word classification model;
an entity word verification module for verifying the new entity words according to a preset loss function.
In an optional embodiment, the model training module includes:
a neural network model establishment module for establishing the preset two-layer neural network extraction model according to the Skip-gram algorithm and the Bag-of-words algorithm;
a training word vector generation module for generating joint word vectors according to the test corpus, the characteristic parameters of the Skip-gram algorithm, and the characteristic parameters of the Bag-of-words algorithm;
a neural network model training module for training the preset two-layer neural network extraction model on the joint word vectors.
In an optional embodiment, the text entity extraction system further includes:
a text noise-reduction module for performing noise-reduction processing on the original text;
a text word segmentation module for performing word segmentation on the noise-reduced original text according to a preset word segmentation model.
In an optional embodiment, the text word segmentation module includes:
a word segmentation model establishment module for establishing the preset word segmentation model according to the MMseg segmentation algorithm and a CRF discrimination algorithm;
an ambiguity analysis module for performing discriminant analysis on ambiguous words in the noise-reduced original text according to the CRF discrimination algorithm of the preset word segmentation model;
a text cutting module for performing cutting on the noise-reduced original text according to the MMseg segmentation algorithm of the preset word segmentation model.
In an optional embodiment, the test corpus generation module includes:
a primary entity word identification module for identifying, according to the preset entity dictionary, the primary entity words in the original text that the entity dictionary contains;
a test corpus analysis module for performing, according to the primary entity words, syntactic analysis, contextual scene analysis, and probability statistics on the original text to obtain the entity words not included in the entity dictionary and form the test corpus.
The above are preferred embodiments of the present invention. It should be noted that those skilled in the art may make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications are also considered to fall within the protection scope of the present invention.
Claims (10)
1. A text entity extraction method, characterized by including:
acquiring original text;
according to a preset entity dictionary, searching the original text for entity words not included in the entity dictionary to form a test corpus;
training a preset two-layer neural network extraction model on the test corpus;
according to the preset two-layer neural network extraction model and the test corpus, predicting new entity words and updating the new entity words into the preset entity dictionary.
2. The text entity extraction method of claim 1, characterized by further including:
establishing an entity word classification model according to an SVM composite kernel composed of a convolution tree kernel and an entity-feature kernel;
performing classification annotation on the new entity words according to the entity word classification model;
verifying the new entity words according to a preset loss function.
3. The text entity extraction method of claim 1, characterized in that training the preset two-layer neural network extraction model on the test corpus specifically includes:
establishing the preset two-layer neural network extraction model according to the Skip-gram algorithm and the Bag-of-words algorithm;
generating joint word vectors according to the test corpus, the characteristic parameters of the Skip-gram algorithm, and the characteristic parameters of the Bag-of-words algorithm;
training the preset two-layer neural network extraction model on the joint word vectors.
4. The text entity extraction method of claim 1, characterized by further including:
performing noise-reduction processing on the original text;
performing word segmentation on the noise-reduced original text according to a preset word segmentation model.
5. The text entity extraction method of claim 4, characterized in that performing word segmentation on the noise-reduced original text according to the preset word segmentation model specifically includes:
establishing the preset word segmentation model according to the MMseg segmentation algorithm and a CRF discrimination algorithm;
performing discriminant analysis on ambiguous words in the noise-reduced original text according to the CRF discrimination algorithm of the preset word segmentation model;
performing cutting on the noise-reduced original text according to the MMseg segmentation algorithm of the preset word segmentation model.
6. The text entity extraction method of claim 1, characterized in that searching the original text for entity words not included in the entity dictionary according to the preset entity dictionary to form the test corpus specifically includes:
according to the preset entity dictionary, identifying the primary entity words in the original text that the entity dictionary contains;
according to the primary entity words, performing syntactic analysis, contextual analysis, and probability statistics on the original text to obtain the entity words not included in the entity dictionary and form the test corpus.
7. The text entity extraction method of claim 3, characterized in that, in the preset two-layer neural network extraction model, X_n is the joint word vector, y_n is the predicted new entity word, and N is the size of the test corpus; C is the parameter of the softmax function, and A is the pre-trained word-vector matrix.
8. The text entity extraction method of claim 2, characterized in that, in the entity word classification model, λ is a weight coefficient with 0 < λ < 1; E1 and E2 are two new entity words; SFT is the shortest-path inclusion tree; CTK is the convolution tree kernel; Equal is the entity-feature kernel; E1.C_i is the i-th category feature of entity word E1 and E2.C_i is the i-th category feature of entity word E2 (when E1 belongs to the i-th category, E1.C_i is 1, and otherwise 0); when E1.C_i and E2.C_i are both 1, the corresponding value of Equal is 1, and otherwise 0; M is the number of categories.
9. A text entity extraction system, characterized by including:
a text acquisition module for acquiring original text;
a test corpus generation module for searching the original text, according to a preset entity dictionary, for entity words not included in the entity dictionary to form a test corpus;
a model training module for training a preset two-layer neural network extraction model on the test corpus;
an entity word prediction module for predicting new entity words according to the preset two-layer neural network extraction model and the test corpus, and updating the new entity words into the preset entity dictionary.
10. The text entity extraction system of claim 9, characterized by further including:
a classification model establishment module for establishing an entity word classification model according to an SVM composite kernel composed of a convolution tree kernel and an entity-feature kernel;
a classification annotation module for performing classification annotation on the new entity words according to the entity word classification model;
an entity word verification module for verifying the new entity words according to a preset loss function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711450896.3A CN108170678A (en) | 2017-12-27 | 2017-12-27 | A kind of text entities abstracting method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711450896.3A CN108170678A (en) | 2017-12-27 | 2017-12-27 | A kind of text entities abstracting method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108170678A true CN108170678A (en) | 2018-06-15 |
Family
ID=62518844
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711450896.3A Pending CN108170678A (en) | 2017-12-27 | 2017-12-27 | A kind of text entities abstracting method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108170678A (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103544165A (en) * | 2012-07-12 | 2014-01-29 | 腾讯科技(深圳)有限公司 | Neologism mining method and system |
CN104361010A (en) * | 2014-10-11 | 2015-02-18 | 北京中搜网络技术股份有限公司 | Automatic classification method for correcting news classification |
CN105447206A (en) * | 2016-01-05 | 2016-03-30 | 深圳市中易科技有限责任公司 | New comment object identifying method and system based on word2vec algorithm |
CN106033462A (en) * | 2015-03-19 | 2016-10-19 | 科大讯飞股份有限公司 | Neologism discovering method and system |
CN106570179A (en) * | 2016-11-10 | 2017-04-19 | 中国科学院信息工程研究所 | Evaluative text-oriented kernel entity identification method and apparatus |
CN106649250A (en) * | 2015-10-29 | 2017-05-10 | 北京国双科技有限公司 | Method and device for identifying emotional new words |
US20170147910A1 (en) * | 2015-10-02 | 2017-05-25 | Baidu Usa Llc | Systems and methods for fast novel visual concept learning from sentence descriptions of images |
CN107092596A (en) * | 2017-04-24 | 2017-08-25 | 重庆邮电大学 | Text emotion analysis method based on attention CNNs and CCR |
CN107301246A (en) * | 2017-07-14 | 2017-10-27 | 河北工业大学 | Chinese Text Categorization based on ultra-deep convolutional neural networks structural model |
CN107480128A (en) * | 2017-07-17 | 2017-12-15 | 广州特道信息科技有限公司 | The segmenting method and device of Chinese text |
CN107480197A (en) * | 2017-07-17 | 2017-12-15 | 广州特道信息科技有限公司 | Entity word recognition method and device |
Non-Patent Citations (2)
Title |
---|
李慧: "词典与统计相结合的傣文分词方法与实现", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
陈鹏: "基于多核融合的中文领域实体关系抽取研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110134952A (en) * | 2019-04-29 | 2019-08-16 | 华南师范大学 | A kind of Error Text rejection method for identifying, device and storage medium |
CN110134952B (en) * | 2019-04-29 | 2020-03-31 | 华南师范大学 | Error text rejection method, device and storage medium |
CN110941697A (en) * | 2019-11-12 | 2020-03-31 | 清华大学 | Method and system for detecting unrecorded terms |
CN110941697B (en) * | 2019-11-12 | 2023-08-08 | 清华大学 | Method and system for detecting unrecorded terms |
CN111324745A (en) * | 2020-02-18 | 2020-06-23 | 深圳市一面网络技术有限公司 | Word stock generation method and device |
CN111611799A (en) * | 2020-05-07 | 2020-09-01 | 北京智通云联科技有限公司 | Dictionary and sequence labeling model based entity attribute extraction method, system and equipment |
CN111611799B (en) * | 2020-05-07 | 2023-06-02 | 北京智通云联科技有限公司 | Entity attribute extraction method, system and equipment based on dictionary and sequence labeling model |
CN111950283A (en) * | 2020-07-31 | 2020-11-17 | 合肥工业大学 | Chinese word segmentation and named entity recognition system for large-scale medical text mining |
CN112487807A (en) * | 2020-12-09 | 2021-03-12 | 重庆邮电大学 | Text relation extraction method based on expansion gate convolution neural network |
CN112487807B (en) * | 2020-12-09 | 2023-07-28 | 重庆邮电大学 | Text relation extraction method based on expansion gate convolutional neural network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108170678A (en) | A kind of text entities abstracting method and system | |
CN108595708A (en) | A kind of exception information file classification method of knowledge based collection of illustrative plates | |
CN108376131A (en) | Keyword abstraction method based on seq2seq deep neural network models | |
CN110909164A (en) | Text enhancement semantic classification method and system based on convolutional neural network | |
CN106951438A (en) | A kind of event extraction system and method towards open field | |
CN110347894A (en) | Knowledge mapping processing method, device, computer equipment and storage medium based on crawler | |
Rios-Alvarado et al. | Learning concept hierarchies from textual resources for ontologies construction | |
CN104809176A (en) | Entity relationship extracting method of Zang language | |
CN106599032A (en) | Text event extraction method in combination of sparse coding and structural perceptron | |
CN107679110A (en) | The method and device of knowledge mapping is improved with reference to text classification and picture attribute extraction | |
CN110457404A (en) | Social media account-classification method based on complex heterogeneous network | |
CN103886020B (en) | A kind of real estate information method for fast searching | |
CN106919652A (en) | Short-sighted frequency automatic marking method and system based on multi-source various visual angles transductive learning | |
Kausar et al. | ProSOUL: a framework to identify propaganda from online Urdu content | |
CN108304373A (en) | Construction method, device, storage medium and the electronic device of semantic dictionary | |
CN105843796A (en) | Microblog emotional tendency analysis method and device | |
Hassan et al. | Sentiment analysis from images of natural disasters | |
Sherkat et al. | Vector embedding of wikipedia concepts and entities | |
CN106503256B (en) | A kind of hot information method for digging based on social networks document | |
CN105869058B (en) | A kind of method that multilayer latent variable model user portrait extracts | |
CN110287341A (en) | A kind of data processing method, device and readable storage medium storing program for executing | |
CN109472022A (en) | New word identification method and terminal device based on machine learning | |
CN109472008A (en) | A kind of Text similarity computing method, apparatus and electronic equipment | |
Amina et al. | SCANCPECLENS: A framework for automatic lexicon generation and sentiment analysis of micro blogging data on China Pakistan economic corridor | |
CN114997288A (en) | Design resource association method |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
2022-08-09 | AD01 | Patent right deemed abandoned | Effective date of abandoning: 2022-08-09 |