CN106951414A - Method for recognizing lexical functions in academic text based on machine learning ranking - Google Patents
Method for recognizing lexical functions in academic text based on machine learning ranking
- Publication number: CN106951414A
- Application number: CN201710204292.4A
- Authority: CN (China)
- Prior art keywords: vocabulary, text, sequence, feature, word
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.): Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
- G06F40/253—Grammatical analysis; Style critique
Abstract
The invention discloses a method for recognizing lexical functions in academic text based on machine learning ranking. The method comprises five steps: constructing training data; defining a ranking-based recognition method; constructing features; training a model; and using the trained model to rank the word sequences contained in a document abstract, taking the top-1 result of the ranking as the extraction result. The invention learns a model on a constructed training set (18,690 document abstracts collected from the CNKI database whose titles match specific patterns) and ranks the word sequences contained in the test data (156 documents obtained by sampling and screening documents indexed by ACM and ACL). The experimental results show that the method performs well in recognizing the core problem and core method of a paper.
Description
Technical field
The invention belongs to the technical field of intelligent recognition, and in particular relates to a method for automatically recognizing document-level lexical functions based on machine learning ranking.
Background
Existing document retrieval and information management are mainly concerned with document-level information and mostly use the bag-of-words model for document representation. This simplifies computation but loses the deep semantic understanding of academic text, so such systems cannot answer more specific questions about the content and topics of academic documents. Moreover, now that both the stock and the growth rate of academic literature have reached an unmanageable level, traditional document retrieval and information management can no longer keep track of all the literature of a discipline, which places a heavy burden on scholars when they search for and read literature.
Among directly related research, Ding has addressed this topic, but that work only mentions the concept of lexical function; it reports no in-depth results and no breakthrough in technical methods. Related research areas such as information extraction and ontology knowledge base construction have produced many results: researchers have proposed a series of theories and techniques in these areas, and many mature technical products and applications have appeared.
In general, existing results are few and have several shortcomings: (1) the functional semantic framework that existing research defines for the vocabulary of academic text is too simple, offering only two or three classes, which cannot cover all functional attributes of words in academic text; (2) the practical effectiveness of existing recognition methods is hard to guarantee: judging from the results reported in related papers, their performance is insufficient for real semantic analysis applications; (3) existing work only recognizes the function of words and does not analyze the semantic relations between words in depth, so the analysis yields only a few isolated words and does not amount to a real semantic understanding of the text. For example, it is not enough to obtain the words that name evaluation metrics (such as "recall" and "precision" in information retrieval); the specific metric values associated with them are also needed.
Summary of the invention
To solve the above problems, the invention proposes a method for recognizing document-level lexical functions based on machine learning ranking.
The technical solution adopted by the invention is a method for recognizing lexical functions in academic text based on machine learning ranking, characterized by comprising the following steps:
Step 1: Construct training data;
Step 1.1: Collect a number of documents whose titles have the form "Y based on X"; for each document, convert its English title into a representation composed of frequent words and part-of-speech tags;
Step 1.2: By counting the converted text representation patterns, obtain the patterns of "Y based on X" titles;
Step 1.3: By annotating the patterns obtained in step 1.2, obtain text matching patterns that extract the problem and the method from a title;
Step 2: Ranking-based recognition method;
Step 2.1: Given a word combination P = {w_1, w_2, ..., w_m} and an annotated word sequence P' = {w'_1, w'_2, ..., w'_n}, first extract terms from the text by longest string matching; segment the text at different granularities to build string segmentation trees and merge synonymous words; after the segmentation trees are merged, remove the matched strings from the bag of words to which each belongs, which yields new representations P_processed and P'_processed of P and P';
Step 2.2: Using a stop word list, filter stop words from P_processed and P'_processed;
Step 2.3: Compute the similarity score of P and P';
Step 3: Feature construction;
The features constructed for the word sequences to be ranked include lexical features, syntactic features and TextRank features;
Step 4: Model training;
Step 5: Use the trained model to rank the word sequences contained in a document abstract, and take the top-1 result of the ranking as the extraction result.
Compared with the prior art, the beneficial effect of the invention is as follows: in the method for automatically recognizing document-level lexical functions based on machine learning ranking, a model is learned on a constructed training set (18,690 document abstracts collected from the CNKI database whose titles match specific patterns) and used to rank the word sequences contained in the test data (156 documents obtained by sampling and screening documents indexed by ACM and ACL). The experimental results show that the method performs well in recognizing the core problem and core method of a paper.
Brief description of the drawings
Fig. 1 is an example of a string segmentation tree according to an embodiment of the invention.
Detailed description of the embodiments
To help those of ordinary skill in the art understand and implement the invention, the invention is described in further detail below with reference to the accompanying drawing and an embodiment. It should be understood that the embodiment described here only illustrates and explains the invention and is not intended to limit it.
The method for recognizing document-level lexical functions based on machine learning ranking provided by the invention comprises the following steps:
Step 1: Construction of training data. This embodiment collected 88,865 documents whose titles have the form "Y based on X" from CNKI journal data in the computer science field and the library and information science field. For each document, its English title is converted into a representation composed of frequent words and part-of-speech tags.
The construction method is as follows:
Step 1.1: A sentence s is represented as a word sequence {w_1, w_2, ..., w_n}, where w_i is the i-th word of the sentence and n is the length of s. A frequent word list F records a set of frequent words given in advance. Replacing every non-frequent word in s, i.e. every word that does not appear in F, with the chunk tag of that word yields the representation of s based on frequent words and parts of speech.
For example, for the sentence "In this paper, we present a method for information retrieval." with F = {in, we, present, for}, the corresponding pattern is "In NN, we present NN for NN.".
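The conversion in step 1.1 can be sketched as follows. This is a minimal illustration, not the implementation used in the embodiment: it assumes the sentence has already been split into chunks by an external chunker, and the function names `to_pattern` and `join_tokens` and the chunk tags in the example are hypothetical.

```python
def to_pattern(chunks, frequent_words):
    """Replace every chunk that is not a frequent word with its chunk tag.

    `chunks` is a list of (text, tag) pairs assumed to come from an external
    chunker; punctuation carries the tag None and is passed through unchanged.
    """
    out = []
    for text, tag in chunks:
        if tag is None or text.lower() in frequent_words:
            out.append(text)   # keep frequent words and punctuation as they are
        else:
            out.append(tag)    # abstract everything else to its chunk tag
    return out


def join_tokens(tokens):
    """Join tokens with spaces, attaching punctuation to the previous token."""
    sentence = ""
    for tok in tokens:
        if tok in {",", ".", ";", ":"} or not sentence:
            sentence += tok
        else:
            sentence += " " + tok
    return sentence


FREQUENT = {"in", "we", "present", "for"}
CHUNKS = [("In", "IN"), ("this paper", "NN"), (",", None), ("we", "PRP"),
          ("present", "VB"), ("a method", "NN"), ("for", "IN"),
          ("information retrieval", "NN"), (".", None)]
print(join_tokens(to_pattern(CHUNKS, FREQUENT)))
```

With the frequent word list of the example above, the sketch reproduces the pattern given in the text.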
Step 1.2: By counting the converted text representation patterns, the most common English title patterns for "Y based on X" titles are obtained; see Table 1.
Table 1: Examples of extraction patterns in this embodiment
By annotating these patterns, text matching patterns that extract the problem and the method from a title are obtained; examples of annotated extraction patterns are shown in Table 2.
Table 2: Examples of annotated extraction patterns
Using these patterns, the corresponding word combinations are extracted from the English titles of CNKI papers, and each word combination is assigned a category. The extraction yields 18,690 labeled records of core problems and core methods. The extracted problem and method data serve as the core-problem and core-method annotations of the texts they come from.
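Rule-based extraction from titles can be sketched with regular expressions. The two patterns below are hypothetical stand-ins, since the annotated patterns of Table 2 are not reproduced in this text; the mapping of X to the method and Y to the problem is likewise an assumption made for illustration.

```python
import re

# Hypothetical patterns for illustration only; they are NOT the patterns of Table 2.
TITLE_PATTERNS = [
    # "<problem> based on <method>"
    re.compile(r"^(?P<problem>.+?)\s+based on\s+(?P<method>.+)$", re.IGNORECASE),
    # "<method>-based <problem>"
    re.compile(r"^(?P<method>.+?)[- ]based\s+(?P<problem>.+)$", re.IGNORECASE),
]


def extract_problem_and_method(title):
    """Return {'problem': ..., 'method': ...} for a matching title, else None."""
    title = title.strip().rstrip(".")
    for pattern in TITLE_PATTERNS:
        match = pattern.match(title)
        if match:
            return {"problem": match.group("problem"),
                    "method": match.group("method")}
    return None


print(extract_problem_and_method("Text classification based on support vector machine"))
print(extract_problem_and_method("A survey of deep learning"))
```

Titles that match no pattern return `None`, which mirrors the text: only titles that match a template contribute labeled data.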
To demonstrate the reliability and cross-source applicability of these rules, the extraction rules shown in Table 2 were applied to the titles of papers indexed in the ACM database; if a title matches a template, the corresponding word sequence is output as the recognition result. For the evaluation, the extraction results of 1,555 titles were randomly selected and their correctness was judged manually. The evaluation shows that the accuracy of core problem recognition is 99.55%. The accuracy of core method extraction depends on the evaluation criterion: if the main tools used in the experiments are also counted as methods for solving the problem, the accuracy is 98.65%; if tools are excluded, the accuracy is 90.23%.
Step 2: Ranking-based recognition method. This embodiment uses the pairwise approach among machine learning ranking models.
Step 2.1: A word combination P = {w_1, w_2, ..., w_m} and an annotated word sequence P' = {w'_1, w'_2, ..., w'_n} are given. Terms are first extracted from the text; this embodiment extracts terms by longest string matching and segments the text at different granularities to build a string segmentation tree.
For example, for the text "support vector machine based method", assuming that the terms "support vector" and "support vector machine" exist, the string segmentation tree shown in Fig. 1 can be built.
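One way to build such a segmentation tree is sketched below. Fig. 1 is not reproduced in this text, so the tree shape produced here (each span is split by the longest known terms that are strictly shorter than the span itself) is an assumption, and `build_tree` is a hypothetical helper name.

```python
def build_tree(tokens, terms):
    """Return (text, children) for a token span.

    The span is split left to right; at each position the longest known term
    that is strictly shorter than the whole span becomes a child node, and
    each child is segmented recursively in the same way.
    """
    text = " ".join(tokens)
    if len(tokens) == 1:
        return (text, [])
    children = []
    i = 0
    while i < len(tokens):
        max_len = min(len(tokens) - 1, len(tokens) - i)
        length = 1  # fall back to a single word when no term matches
        for n in range(max_len, 1, -1):
            if " ".join(tokens[i:i + n]) in terms:
                length = n
                break
        children.append(build_tree(tokens[i:i + length], terms))
        i += length
    return (text, children)


def show(node, depth=0):
    """Print the tree with one level of indentation per depth."""
    print("  " * depth + node[0])
    for child in node[1]:
        show(child, depth + 1)


TERMS = {"support vector", "support vector machine"}
tree = build_tree("support vector machine based method".split(), TERMS)
show(tree)
```

In this sketch "support vector machine" becomes a child of the full string and "support vector" a child of that node, so both granularities remain available for later matching.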
Step 2.2: Once the segmentation trees of the two strings have been built, the subsequent computation operates on the two trees. Using the synonym list provided by a synonym dictionary, the pair of nodes from the two trees whose merger yields the largest gain is selected and merged at each iteration. Once a node has been merged, its parent node and descendant nodes no longer take part in later merges. This is repeated until no more nodes can be merged. Merging the segmentation trees matches the synonymous expressions of the text pair; matched strings are treated as synonymous words and are removed from the bag of words to which each belongs. In this way, new representations P_processed and P'_processed are obtained from P = {w_1, w_2, ..., w_m} and P' = {w'_1, w'_2, ..., w'_n}.
Step 2.3: To avoid the influence of noise words, the resulting strings need further processing. Words such as "to", "novel", "one" and "a" must be removed when computing similarity, so this embodiment filters stop words from P_processed and P'_processed, using a stop word list that contains 561 stop words. Throughout the matching process, matching is performed on stemmed text in order to eliminate the effect of morphological variation on the similarity score.
Step 2.4: Given P and P' and the corresponding P_processed and P'_processed, the similarity score is computed with a simple formula:
[formula image not reproduced in this text]
where |*| denotes length. This similarity measure is asymmetric, that is, sim(P, P') is not necessarily equal to sim(P', P). If all words in P are semantically covered by P', the similarity is 1; if the two share no overlapping word or word sequence, the similarity is 0.
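The formula itself is not available here, so the sketch below is only one score that is consistent with the stated properties (asymmetric, 1 when every word of P is covered by P', 0 when nothing overlaps). It is an assumption, not the patent's formula. It also swaps in flat bag-of-words matching for the segmentation-tree merging of step 2.2 and omits stemming; the `synonyms` argument is a hypothetical set of word pairs.

```python
def similarity(p, p_prime, synonyms=frozenset(), stop_words=frozenset()):
    """Fraction of the content words of `p` that are covered by `p_prime`.

    Assumed stand-in for the missing formula. A word is covered when it
    occurs in `p_prime` or forms a synonym pair with a word of `p_prime`.
    """
    p_words = [w.lower() for w in p if w.lower() not in stop_words]
    target = {w.lower() for w in p_prime if w.lower() not in stop_words}
    if not p_words:
        return 0.0

    def covered(word):
        if word in target:
            return True
        return any((word, t) in synonyms or (t, word) in synonyms for t in target)

    matched = sum(1 for w in p_words if covered(w))
    return matched / len(p_words)


STOP = {"a", "novel"}
P = ["information", "retrieval"]
P_PRIME = ["a", "novel", "information", "retrieval", "model"]
print(similarity(P, P_PRIME, stop_words=STOP))
print(similarity(P_PRIME, P, stop_words=STOP))
```

The two calls give different values for the same pair, which illustrates the asymmetry described in the text.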
Step 3: Feature construction. The features that the invention constructs for the word sequences to be ranked include lexical features, syntactic features and TextRank features.
Step 3.1: Construct lexical features, including each word in the combination, the word preceding the current word sequence, the word following the current word sequence, the two words preceding the current word combination, the two words following the current word combination, and the nearest verb preceding the current words. Whether the sentence containing the candidate includes particular text such as "this paper", "we" or "our work" also has a considerable effect on ranking. A binary (0/1) feature is therefore constructed to mark whether the sentence containing the candidate includes such text.
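The lexical features of step 3.1 can be sketched as a sparse feature dictionary. The feature names, the `<PAD>` and `<NONE>` placeholders, and the use of Penn Treebank tags (verbs start with "VB") are assumptions made for this illustration.

```python
SPECIAL_PHRASES = ("this paper", "we", "our work")


def lexical_features(tokens, pos_tags, start, end):
    """Lexical features for the candidate tokens[start:end] within one sentence."""
    def tok(i):
        return tokens[i].lower() if 0 <= i < len(tokens) else "<PAD>"

    # One feature per word of the candidate itself.
    features = {f"word={w.lower()}": 1 for w in tokens[start:end]}
    # Context windows of one and two words on each side.
    features[f"prev1={tok(start - 1)}"] = 1
    features[f"next1={tok(end)}"] = 1
    features[f"prev2={tok(start - 2)}_{tok(start - 1)}"] = 1
    features[f"next2={tok(end)}_{tok(end + 1)}"] = 1
    # Nearest verb before the candidate (Penn Treebank tags assumed).
    prev_verb = "<NONE>"
    for i in range(start - 1, -1, -1):
        if pos_tags[i].startswith("VB"):
            prev_verb = tokens[i].lower()
            break
    features[f"prev_verb={prev_verb}"] = 1
    # Binary feature: does the sentence contain one of the special phrases?
    sentence = " " + " ".join(t.lower() for t in tokens) + " "
    features["has_special_phrase"] = int(any(f" {p} " in sentence for p in SPECIAL_PHRASES))
    return features


TOKENS = "In this paper we present a method for information retrieval".split()
POS = ["IN", "DT", "NN", "PRP", "VBP", "DT", "NN", "IN", "NN", "NN"]
print(lexical_features(TOKENS, POS, 8, 10))
```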
Step 3.2: Construct syntactic features, including:
1. Head word recognition;
The words of the word combination are added to a directed network, and directed edges are built according to the dependency relations between the words. For example, "an approach" yields an edge from "approach" to "an". Each node in the network is traversed until only isolated nodes remain, and "<MULI_HEAD>" is finally returned.
2. Dependency path from the word to ROOT;
The path from the Head word to ROOT is used as a feature. The path is output as (word1, Category1:Relation:Category2, word2)+, where word1 and word2 are word texts, Category1 and Category2 are parts of speech, Relation is the dependency relation from word1 to word2, and "+" denotes one or more repetitions of the preceding group. If there are multiple Head words, the dependency path is not computed and "NOPATH" is returned directly.
3. Word-to-ROOT dependency path recording only verb nodes;
The method and output are the same as for the previous path, but only verbs are recorded.
4. Dependency features directly associated with the word.
Given the Head word of a word or word combination, denoted word, its features are generated as follows: for each dependency relation tr associated with word, the other word linked by tr is denoted target. If word is the governor in tr (the term "governor" follows Stanford Parser), "tr:target" is returned; if target is the governor, "tr-r:target" is returned. Therefore, if word has n associated relations, n features are formed.
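The syntactic features above can be sketched over a dependency parse given as a plain dictionary. Several points are assumptions: the parse format (1-based token indices with 0 for ROOT), the reading of Head word recognition as "the single word of the combination whose governor lies outside the combination, otherwise `<MULI_HEAD>`", and the reading of the verb-only path as keeping only steps that start at a verb. The function names are hypothetical, and the parse would in practice come from a parser such as Stanford Parser.

```python
MULTI_HEAD = "<MULI_HEAD>"  # marker spelled as in the text


def find_head(span, heads):
    """Index of the only word in `span` governed from outside it, else the marker."""
    outside = [i for i in span if heads[i][0] not in span]
    return outside[0] if len(outside) == 1 else MULTI_HEAD


def path_to_root(head, words, pos, heads, verbs_only=False):
    """Dependency path from the head word up to ROOT (index 0) as one string."""
    if head == MULTI_HEAD:
        return "NOPATH"
    steps = []
    node = head
    while node != 0:
        gov, rel = heads[node]
        if not verbs_only or pos[node].startswith("VB"):
            steps.append(f"({words[node]},{pos[node]}:{rel}:{pos[gov]},{words[gov]})")
        node = gov
    return "".join(steps)


def direct_relation_features(head, words, heads):
    """One feature per dependency relation that touches the head word."""
    feats = []
    for dep in sorted(heads):
        gov, rel = heads[dep]
        if gov == head:        # head word is the governor
            feats.append(f"{rel}:{words[dep]}")
        elif dep == head:      # the other word is the governor
            feats.append(f"{rel}-r:{words[gov]}")
    return feats


# "we present a method": dependent index -> (governor index, relation)
WORDS = ["ROOT", "we", "present", "a", "method"]
POS = ["ROOT", "PRP", "VBP", "DT", "NN"]
HEADS = {1: (2, "nsubj"), 2: (0, "root"), 3: (4, "det"), 4: (2, "dobj")}

head = find_head([3, 4], HEADS)  # candidate "a method"
print(head, path_to_root(head, WORDS, POS, HEADS))
print(path_to_root(head, WORDS, POS, HEADS, verbs_only=True))
print(direct_relation_features(head, WORDS, HEADS))
```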
Step 3.3: Construct TextRank features. A sliding-window strategy is used to build an unweighted, undirected word co-occurrence network, on which the TextRank value of the word sequence to be ranked is computed.
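A minimal sketch of this feature follows. The window size, the damping factor of 0.85, the fixed number of iterations, and the averaging of word scores into a score for a word sequence are all assumptions; the text does not specify them.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=3):
    """Unweighted, undirected graph linking words that co-occur within a window."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def textrank(graph, damping=0.85, iterations=50):
    """Iterate the TextRank update on an unweighted, undirected graph."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        scores = {
            node: (1 - damping)
            + damping * sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            for node in graph
        }
    return scores


def sequence_score(sequence, scores):
    """Score of a word sequence: mean TextRank of its words (assumed aggregation)."""
    return sum(scores.get(w, 0.0) for w in sequence) / len(sequence)


tokens = "we present a ranking method for ranking word sequences".split()
scores = textrank(cooccurrence_graph(tokens, window=3))
print(sequence_score(["ranking", "method"], scores))
```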
Step 4: Model training.
The training data are the 18,690 document abstracts collected from the CNKI database whose titles match specific patterns; the problems and methods extracted from these documents serve as natural annotations of the core problem and core method. The ranking models are trained with the SVM-Rank tool, which trains a pairwise ranking model based on a ranking support vector machine. The text granularity used for learning to rank is the chunk. To obtain chunk data, this embodiment parses the text with Stanford Parser and then identifies the chunks contained in the text from the resulting syntactic structure. This embodiment uses OpenNLP for sentence splitting and Stanford POS Tagger for part-of-speech tagging. Model training produces separate ranking models for core problems and core methods. The two models use the same samples and features; they differ only in how each sample is ranked under the two categories. When computing the relevance between a word sequence in the text and the target word sequence, this embodiment uses a stop word list of 561 words. Stemming uses the PorterStemmer tool. The synonym list is extracted from the document metadata indexed by CNKI using Chinese-English bilingual alignment and contains 438,968 synonym pairs in total.
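Training data for SVM-Rank are plain text lines of the form `<target> qid:<id> <feature>:<value> ...`, with feature numbers in increasing order and one `qid` per ranking group. The sketch below writes such lines under the assumption that each document forms one group and that every candidate chunk carries an integer relevance derived from its similarity to the labeled sequence; the helper name, the feature names and the relevance values are hypothetical.

```python
def to_svmrank_lines(candidates_by_doc, feature_index):
    """Format candidates as SVM-Rank input lines, one ranking group per document.

    `candidates_by_doc` maps a document id to a list of (relevance, features)
    pairs, where `features` is a {feature name: value} dictionary.
    `feature_index` maps feature names to positive integer feature numbers.
    """
    lines = []
    for qid, (_doc_id, candidates) in enumerate(sorted(candidates_by_doc.items()), start=1):
        for relevance, features in candidates:
            # Feature numbers must appear in increasing order on each line.
            pairs = sorted((feature_index[name], value) for name, value in features.items())
            feats = " ".join(f"{idx}:{value}" for idx, value in pairs)
            lines.append(f"{relevance} qid:{qid} {feats}")
    return lines


FEATURE_INDEX = {"word=method": 1, "prev_verb=present": 2}
CANDIDATES = {
    "doc1": [
        (2, {"prev_verb=present": 1, "word=method": 1}),  # closer to the labeled sequence
        (1, {"word=method": 1}),
    ],
}
for line in to_svmrank_lines(CANDIDATES, FEATURE_INDEX):
    print(line)
```

Because the core problem and the core method get separate models, the same candidates would be written twice, once with the relevance values for each category.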
Step 5: The trained model is used to rank the word sequences contained in a document abstract, and the top-1 result of the ranking is taken as the extraction result.
In the test phase of this embodiment, 200 documents were randomly selected from the documents indexed by ACM and ACL; 44 documents that the annotators could not read because of the limits of their research fields (for example, hardware research) were removed, leaving 156 test documents. Table 3 shows the evaluation of the rule-based extraction from titles.
Table 3: Evaluation of rule-based extraction from titles
The evaluation was performed manually and focuses on precision and recall. Some documents do not clearly state a method or a problem; such documents were annotated as having no method or problem. Table 4 shows the recognition results for core problems and core methods.
Table 4: Recognition results for core problems and core methods
The experimental results show that the method is reasonably effective in recognizing the core problem and core method of a paper.
It should be understood that the parts not described in detail in this specification belong to the prior art.
It should be understood that the above description of the preferred embodiment is relatively detailed and should therefore not be regarded as limiting the scope of patent protection of the invention. Under the teaching of the invention, those of ordinary skill in the art may make substitutions or modifications without departing from the scope protected by the claims, and all of these fall within the protection scope of the invention. The scope of protection sought for the invention is determined by the appended claims.
Claims (5)
1. A method for recognizing lexical functions in academic text based on machine learning ranking, characterized by comprising the following steps:
Step 1: Construct training data;
Step 1.1: Collect a number of documents whose titles have the form "Y based on X"; for each document, convert its English title into a representation composed of frequent words and part-of-speech tags;
Step 1.2: By counting the converted text representation patterns, obtain the patterns of "Y based on X" titles;
Step 1.3: By annotating the patterns obtained in step 1.2, obtain text matching patterns that extract the problem and the method from a title;
Step 2: Ranking-based recognition method;
Step 2.1: Given a word combination P = {w_1, w_2, ..., w_m} and an annotated word sequence P' = {w'_1, w'_2, ..., w'_n}, first extract terms from the text by longest string matching; segment the text at different granularities to build string segmentation trees and merge synonymous words; after the segmentation trees are merged, remove the matched strings from the bag of words to which each belongs, which yields new representations P_processed and P'_processed of P and P';
Step 2.2: Using a stop word list, filter stop words from P_processed and P'_processed;
Step 2.3: Compute the similarity score of P and P';
Step 3: Feature construction;
The features constructed for the word sequences to be ranked include lexical features, syntactic features and TextRank features;
Step 4: Model training;
Step 5: Use the trained model to rank the word sequences contained in a document abstract, and take the top-1 result of the ranking as the extraction result.
2. The method for recognizing lexical functions in academic text based on machine learning ranking according to claim 1, characterized in that, in step 1.1, the English title of each document is converted into a representation composed of frequent words and part-of-speech tags as follows: a sentence s is first represented as a word sequence {w_1, w_2, ..., w_n}, where w_i is the i-th word of the sentence and n is the length of s; a frequent word list F records a set of frequent words given in advance; replacing every non-frequent word in s, i.e. every word that does not appear in F, with the chunk tag of that word yields the representation of s based on frequent words and parts of speech.
3. The method for recognizing lexical functions in academic text based on machine learning ranking according to claim 1, characterized in that the similarity score of P and P' in step 2.3 is computed by the formula:
[formula image not reproduced in this text]
where |*| denotes length.
4. The method for recognizing lexical functions in academic text based on machine learning ranking according to claim 1, characterized in that step 3 comprises the following sub-steps:
Step 3.1: Construct lexical features, including each word in the combination, the word preceding the current word sequence, the word following the current word sequence, the two words preceding the current word combination, the two words following the current word combination, and the nearest verb preceding the current words;
Step 3.2: Construct syntactic features, including Head word recognition, the dependency path from the word to ROOT, the word-to-ROOT dependency path recording only verb nodes, and the dependency features directly associated with the word;
In the Head word recognition, the words of the word combination are added to a directed network, directed edges are built according to the dependency relations between the words, each node in the network is traversed until only isolated nodes remain, and "<MULI_HEAD>" is finally returned;
For the dependency path from the word to ROOT, the path from the Head word to ROOT is used as a feature, and the path is output as (word1, Category1:Relation:Category2, word2)+, where word1 and word2 are word texts, Category1 and Category2 are parts of speech, Relation is the dependency relation from word1 to word2, and "+" denotes one or more repetitions of the preceding group; if there are multiple Head words, the dependency path is not computed and "NOPATH" is returned directly;
For the word-to-ROOT dependency path recording only verb nodes, the method and output are the same as for the previous path, but only verbs are recorded;
For the dependency features directly associated with the word, given the Head word of a word or word combination, denoted word, the features are generated as follows: for each dependency relation tr associated with word, the other word linked by tr is denoted target; if word is the governor in tr, "tr:target" is returned; if target is the governor, "tr-r:target" is returned; therefore, if word has n associated relations, n features are formed;
Step 3.3: Construct TextRank features;
A sliding-window strategy is used to build an unweighted, undirected word co-occurrence network, on which the TextRank value of the word sequence to be ranked is computed.
5. The method for recognizing lexical functions in academic text based on machine learning ranking according to any one of claims 1 to 4, characterized in that step 4 is implemented as follows:
A number of document abstracts whose titles match specific patterns are used, and the problems and methods extracted from these documents serve as natural annotations of the core problem and core method; the ranking model is trained with the SVM-Rank tool, which trains a pairwise ranking model based on a ranking support vector machine; the text granularity used by the ranking model is the chunk: the text is parsed with Stanford Parser, and the chunks contained in the text are identified from the resulting syntactic structure; OpenNLP is used for sentence splitting and Stanford POS Tagger for part-of-speech tagging of the text; when computing the relevance between a word sequence in the text and the target word sequence, a stop word list is used; stemming uses the PorterStemmer tool; the synonym list is extracted from existing document metadata using Chinese-English bilingual alignment.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710204292.4A CN106951414A (en) | 2017-03-30 | 2017-03-30 | A kind of academic text vocabulary identification of function method sorted based on machine learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710204292.4A CN106951414A (en) | 2017-03-30 | 2017-03-30 | A kind of academic text vocabulary identification of function method sorted based on machine learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106951414A true CN106951414A (en) | 2017-07-14 |
Family
ID=59475165
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710204292.4A Pending CN106951414A (en) | 2017-03-30 | 2017-03-30 | A kind of academic text vocabulary identification of function method sorted based on machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106951414A (en) |
Non-Patent Citations (1)
Title |
---|
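程齐凯 (Cheng Qikai): "学术文本的词汇功能识别" (Lexical function recognition in academic text), 《中国博士学位论文全文数据库 哲学与人文科学辑》 (China Doctoral Dissertations Full-text Database, Philosophy and Humanities) * |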
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108287825A (en) * | 2018-01-05 | 2018-07-17 | 中译语通科技股份有限公司 | A kind of term identification abstracting method and system |
CN109840327A (en) * | 2019-01-31 | 2019-06-04 | 北京嘉和美康信息技术有限公司 | A kind of vocabulary recognition methods and device |
CN112487134A (en) * | 2020-12-08 | 2021-03-12 | 武汉大学 | Scientific and technological text problem extraction method based on extremely simple abstract strategy |
CN112632606A (en) * | 2020-12-23 | 2021-04-09 | 天津理工大学 | SNOMED-CT-based medical text document desensitization method and system |
CN112632606B (en) * | 2020-12-23 | 2022-12-09 | 天津理工大学 | SNOMED-CT-based medical text document desensitization method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105824933B (en) | Automatically request-answering system and its implementation based on main rheme | |
Alzahrani et al. | Understanding plagiarism linguistic patterns, textual features, and detection methods | |
CN106997382A (en) | Innovation intention label automatic marking method and system based on big data | |
Sathe et al. | Automated fact-checking of claims from Wikipedia | |
CN108121829A (en) | The domain knowledge collection of illustrative plates automated construction method of software-oriented defect | |
Fan et al. | Using syntactic and semantic relation analysis in question answering | |
CN103870506B (en) | Webpage information extraction method and system | |
CN106951414A (en) | A kind of academic text vocabulary identification of function method sorted based on machine learning | |
CN101702167A (en) | Method for extracting attribution and comment word with template based on internet | |
Diana et al. | Measuring performance of n-gram and Jaccard-similarity metrics in document plagiarism application | |
Derici et al. | A closed-domain question answering framework using reliable resources to assist students | |
Lahbari et al. | A rule-based method for Arabic question classification | |
Kamlangpuech et al. | A new system for analyzing contents of computer science courses | |
Walke et al. | Implementation approaches for various categories of question answering system | |
CN107818078B (en) | Semantic association and matching method for Chinese natural language dialogue | |
Greenwood et al. | Automatically acquiring a linguistically motivated genic interaction extraction system | |
Veena et al. | Semi supervised approach for relation extraction in agriculture documents | |
Sidhu et al. | Role of machine translation and word sense disambiguation in natural language processing | |
CN114297404A (en) | Knowledge graph construction method for field evaluation expert behavior track | |
Bhuiyan et al. | An effective approach to generate Wikipedia infobox of movie domain using semi-structured data | |
Fattoh et al. | Sematic attributes model for automatic generation of multiple choice questions | |
Rodrigues et al. | Rapport—a portuguese question-answering system | |
Ramachandran et al. | Document Clustering Using Keyword Extraction | |
Rodrigues et al. | Improving question-answering for Portuguese using triples extracted from corpora | |
Chen | Natural language processing in web data mining |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20170714 |