CN109635254A - Paper duplicate checking method based on naive Bayesian, decision tree and SVM mixed model - Google Patents

Paper duplicate checking method based on naive Bayesian, decision tree and SVM mixed model

Info

Publication number
CN109635254A
CN109635254A
Authority
CN
China
Prior art keywords
decision tree
classification
class
keyword
svm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811467956.7A
Other languages
Chinese (zh)
Inventor
廖勇
张笑颜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN201811467956.7A priority Critical patent/CN109635254A/en
Publication of CN109635254A publication Critical patent/CN109635254A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/10: Text processing
    • G06F 40/194: Calculation of difference between files
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/205: Parsing
    • G06F 40/216: Parsing using statistical methods

Abstract

The present invention proposes a paper duplicate checking method based on a naive Bayes, decision tree and SVM mixed model. First, a keyword database is established from the occurrence frequency of searched keywords. Second, the keywords are classified. Then, a fusion of the decision tree and naive Bayes performs a first coarse screening to determine the plagiarism type of the article. Finally, when the decision tree classification cannot specify a classification criterion, an SVM is used for learning, forming a fine screening stage. The invention aims to improve current paper duplicate checking systems and to increase their accuracy in paper duplicate checking.

Description

Paper duplicate checking method based on naive Bayesian, decision tree and SVM mixed model
Technical field:
The present invention relates to text duplicate checking methods, and in particular to a paper duplicate checking method based on a naive Bayes, decision tree and SVM mixed model.
Background technique:
The Internet is now highly developed, and many researchers upload their research results online. Many positions require papers: teachers and doctors, for example, must complete academic papers to compete for professional titles, and graduating students must finish their theses. However, some people violate basic academic ethics and plagiarize other people's research results for personal gain. To combat academic fraud and academic misconduct, paper duplicate checking software has emerged. This technology is not yet mature, however, and the possibility of misjudgment is high. Current paper duplicate checking systems still have the following problems: (1) duplicate checking of the literal text of an article is very strict, but plagiarism of the central idea of an article is difficult to recognize; (2) articles inevitably contain formulas or descriptions of common knowledge that should not count as plagiarism, yet many current systems flag them as plagiarism; (3) the distinction between plagiarism types is unclear, so the severity of an author's plagiarism cannot be judged. These problems need to be solved by those skilled in the art.
Summary of the invention:
In view of the above problems, the present invention proposes a paper duplicate checking method, which is specified as follows:
1. A paper duplicate checking method based on a naive Bayes, decision tree and SVM mixed model, characterized by comprising the following four steps:
S1: establish a keyword database from the occurrence frequency of searched keywords;
S2: classify the keywords;
S3: perform a preliminary coarse screening using a fusion of the decision tree and naive Bayes;
S4: when the decision tree classification cannot specify a classification criterion, learn with an SVM to form a fine screening stage.
2. The paper duplicate checking method based on a naive Bayes, decision tree and SVM mixed model according to claim 1, characterized in that step S2 comprises the following sub-steps:
S21: classify the keywords into an innovation class and a knowledge class;
S22: the tolerated repetition rate for knowledge-class keywords is 40%, while the tolerated rate for innovation-class keywords is lower, at 5%; this prevents misjudgments during duplicate checking that are caused by the use of common knowledge in an article.
3. The paper duplicate checking method based on a naive Bayes, decision tree and SVM mixed model according to claim 1, characterized in that step S3 comprises the following sub-steps:
S31: extract key indicators by detecting charts, data, keywords and the central idea;
S32: use the Spearman rank correlation coefficient to determine the pairwise correlation of the indicators, reduce the dimensionality of the strongly correlated indicators that are screened out using principal component analysis, and recombine them into a new group of mutually independent generalized variables;
S33: choose six parts of the article, namely the opening paragraph, four middle paragraphs and the concluding paragraph; analyze their weights with the analytic hierarchy process (AHP) and obtain an integrated value for the six parts after weighted synthesis; the middle paragraphs are extracted as follows: if the body of the article contains more than four core views, take the paragraph with the most words for each core view, sort these paragraphs by word count in descending order, and choose the top four; if there are exactly four core views, directly choose the paragraph with the most words for each of the four views; if there are fewer than four, choose the four paragraphs of the body with the most words after sorting all paragraphs by word count;
S34: express the set of plagiarism types as the dependent variable and the set of criterion attributes as the independent variables; with the six positional integrated values of the paragraph criterion attributes and their corresponding plagiarism types as training samples, build a CART decision tree by recursive partitioning of the training samples;
S35: count the numbers of training samples correctly classified by the CART decision tree and by the Bayesian model during training; dividing by the total number of training samples gives the classification accuracies of the two algorithms, $A_{CART}$ and $A_{NB}$; then compute the training accuracy of the decision tree model for each plagiarism type, $b(k)$, $k = 1, 2, \ldots, m$, where $m$ is the total number of plagiarism types; define the posterior probability of the decision tree model for each plagiarism class when its output type is $Y_t$, combine it with the posterior probability $P(Y_k|X)_{NB}$ output by the Bayesian model by weighted synthesis, and take the plagiarism type with the largest resulting probability as the final classification output.
4. The paper duplicate checking method based on a naive Bayes, decision tree and SVM mixed model according to claim 1, characterized in that step S4 comprises the following sub-steps:
S41: generate training sample sets by actively selecting training samples; that is, from the various training articles, delineate training article sets I1, I2, ..., IC for C categories, sample each of I1, I2, ..., IC with uniform sampling to generate training sample sets I'1, I'2, ..., I'C containing equal numbers of articles, and use the plagiarism probability of each article as the sample vector;
S42: the class splitting scheme of a node classifier is as follows:
assume that the positive and negative example class sets divided by the node classifier are S1 and S2, N1 and N2 are the numbers of classes in S1 and S2 respectively, C = N1 + N2 is the total number of classes the node must divide, $X_j$ denotes the j-th class sample set, $j = 1, 2, \ldots, C$, the number of samples of $X_j$ is $n_j$, and the sample vector is x;
1) calculate the center of each class;
2) let i be the index of a class splitting scheme; for every splitting scheme, compute the quantities in steps 3) and 4);
3) calculate the centers of the positive and negative example class sets S1 and S2, and calculate the Euclidean distance between the centers of S1 and S2:
$d^i_{S1S2} = \|e_1^i - e_2^i\|$
4) calculate the average distance from the centers of the classes in S1 to the center of S1, and the average distance from the centers of the classes in S2 to the center of S2;
5) calculate $d^i$ according to the following formula; the scheme that maximizes it is the required splitting scheme:
$d^i = d^i_{S1S2} + d^i_{S1} + d^i_{S2}$
According to the node classifier class division method above, design the class splitting scheme of each node classifier top-down and finally build the complete decision tree;
S43: use the training sample sets I'1, I'2, ..., I'C to train each node classifier, finally forming the complete SVM decision tree classifier;
S44: take all pixels of the image to be classified as the test sample set, perform test classification with the SVM decision tree classifier, and map the classification results back to the image to realize image classification.
The beneficial effects of the present invention are as follows: some of the problems in current paper duplicate checking systems are solved and the specific circumstances of plagiarism are refined. Keyword classification and keyword repetition-rate queries reduce misjudgments in duplicate checking that may be caused by the repetition of knowledge-type content; a fusion of naive Bayes and decision tree algorithms builds a CART decision tree to judge the plagiarism type of a paper; and for plagiarism types that cannot be clearly classified, a fusion of SVM and decision tree algorithms builds an SVM decision tree classifier to classify them and further analyze the degree of plagiarism.
Detailed description of the invention
Additional aspects and advantages of the invention will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 is the overall flow chart of the present invention.
Specific embodiment
Embodiments of the present invention are described in detail below; examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numbers throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the invention, and are not to be construed as limiting the invention.
In the description of the present invention, it should be understood that terms indicating orientation or positional relationships, such as "longitudinal", "transverse", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner" and "outer", are based on the orientations or positional relationships shown in the drawings, are used merely for convenience and simplicity of description, and do not indicate or imply that the referenced devices or elements must have a particular orientation or be constructed and operated in a particular orientation; they should therefore not be understood as limiting the invention.
The present invention proposes a paper duplicate checking method based on a naive Bayes, decision tree and SVM mixed model. A fusion of the naive Bayes and decision tree algorithms performs a coarse screening of the plagiarism situation of a paper to determine the plagiarism type, and a fusion of the decision tree and SVM algorithms then further classifies the plagiarism types that could not be classified.
The present invention is described in detail below with reference to Fig. 1 and mainly comprises the following steps:
Step 1: start.
Step 2: extract keywords and detect the keyword repetition rate.
A keyword database is established from the occurrence frequency of searched keywords, and the keywords are classified into an innovation class and a knowledge class. The tolerated repetition rate for knowledge-class keywords is 40%, while the tolerance for innovation-class keywords is lower, at 5%. This prevents misjudgments during duplicate checking that are caused by the use of common knowledge in an article.
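As a concrete illustration of this step (not part of the original disclosure), the following Python sketch applies the two tolerated repetition rates to the two keyword classes; the helper names and the simple token-frequency metric are assumptions made for illustration only.

```python
# Minimal sketch of step 2: per-class repetition-rate tolerances (assumed helper names).
from collections import Counter

# Tolerated repetition rates from the description: knowledge class 40%, innovation class 5%.
TOLERATED_RATE = {"knowledge": 0.40, "innovation": 0.05}

def keyword_repetition_rate(keyword: str, article_tokens: list[str]) -> float:
    """Fraction of article tokens matching the keyword (a simple stand-in metric)."""
    counts = Counter(article_tokens)
    return counts[keyword] / max(len(article_tokens), 1)

def flag_keywords(keywords: dict[str, str], article_tokens: list[str]) -> list[str]:
    """Return keywords whose repetition rate exceeds the tolerance of their class.

    `keywords` maps each keyword to its class label, "knowledge" or "innovation".
    """
    flagged = []
    for kw, kw_class in keywords.items():
        if keyword_repetition_rate(kw, article_tokens) > TOLERATED_RATE[kw_class]:
            flagged.append(kw)
    return flagged

# Example usage with toy data: "svm" (innovation class) exceeds 5%, "knowledge" stays under 40%.
tokens = "svm decision tree svm bayes knowledge knowledge knowledge".split()
print(flag_keywords({"svm": "innovation", "knowledge": "knowledge"}, tokens))
```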
Step 3: build the CART decision tree.
Key indicators are extracted by detecting charts, data, keywords and the central idea. The Spearman rank correlation coefficient is used to determine the pairwise correlation of the indicators; the strongly correlated indicators that are screened out are reduced in dimensionality with principal component analysis and recombined into a new group of mutually independent generalized variables. Six parts of the article are chosen, namely the opening paragraph, four middle paragraphs and the concluding paragraph; their weights are analyzed with the analytic hierarchy process (AHP) and an integrated value for the six parts is obtained after weighted synthesis. The middle paragraphs are extracted as follows: if the body of the article contains more than four core views, the paragraph with the most words is taken for each core view, these paragraphs are sorted by word count in descending order, and the top four are chosen; if there are exactly four core views, the paragraph with the most words is chosen directly for each of the four views; if there are fewer than four, the four paragraphs of the body with the most words after sorting all paragraphs by word count are chosen. The set of plagiarism types is expressed as the dependent variable and the set of criterion attributes as the independent variables; with the six positional integrated values of the paragraph criterion attributes and their corresponding plagiarism types as training samples, a CART decision tree is built by recursive partitioning of the training samples.
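A minimal sketch of how this step could be realized with standard Python tooling follows; the feature-matrix layout, the correlation threshold of 0.8 and the number of principal components are illustrative assumptions rather than values specified by the invention.

```python
# Sketch of step 3: Spearman screening, PCA recombination and a CART tree (assumed parameters).
import numpy as np
from scipy.stats import spearmanr
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

def fit_cart_on_reduced_features(X: np.ndarray, y: np.ndarray,
                                 corr_threshold: float = 0.8,
                                 n_components: int = 6) -> DecisionTreeClassifier:
    """X: articles x key indicators (at least three columns, e.g. the six positional
    integrated values); y: plagiarism-type labels."""
    # Pairwise Spearman rank correlation between the indicator columns.
    rho, _ = spearmanr(X)

    # Screen out indicators that are strongly correlated with at least one other indicator.
    off_diagonal = np.abs(rho) - np.eye(rho.shape[0])
    strongly_correlated = np.any(off_diagonal > corr_threshold, axis=1)
    X_strong = X[:, strongly_correlated] if strongly_correlated.any() else X

    # Recombine them into mutually independent generalized variables with PCA.
    pca = PCA(n_components=min(n_components, X_strong.shape[1]))
    X_reduced = pca.fit_transform(X_strong)

    # scikit-learn's DecisionTreeClassifier grows a CART-style tree by recursive partitioning.
    cart = DecisionTreeClassifier(criterion="gini", random_state=0)
    cart.fit(X_reduced, y)
    return cart
```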
Step 4: judge whether the plagiarism type can be determined.
The numbers of training samples correctly classified by the CART decision tree and by the Bayesian model during training are counted separately; dividing by the total number of training samples gives the classification accuracies of the two algorithms, $A_{CART}$ and $A_{NB}$. The training accuracy of the decision tree model for each plagiarism type, $b(k)$, $k = 1, 2, \ldots, m$, is then computed, where $m$ is the total number of plagiarism types. The posterior probability of the decision tree model for each plagiarism class when its output type is $Y_t$ is defined and combined with the posterior probability $P(Y_k|X)_{NB}$ output by the Bayesian model by weighted synthesis; the plagiarism type with the largest resulting probability is the final classification output.
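The fusion formula itself is not reproduced in this record; the sketch below shows one plausible accuracy-weighted combination of the two posteriors, in which the decision tree posterior places its per-class training accuracy b(k) on the predicted type and spreads the remainder evenly over the other types. This weighting is an illustrative assumption, not the formula of the invention.

```python
# Sketch of step 4: accuracy-weighted fusion of CART and naive Bayes posteriors (assumed weighting).
import numpy as np

def cart_posterior(output_type: int, b: np.ndarray) -> np.ndarray:
    """Turn a hard CART prediction into a posterior: the output class receives its
    per-class training accuracy b(k); the remaining mass is spread evenly."""
    m = len(b)
    p = np.full(m, (1.0 - b[output_type]) / max(m - 1, 1))
    p[output_type] = b[output_type]
    return p

def fuse_posteriors(p_cart: np.ndarray, p_nb: np.ndarray,
                    acc_cart: float, acc_nb: float) -> int:
    """p_cart, p_nb: posterior vectors over the m plagiarism types from the CART tree
    and the naive Bayes model; acc_cart, acc_nb: their overall accuracies A_CART, A_NB."""
    # Weight each model's posterior by its training accuracy and renormalize.
    fused = acc_cart * p_cart + acc_nb * p_nb
    fused = fused / fused.sum()
    # The plagiarism type with the largest fused probability is the final output.
    return int(np.argmax(fused))

# Example usage with toy numbers.
b = np.array([0.9, 0.8, 0.7])          # per-class training accuracies b(k)
p_cart = cart_posterior(0, b)           # CART predicted type 0
p_nb = np.array([0.5, 0.3, 0.2])        # naive Bayes posterior
print(fuse_posteriors(p_cart, p_nb, acc_cart=0.85, acc_nb=0.80))
```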
Step 5: form the SVM decision tree classifier.
Training sample sets are generated and training samples are actively selected; that is, from the various training articles, training article sets I1, I2, ..., IC are delineated for C categories, each of I1, I2, ..., IC is sampled with uniform sampling to generate training sample sets I'1, I'2, ..., I'C containing equal numbers of articles, and the plagiarism probability of each article is used as the sample vector. The class splitting scheme of a node classifier is as follows:
assume that the positive and negative example class sets divided by the node classifier are S1 and S2, N1 and N2 are the numbers of classes in S1 and S2 respectively, C = N1 + N2 is the total number of classes the node must divide, $X_j$ denotes the j-th class sample set, $j = 1, 2, \ldots, C$, the number of samples of $X_j$ is $n_j$, and the sample vector is x;
1) calculate the center of each class;
2) let i be the index of a class splitting scheme; for every splitting scheme, compute the quantities in steps 3) and 4);
3) calculate the centers of the positive and negative example class sets S1 and S2, and calculate the Euclidean distance between the centers of S1 and S2:
$d^i_{S1S2} = \|e_1^i - e_2^i\|$
4) calculate the average distance from the centers of the classes in S1 to the center of S1, and the average distance from the centers of the classes in S2 to the center of S2;
5) calculate $d^i$ according to the following formula; the scheme that maximizes it is the required splitting scheme (see the sketch after this step):
$d^i = d^i_{S1S2} + d^i_{S1} + d^i_{S2}$
According to the node classifier class division method above, the class splitting scheme of each node classifier is designed top-down and the complete decision tree is finally built.
Each node classifier is trained with the training sample sets I'1, I'2, ..., I'C, finally forming the complete SVM decision tree classifier. All pixels of the image to be classified are taken as the test sample set, test classification is performed with the SVM decision tree classifier, and the classification results are mapped back to the image to realize image classification.
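To make the splitting criterion of step 5 concrete, the following sketch scores a candidate positive/negative split of the classes at one node by $d^i = d^i_{S1S2} + d^i_{S1} + d^i_{S2}$ and trains the node's binary SVM; the exhaustive enumeration of candidate splits and the use of scikit-learn's SVC as the node classifier are illustrative assumptions, not requirements of the invention.

```python
# Sketch of step 5: scoring class splits at a node of the SVM decision tree (assumed enumeration).
from itertools import combinations
import numpy as np
from sklearn.svm import SVC

def split_score(centers: np.ndarray, s1: list[int], s2: list[int]) -> float:
    """centers: one center vector per class; s1, s2: class indices in the positive
    and negative groups. Returns d^i = d^i_S1S2 + d^i_S1 + d^i_S2."""
    e1, e2 = centers[s1].mean(axis=0), centers[s2].mean(axis=0)
    d_s1s2 = np.linalg.norm(e1 - e2)                                   # distance between group centers
    d_s1 = np.mean([np.linalg.norm(centers[j] - e1) for j in s1])      # average spread of S1
    d_s2 = np.mean([np.linalg.norm(centers[j] - e2) for j in s2])      # average spread of S2
    return d_s1s2 + d_s1 + d_s2

def best_split(centers: np.ndarray) -> tuple[list[int], list[int]]:
    """Enumerate candidate splits of all classes into two non-empty groups and
    return the split that maximizes the score."""
    classes = list(range(len(centers)))
    best, best_d = None, -np.inf
    for k in range(1, len(classes)):
        for s1 in combinations(classes, k):
            s2 = [c for c in classes if c not in s1]
            d = split_score(centers, list(s1), s2)
            if d > best_d:
                best, best_d = (list(s1), s2), d
    return best

def train_node_classifier(X: np.ndarray, y: np.ndarray, s1: list[int]) -> SVC:
    """Train a binary SVM separating the S1 classes (label 1) from the rest (label 0)."""
    binary_y = np.isin(y, s1).astype(int)
    return SVC(kernel="rbf").fit(X, binary_y)
```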
Step 6: end.
Although embodiments of the present invention have been shown and described, those skilled in the art will understand that various changes, modifications, replacements and variants can be made to these embodiments without departing from the principle and purpose of the invention; the scope of the invention is defined by the claims and their equivalents.

Claims (4)

1. A paper duplicate checking method based on a naive Bayes, decision tree and SVM mixed model, characterized by comprising the following four steps:
S1: establish a keyword database from the occurrence frequency of searched keywords;
S2: classify the keywords;
S3: perform a preliminary coarse screening using a fusion of the decision tree and naive Bayes;
S4: when the decision tree classification cannot specify a classification criterion, learn with an SVM to form a fine screening stage.
2. The paper duplicate checking method based on a naive Bayes, decision tree and SVM mixed model according to claim 1, characterized in that step S2 comprises the following sub-steps:
S21: classify the keywords into an innovation class and a knowledge class;
S22: the tolerated repetition rate for knowledge-class keywords is 40%, while the tolerated rate for innovation-class keywords is lower, at 5%; this prevents misjudgments during duplicate checking that are caused by the use of common knowledge in an article.
3. The paper duplicate checking method based on a naive Bayes, decision tree and SVM mixed model according to claim 1, characterized in that step S3 comprises the following sub-steps:
S31: extract key indicators by detecting charts, data, keywords and the central idea;
S32: use the Spearman rank correlation coefficient to determine the pairwise correlation of the indicators, reduce the dimensionality of the strongly correlated indicators that are screened out using principal component analysis, and recombine them into a new group of mutually independent generalized variables;
S33: choose six parts of the article, namely the opening paragraph, four middle paragraphs and the concluding paragraph; analyze their weights with the analytic hierarchy process (AHP) and obtain an integrated value for the six parts after weighted synthesis; the middle paragraphs are extracted as follows: if the body of the article contains more than four core views, take the paragraph with the most words for each core view, sort these paragraphs by word count in descending order, and choose the top four; if there are exactly four core views, directly choose the paragraph with the most words for each of the four views; if there are fewer than four, choose the four paragraphs of the body with the most words after sorting all paragraphs by word count;
S34: express the set of plagiarism types as the dependent variable and the set of criterion attributes as the independent variables; with the six positional integrated values of the paragraph criterion attributes and their corresponding plagiarism types as training samples, build a CART decision tree by recursive partitioning of the training samples;
S35: count the numbers of training samples correctly classified by the CART decision tree and by the Bayesian model during training; dividing by the total number of training samples gives the classification accuracies of the two algorithms, $A_{CART}$ and $A_{NB}$; then compute the training accuracy of the decision tree model for each plagiarism type, $b(k)$, $k = 1, 2, \ldots, m$, where $m$ is the total number of plagiarism types; define the posterior probability of the decision tree model for each plagiarism class when its output type is $Y_t$, combine it with the posterior probability $P(Y_k|X)_{NB}$ output by the Bayesian model by weighted synthesis, and take the plagiarism type with the largest resulting probability as the final classification output.
4. The paper duplicate checking method based on a naive Bayes, decision tree and SVM mixed model according to claim 1, characterized in that step S4 comprises the following sub-steps:
S41: generate training sample sets by actively selecting training samples; that is, from the various training articles, delineate training article sets I1, I2, ..., IC for C categories, sample each of I1, I2, ..., IC with uniform sampling to generate training sample sets I'1, I'2, ..., I'C containing equal numbers of articles, and use the plagiarism probability of each article as the sample vector;
S42: the class splitting scheme of a node classifier is as follows:
assume that the positive and negative example class sets divided by the node classifier are S1 and S2, N1 and N2 are the numbers of classes in S1 and S2 respectively, C = N1 + N2 is the total number of classes the node must divide, $X_j$ denotes the j-th class sample set, $j = 1, 2, \ldots, C$, the number of samples of $X_j$ is $n_j$, and the sample vector is x;
1) calculate the center of each class;
2) let i be the index of a class splitting scheme; for every splitting scheme, compute the quantities in steps 3) and 4);
3) calculate the centers of the positive and negative example class sets S1 and S2, and calculate the Euclidean distance between the centers of S1 and S2:
$d^i_{S1S2} = \|e_1^i - e_2^i\|$
4) calculate the average distance from the centers of the classes in S1 to the center of S1, and the average distance from the centers of the classes in S2 to the center of S2;
5) calculate $d^i$ according to the following formula; the scheme that maximizes it is the required splitting scheme:
$d^i = d^i_{S1S2} + d^i_{S1} + d^i_{S2}$
According to the node classifier class division method above, design the class splitting scheme of each node classifier top-down and finally build the complete decision tree;
S43: use the training sample sets I'1, I'2, ..., I'C to train each node classifier, finally forming the complete SVM decision tree classifier;
S44: take all pixels of the image to be classified as the test sample set, perform test classification with the SVM decision tree classifier, and map the classification results back to the image to realize image classification.
CN201811467956.7A 2018-12-03 2018-12-03 Paper duplicate checking method based on naive Bayesian, decision tree and SVM mixed model Pending CN109635254A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811467956.7A CN109635254A (en) 2018-12-03 2018-12-03 Paper duplicate checking method based on naive Bayesian, decision tree and SVM mixed model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811467956.7A CN109635254A (en) 2018-12-03 2018-12-03 Paper duplicate checking method based on naive Bayesian, decision tree and SVM mixed model

Publications (1)

Publication Number Publication Date
CN109635254A true CN109635254A (en) 2019-04-16

Family

ID=66070663

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811467956.7A Pending CN109635254A (en) 2018-12-03 2018-12-03 Paper duplicate checking method based on naive Bayesian, decision tree and SVM mixed model

Country Status (1)

Country Link
CN (1) CN109635254A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111367874A (en) * 2020-02-28 2020-07-03 北京神州绿盟信息安全科技股份有限公司 Log processing method, device, medium and equipment
CN111723208A (en) * 2020-06-28 2020-09-29 西南财经大学 Conditional classification tree-based legal decision document multi-classification method and device and terminal

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1804829A (en) * 2006-01-10 2006-07-19 西安交通大学 Semantic classification method for Chinese question
US20080195577A1 (en) * 2007-02-09 2008-08-14 Wei Fan Automatically and adaptively determining execution plans for queries with parameter markers
CN101441620A (en) * 2008-11-27 2009-05-27 温州大学 Electronic text document plagiarism recognition method based on similar string matching distance
CN101819601A (en) * 2010-05-11 2010-09-01 同方知网(北京)技术有限公司 Method for automatically classifying academic documents
CN101826263A (en) * 2009-03-04 2010-09-08 中国科学院自动化研究所 Objective standard based automatic oral evaluation system
CN103514170A (en) * 2012-06-20 2014-01-15 中国移动通信集团安徽有限公司 Speech-recognition text classification method and device
CN103544326A (en) * 2013-11-14 2014-01-29 上海交通大学 Chinese and English cross-language plagiarism recognition method based on characteristics and content of translations
US20140223284A1 (en) * 2013-02-01 2014-08-07 Brokersavant, Inc. Machine learning data annotation apparatuses, methods and systems
CN105045825A (en) * 2015-06-29 2015-11-11 中国地质大学(武汉) Structure extended polynomial naive Bayes text classification method
CN105447505A (en) * 2015-11-09 2016-03-30 成都数之联科技有限公司 Multilevel important email detection method
CN105468713A (en) * 2015-11-19 2016-04-06 西安交通大学 Multi-model fused short text classification method
CN105956382A (en) * 2016-04-26 2016-09-21 北京工商大学 Traditional Chinese medicine constitution optimized classification method based on improved CART decision-making tree and fuzzy naive Bayes combined model
CN107145514A (en) * 2017-04-01 2017-09-08 华南理工大学 Chinese sentence pattern sorting technique based on decision tree and SVM mixed models
CN107391772A (en) * 2017-09-15 2017-11-24 国网四川省电力公司眉山供电公司 A kind of file classification method based on naive Bayesian
CN107908715A (en) * 2017-11-10 2018-04-13 中国民航大学 Microblog emotional polarity discriminating method based on Adaboost and grader Weighted Fusion
CN107977670A (en) * 2017-10-09 2018-05-01 中国电子科技集团公司第二十八研究所 Accident classification stage division, the apparatus and system of decision tree and bayesian algorithm
US20180173847A1 (en) * 2016-12-16 2018-06-21 Jang-Jih Lu Establishing a machine learning model for cancer anticipation and a method of detecting cancer by using multiple tumor markers in the machine learning model for cancer anticipation
CN108763486A (en) * 2018-05-30 2018-11-06 湖南写邦科技有限公司 Paper duplicate checking method, terminal and storage medium based on terminal

Patent Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1804829A (en) * 2006-01-10 2006-07-19 西安交通大学 Semantic classification method for Chinese question
US20080195577A1 (en) * 2007-02-09 2008-08-14 Wei Fan Automatically and adaptively determining execution plans for queries with parameter markers
CN101441620A (en) * 2008-11-27 2009-05-27 温州大学 Electronic text document plagiarism recognition method based on similar string matching distance
CN101826263A (en) * 2009-03-04 2010-09-08 中国科学院自动化研究所 Objective standard based automatic oral evaluation system
CN101819601A (en) * 2010-05-11 2010-09-01 同方知网(北京)技术有限公司 Method for automatically classifying academic documents
CN103514170A (en) * 2012-06-20 2014-01-15 中国移动通信集团安徽有限公司 Speech-recognition text classification method and device
US20140223284A1 (en) * 2013-02-01 2014-08-07 Brokersavant, Inc. Machine learning data annotation apparatuses, methods and systems
CN103544326A (en) * 2013-11-14 2014-01-29 上海交通大学 Chinese and English cross-language plagiarism recognition method based on characteristics and content of translations
CN105045825A (en) * 2015-06-29 2015-11-11 中国地质大学(武汉) Structure extended polynomial naive Bayes text classification method
CN105447505A (en) * 2015-11-09 2016-03-30 成都数之联科技有限公司 Multilevel important email detection method
CN105468713A (en) * 2015-11-19 2016-04-06 西安交通大学 Multi-model fused short text classification method
CN105956382A (en) * 2016-04-26 2016-09-21 北京工商大学 Traditional Chinese medicine constitution optimized classification method based on improved CART decision-making tree and fuzzy naive Bayes combined model
US20180173847A1 (en) * 2016-12-16 2018-06-21 Jang-Jih Lu Establishing a machine learning model for cancer anticipation and a method of detecting cancer by using multiple tumor markers in the machine learning model for cancer anticipation
CN107145514A (en) * 2017-04-01 2017-09-08 华南理工大学 Chinese sentence pattern sorting technique based on decision tree and SVM mixed models
CN107391772A (en) * 2017-09-15 2017-11-24 国网四川省电力公司眉山供电公司 A kind of file classification method based on naive Bayesian
CN107977670A (en) * 2017-10-09 2018-05-01 中国电子科技集团公司第二十八研究所 Accident classification stage division, the apparatus and system of decision tree and bayesian algorithm
CN107908715A (en) * 2017-11-10 2018-04-13 中国民航大学 Microblog emotional polarity discriminating method based on Adaboost and grader Weighted Fusion
CN108763486A (en) * 2018-05-30 2018-11-06 湖南写邦科技有限公司 Paper duplicate checking method, terminal and storage medium based on terminal

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHANCHANA SORNSOONTORN ET AL: "Using Document Classification to Improve the Performance of a Plagiarism Checker: a Case for Thai Language Documents", 2017 21st International Computer Science and Engineering Conference (ICSEC) *
HADJ AHMED BOUARARA: "Multi-Agents Machine Learning (MML) System for Plagiarism Detection" *
PATIL SANGITA B ET AL: "Use of Support Vector Machine, Decision Tree and Naive Bayesian Techniques for Wind Speed Classification", 2011 International Conference on Power and Energy Systems *
WANG SUHONG: "Research on Plagiarism Detection Based on SVM", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111367874A (en) * 2020-02-28 2020-07-03 北京神州绿盟信息安全科技股份有限公司 Log processing method, device, medium and equipment
CN111367874B (en) * 2020-02-28 2023-11-14 绿盟科技集团股份有限公司 Log processing method, device, medium and equipment
CN111723208A (en) * 2020-06-28 2020-09-29 西南财经大学 Conditional classification tree-based legal decision document multi-classification method and device and terminal
CN111723208B (en) * 2020-06-28 2023-04-18 西南财经大学 Conditional classification tree-based legal decision document multi-classification method and device and terminal

Similar Documents

Publication Publication Date Title
CN107577785B (en) Hierarchical multi-label classification method suitable for legal identification
Styawati et al. Sentiment analysis on online transportation reviews using Word2Vec text embedding model feature extraction and support vector machine (SVM) algorithm
Kuhkan A method to improve the accuracy of k-nearest neighbor algorithm
CN107798033B (en) Case text classification method in public security field
CN110222744A (en) A kind of Naive Bayes Classification Model improved method based on attribute weight
CN107861951A (en) Session subject identifying method in intelligent customer service
CN107220365A (en) Accurate commending system and method based on collaborative filtering and correlation rule parallel processing
CN107766418A (en) A kind of credit estimation method based on Fusion Model, electronic equipment and storage medium
CN107391772A (en) A kind of file classification method based on naive Bayesian
CN105975992A (en) Unbalanced data classification method based on adaptive upsampling
CN112100512A (en) Collaborative filtering recommendation method based on user clustering and project association analysis
CN106570076A (en) Computer text classification system
CN109344227A (en) Worksheet method, system and electronic equipment
CN110390816A (en) A kind of condition discrimination method based on multi-model fusion
CN103778206A (en) Method for providing network service resources
CN109635254A (en) Paper duplicate checking method based on naive Bayesian, decision tree and SVM mixed model
Arbel et al. Classifier evaluation under limited resources
CN112417082B (en) Scientific research achievement data disambiguation filing storage method
Řehůřek et al. Automated classification and categorization of mathematical knowledge
CN105160358B (en) A kind of image classification method and system
CN106775694A (en) A kind of hierarchy classification method of software merit rating code product
CN108268458A (en) A kind of semi-structured data sorting technique and device based on KNN algorithms
CN110309864B (en) Collaborative filtering recommendation method fusing local similarity and global similarity
CN114548104A (en) Few-sample entity identification method and model based on feature and category intervention
Zhang et al. Unbalanced data classification based on oversampling and integrated learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190416