CN111078874B - Foreign Chinese difficulty assessment method based on decision tree classification of random subspace - Google Patents

Foreign Chinese difficulty assessment method based on decision tree classification of random subspace

Info

Publication number
CN111078874B
CN111078874B CN201911206414.9A
Authority
CN
China
Prior art keywords
chinese
article
svm
decision tree
foreign
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911206414.9A
Other languages
Chinese (zh)
Other versions
CN111078874A (en
Inventor
曾致中
陈治平
余新国
方淙
王静静
袁航
熊佳洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central China Normal University
Original Assignee
Central China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central China Normal University filed Critical Central China Normal University
Priority to CN201911206414.9A priority Critical patent/CN111078874B/en
Publication of CN111078874A publication Critical patent/CN111078874A/en
Application granted granted Critical
Publication of CN111078874B publication Critical patent/CN111078874B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a foreign-oriented Chinese difficulty assessment method using decision tree classification based on random-subspace feature selection with svm and bert models. 86 statistical features are generated from characteristics of the article such as its length and readability, and classified with svm to obtain confidence 1. The encoding features are classified with svm to obtain confidence 2. The two confidences are fused as new features and classified with a decision tree. For the encoding feature data: the output of the last (-1) encoder layer is extracted through a BERT model, then average -> max pooling is applied to obtain a 768-dimensional feature, without normalization. The method avoids the low efficiency and under-fitting of traditional algorithms and makes the most reasonable use of all the information, so the effect of enlarging the classification basis is significant. The method achieves 85.6% accuracy on foreign-oriented Chinese difficulty assessment.

Description

Foreign Chinese difficulty assessment method based on random subspace decision tree classification
Technical Field
The invention belongs to the field of education informatization, and particularly relates to a foreign-oriented Chinese difficulty assessment method using decision tree classification based on random-subspace feature selection with svm and bert models.
Background
It is well known that reading should progress step by step, from easy to difficult. Material that is too difficult easily frustrates students' confidence and destroys their interest in reading; material that is too simple leads to low-level repetition that neither keeps improving reading ability nor meets the academic demands of reading complex texts and carrying out related research after entering university. In general, only material whose difficulty fits the reader is best. As China develops, the role it plays on the international stage grows ever more important, so more and more people need to learn Chinese. Reading Chinese texts is one of the most effective ways to learn, but reading a text of a given difficulty requires a certain literacy from the learner: if the learner does not yet meet the Chinese proficiency a text demands, the effort yields half the result, and the learner's interest and enthusiasm can be badly hurt. When learners' writing ability is being developed, various texts should be provided for reference in a targeted manner, and their own written texts can be judged and scored. Therefore, classifying Chinese texts by difficulty is a key technology for supporting Chinese-learning systems.
The difficulty of a graded foreign-oriented Chinese reader refers to whether a reader at a given level is suitable for learners whose Chinese has reached that level; a reader may prove either too difficult or too easy.
Text classification uses a computer to automatically classify and label a set of texts according to some classification system or standard. It divides into two categories according to whether deep learning is used: the first is text classification based on traditional machine learning, the second is text classification based on deep learning. Of course, deep-learning text classification techniques in the second category may also be combined with traditional machine learning methods.
Traditional machine learning developed rapidly in the late 1990s and formed a fixed pattern for text classification problems: feature engineering plus a classifier model. Feature engineering distills the information in the text so that a computer can easily recognize and read it, and is generally divided into three steps: first, text preprocessing; second, feature extraction; third, text representation. Well-known classifier models include the naive Bayes classification algorithm, KNN, SVM, maximum entropy, and so on.
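The classic feature-engineering-plus-classifier pipeline described above can be sketched as follows. This is a toy illustration, not the patent's method; the corpus, labels, and the choice of TF-IDF with naive Bayes are invented for demonstration.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny invented corpus: label 0 = easy text, label 1 = hard text.
docs = ["good easy text", "hard long text", "easy short text", "hard complex text"]
labels = [0, 1, 0, 1]

# Feature extraction (TF-IDF text representation) + classifier in one pipeline.
clf = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(docs, labels)
pred = clf.predict(["easy text"])
```

In practice the preprocessing, feature extraction, and representation steps are each tuned to the language and task; the pipeline object merely fixes their order.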
In NLP methods based on deep neural networks, the characters/words in a text are usually represented by one-dimensional vectors (generally called "word vectors"); on this basis, the neural network takes the word vector of each character or word as input and, after a series of complex transformations, outputs a one-dimensional vector as the semantic representation of the text. In particular, it is generally desirable that characters/words with similar semantics lie relatively close together in the feature vector space, so that the text vector derived from the character/word vectors also carries more accurate semantic information. Accordingly, the main input of the BERT model is the raw word vector of each character/word in the text; this vector can be initialized randomly, or pre-trained with algorithms such as Word2Vec to serve as an initial value. The output is the vector representation of each character/word after full-text semantic information has been fused in.
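The intuition that semantically related words lie close together in the vector space can be illustrated with cosine similarity. The three-dimensional vectors below are invented values for illustration only; real word vectors come from pre-training.

```python
import numpy as np

# Invented toy word vectors: "国家" (country) and "政府" (government)
# are semantically related; "苹果" (apple) is not.
vec = {
    "国家": np.array([0.8, 0.1, 0.3]),
    "政府": np.array([0.7, 0.2, 0.4]),
    "苹果": np.array([0.1, 0.9, 0.2]),
}

def cosine(a, b):
    # Cosine similarity: 1.0 means identical direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Related words should score higher than unrelated ones.
related = cosine(vec["国家"], vec["政府"])
unrelated = cosine(vec["国家"], vec["苹果"])
```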
At present, Chinese text classification is mostly applied to simple, short text sets such as microblogs and news, and existing methods give unsatisfactory results when classifying Chinese texts for Chinese learners.
Disclosure of Invention
In view of at least one of the defects or improvement needs of the prior art, in particular the complexity of classifying texts for Chinese learners, where the classification standard changes with the learners' different needs, the invention provides a foreign-oriented Chinese difficulty assessment method based on the fusion of Bert model, svm, and decision tree features. 86 statistical features are generated from characteristics of the article such as its length and readability, and classified with svm to obtain confidence 1. The encoding features are classified with svm to obtain confidence 2. The two confidences are fused as new features and classified with a decision tree.
To achieve the above object, according to one aspect of the present invention, there is provided a foreign-oriented Chinese difficulty assessment method using decision tree classification based on random-subspace feature selection with svm and bert models, comprising the steps of:
S1, preprocessing the foreign-oriented Chinese text;
S2, generating a plurality of features for the text preprocessed in step S1 from the article's length, readability, and new-word count;
S3, classifying the articles described by all of these features with a random-subspace svm ensemble to obtain confidence 1;
S4, for the text preprocessed in step S1, extracting the output of the last (-1) encoder layer of a BERT model and applying average -> max pooling to obtain a multi-dimensional encoding feature of the article;
S5, classifying the encoding features with a random-subspace svm to obtain confidence 2;
and S6, fusing the two obtained confidences as new features, and classifying with a decision tree.
Preferably, in step S1, preprocessing the foreign-oriented Chinese text includes saving it in txt format.
Preferably, in step S1, preprocessing includes deleting empty lines in the article.
Preferably, in step S1, preprocessing includes splitting the article into sentences.
Preferably, the sentence splitting cuts each article into sentence units with python, stores them in a list structure, and removes punctuation marks.
Preferably, the plurality of features generated in step S2 includes the total word count, total stroke count, paragraph count, total sentence count, and new-word count.
Preferably, in step S6, a weighted average of confidence 1 and confidence 2 serves as the integrated output for the article. The above preferred features may be combined with one another as long as they do not conflict.
Generally, compared with the prior art, the technical scheme conceived by the invention has the following beneficial effects: the foreign-oriented Chinese difficulty assessment method using decision tree classification based on random-subspace feature selection with svm and Bert models exploits the Bert model's strong text-feature extraction ability to obtain a representation of the foreign-oriented Chinese article rich in semantic information, and by combining it with traditional statistical word features of the article it can make full use of the article's various features. The invention avoids the low efficiency and under-fitting of traditional algorithms and makes the most reasonable use of all the information, so the effect of enlarging the classification basis is significant. The method achieves 85.6% accuracy on foreign-oriented Chinese difficulty assessment.
Drawings
FIG. 1 is a general schematic diagram of a foreign Chinese difficulty assessment method for decision tree classification based on stochastic subspace feature selection of svm and bert models in accordance with the present invention;
FIG. 2 is a structural diagram of the encoding feature of the article extracted based on the Bert model used in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other. The present invention will be described in further detail with reference to specific embodiments.
As shown in fig. 1, the present invention provides a foreign-oriented Chinese difficulty assessment method using decision tree classification based on random-subspace feature selection with svm and bert models, comprising the following steps:
s1, preprocessing the foreign-Chinese language seal, including storing the foreign-Chinese language seal in a txt format, deleting empty lines in an article and dividing the article. The sentence dividing is to cut each article by taking a sentence as a unit by utilizing python, store the article in a list structure and remove punctuation marks;
s2, generating a plurality of characteristics, such as 86, including total word number, total stroke number, paragraph number, total sentence number and word number, of the foreign Chinese article preprocessed in the step S1 according to the length of the foreign Chinese article, the readability of the article and the word number of the article;
s3, classifying the articles containing all the features by using svm combination based on a random subspace to obtain a confidence coefficient 1;
s4, extracting an encoding-1 layer output information result of the Chinese article preprocessed in the step S1 through a BERT model, and then performing average- > max posing processing to obtain the multidimensional encoding characteristic of the article, wherein the multidimensional encoding characteristic is shown in figure 2;
s5, classifying the encoding characteristics by using svm based on a random subspace to obtain a confidence coefficient 2;
and S6, fusing the two obtained confidences as new features, and classifying with a decision tree. Preferably, in step S6, a weighted average of confidence 1 and confidence 2 serves as the integrated output for the article. The above preferred features may be combined with one another as long as they do not conflict.
A detailed example is used for explanation. The invention provides a foreign-oriented Chinese difficulty assessment method using decision tree classification based on random-subspace feature selection with svm and bert models, comprising the following steps:
(1) Crawl compositions from composition websites by grade (from the first grade of primary school to the third grade of high school) using a web crawler, partition the data set correctly with grade as the standard, prefix each file name with its grade information, and store in txt format.
(2) For each grade, select the single most representative article as a benchmark article, serving as the standard representative of its class.
(3) Cut each article into sentence units using python, store the sentences in a list structure, and remove punctuation.
(4) For the preprocessed foreign-oriented Chinese articles, generate a plurality of features, e.g. 86, including the total word count, total stroke count, paragraph count, total sentence count, and new-word count, from the article's length, readability, and new-word count. The invention examines the difficulty of graded Chinese readers from three angles: the length of the reader, i.e. the number of Chinese characters it contains; its readability, i.e. the average sentence length and the average counts per hundred characters; and its new-word count, i.e. the number of new words appearing in the reader.
(5) Classify the articles described by all of the above features with a random-subspace svm ensemble to obtain confidence 1.
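One way to realize a random-subspace svm combination is scikit-learn's BaggingClassifier with feature subsampling. This is a sketch on synthetic data: the patent names no library, and the subspace fraction, estimator count, and synthetic 86-feature matrix are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier
from sklearn.datasets import make_classification

# Synthetic stand-in for the 86 statistical article features.
X, y = make_classification(n_samples=300, n_features=86,
                           n_informative=20, n_classes=3,
                           random_state=0)

# Random subspace method: each base SVM is trained on a random
# subset of the feature dimensions.  bootstrap=False keeps every
# sample; max_features draws the subspace for each estimator.
clf = BaggingClassifier(
    SVC(kernel="rbf", probability=True),
    n_estimators=10,
    max_features=0.5,        # each SVM sees half the features
    bootstrap=False,
    random_state=0,
).fit(X, y)

# "Confidence" per article = the ensemble's averaged class probabilities.
confidence = clf.predict_proba(X[:5])
```

The same construction, applied to the 768-dimensional encoding features, yields confidence 2 in step (7).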
(6) For the preprocessed foreign-oriented Chinese article, extract the output of the last (-1) encoder layer through a BERT model, then apply average -> max pooling to obtain the multi-dimensional encoding feature of the article, as shown in figure 2. Each input sentence is encoded by the Bert structure, which changes the label-attention weighting mechanism and the word weights, and the multiple kernels make the label-embedding boundaries finer, fitting the data better.
(7) Classify the encoding features with a random-subspace svm to obtain confidence 2.
(8) Fuse the two obtained confidences as new features, and classify with a decision tree. During training, each article is cut into a combination of several sentences, so the sentence is the basic input unit; after each sentence of an article is classified, a weighted average is taken as the integrated output for the article.
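The fusion step can be sketched as concatenating the two per-article probability vectors and fitting a decision tree on them. The confidence values below are random stand-ins for the two svms' predict_proba outputs, and the 12 classes mirror the 12 grade levels; both are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n_articles, n_classes = 200, 12   # 12 grade levels

# Stand-ins for confidence 1 (statistical-feature svm) and
# confidence 2 (BERT-feature svm): each row is a probability
# vector over the 12 classes.
conf1 = rng.dirichlet(np.ones(n_classes), size=n_articles)
conf2 = rng.dirichlet(np.ones(n_classes), size=n_articles)
y = rng.integers(0, n_classes, size=n_articles)

# Fuse the two confidences into one new feature vector per article
# and classify with a decision tree.
fused = np.hstack([conf1, conf2])          # shape (200, 24)
tree = DecisionTreeClassifier(random_state=0).fit(fused, y)
pred = tree.predict(fused)
```

This is a stacking design: the decision tree learns how much to trust each upstream classifier's probability mass per class, rather than a single fixed weighting.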
<Description of the experiment and results>
In this example, 51356 composition texts were crawled from 13 composition websites and classified into the 12 grades from primary school to high school; 4000 texts per grade, 48000 in total, were screened out and stored in txt format. The data were split into training, test, and validation sets, with the training set taking the largest share (about 70%). Training was carried out according to the specific implementation method, while the accuracy on the validation set was monitored to choose the point at which to stop training.
Each time a model with a fixed kernel was trained, all samples were shuffled and the training, test, and validation sets were re-drawn in turn before training and validation were repeated. Ten such runs were carried out in total, and the results in the table below are the averages of the 10 runs.
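The repeated shuffle-and-resplit protocol can be sketched as follows. The 70% training fraction and the placeholder scoring are assumptions for illustration; the real runs train the fixed-kernel model and record its F1-score.

```python
import numpy as np

def shuffled_splits(n_samples, n_runs=10, train_frac=0.7, seed=0):
    """Yield re-shuffled train/held-out index splits, one per run.

    Mirrors the protocol: before each of the 10 runs all samples
    are shuffled and the splits are re-drawn; reported scores are
    the mean over the runs.
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_runs):
        idx = rng.permutation(n_samples)
        cut = int(train_frac * n_samples)
        yield idx[:cut], idx[cut:]

scores = []
for train_idx, test_idx in shuffled_splits(1000):
    # ... train the fixed-kernel model on train_idx, score on test_idx ...
    scores.append(0.0)  # placeholder for the per-run score
mean_score = float(np.mean(scores))  # the averaged value reported in Table 1
```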
The specific experimental results are shown in table 1.
TABLE 1 Experimental results
Model | SVM kernel | Average F1-score
SVM+Bert+DT (Decision Tree) | Linear kernel | 82.32%
SVM+Bert+DT | Polynomial kernel | 82.47%
SVM+Bert+DT | RBF (Gaussian) kernel | 85.6%
In summary, for the text classification problem of foreign-oriented Chinese difficulty assessment, the invention provides a difficulty assessment and automatic classification method using decision tree classification based on random-subspace feature selection with svm and Bert models. The method avoids the low efficiency and under-fitting of traditional algorithms and makes the most reasonable use of all the information, so the effect of enlarging the classification basis is significant. The method achieves 85.6% accuracy on foreign-oriented Chinese difficulty assessment.
It will be understood by those skilled in the art that the foregoing is only an exemplary embodiment of the present invention, and is not intended to limit the invention to the particular forms disclosed, since various modifications, substitutions and improvements within the spirit and scope of the invention are possible and within the scope of the appended claims.

Claims (7)

1. A foreign-oriented Chinese difficulty assessment method using decision tree classification based on random-subspace feature selection with svm and bert models, characterized by comprising the following steps:
S1, preprocessing the foreign-oriented Chinese text;
S2, generating a plurality of features for the text preprocessed in step S1 from the article's length, readability, and new-word count;
S3, classifying the articles described by all of these features with a random-subspace svm ensemble to obtain confidence 1;
S4, extracting the output of the last (-1) encoder layer of a BERT model for the text preprocessed in step S1, then applying average -> max pooling to obtain a multi-dimensional encoding feature of the article;
S5, classifying the encoding features with a random-subspace svm to obtain confidence 2;
and S6, fusing the two obtained confidences as new features, and classifying with a decision tree.
2. The foreign-oriented Chinese difficulty assessment method using decision tree classification based on random-subspace feature selection with svm and bert models according to claim 1, wherein:
in step S1, preprocessing the foreign-oriented Chinese text includes storing it in txt format.
3. The foreign-oriented Chinese difficulty assessment method using decision tree classification based on random-subspace feature selection with svm and bert models according to claim 2, wherein:
in step S1, preprocessing includes deleting empty lines in the article.
4. The foreign-oriented Chinese difficulty assessment method using decision tree classification based on random-subspace feature selection with svm and bert models according to claim 3, wherein:
in step S1, preprocessing includes splitting the article into sentences.
5. The foreign-oriented Chinese difficulty assessment method using decision tree classification based on random-subspace feature selection with svm and bert models according to claim 4, wherein:
in step S1, the sentence splitting cuts each article into sentence units with python, stores them in a list structure, and removes punctuation marks.
6. The foreign-oriented Chinese difficulty assessment method using decision tree classification based on random-subspace feature selection with svm and bert models according to claim 1, wherein:
the plurality of features generated in step S2 includes the total word count, total stroke count, paragraph count, sentence count, and new-word count.
7. The foreign-oriented Chinese difficulty assessment method using decision tree classification based on random-subspace feature selection with svm and bert models according to claim 1, wherein:
in step S6, a weighted average of confidence 1 and confidence 2 is used as the integrated output for the article.
CN201911206414.9A 2019-11-29 2019-11-29 Foreign Chinese difficulty assessment method based on decision tree classification of random subspace Active CN111078874B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911206414.9A CN111078874B (en) 2019-11-29 2019-11-29 Foreign Chinese difficulty assessment method based on decision tree classification of random subspace

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911206414.9A CN111078874B (en) 2019-11-29 2019-11-29 Foreign Chinese difficulty assessment method based on decision tree classification of random subspace

Publications (2)

Publication Number Publication Date
CN111078874A CN111078874A (en) 2020-04-28
CN111078874B true CN111078874B (en) 2023-04-07

Family

ID=70312204

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911206414.9A Active CN111078874B (en) 2019-11-29 2019-11-29 Foreign Chinese difficulty assessment method based on decision tree classification of random subspace

Country Status (1)

Country Link
CN (1) CN111078874B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111797229A (en) * 2020-06-10 2020-10-20 南京擎盾信息科技有限公司 Text representation method and device and text classification method
CN112631139B (en) * 2020-12-14 2022-04-22 山东大学 Intelligent household instruction reasonability real-time detection system and method

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200521895A (en) * 2003-12-26 2005-07-01 Inventec Besta Co Ltd System and method to recognize the degree of mastering difficulty for a language text
CN101814066A (en) * 2009-02-23 2010-08-25 富士通株式会社 Text reading difficulty judging device and method thereof
CN103207854A (en) * 2012-01-11 2013-07-17 宋曜廷 Chinese text readability measuring system and method thereof
CN105068993A (en) * 2015-07-31 2015-11-18 成都思戴科科技有限公司 Method for evaluating text difficulty
CN105468713A (en) * 2015-11-19 2016-04-06 西安交通大学 Multi-model fused short text classification method
CN107145514A (en) * 2017-04-01 2017-09-08 华南理工大学 Chinese sentence pattern sorting technique based on decision tree and SVM mixed models
CN107506346A (en) * 2017-07-10 2017-12-22 北京享阅教育科技有限公司 A kind of Chinese reading grade of difficulty method and system based on machine learning
CN107977362A (en) * 2017-12-11 2018-05-01 中山大学 A kind of method defined the level for Chinese text and calculate the scoring of Chinese text difficulty
CN108984531A (en) * 2018-07-23 2018-12-11 深圳市悦好教育科技有限公司 Books reading difficulty method and system based on language teaching material
CN109977408A (en) * 2019-03-27 2019-07-05 西安电子科技大学 The implementation method of English Reading classification and reading matter recommender system based on deep learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007249755A (en) * 2006-03-17 2007-09-27 Ibm Japan Ltd System for evaluating difficulty understanding document and method therefor
US11017180B2 (en) * 2018-04-18 2021-05-25 HelpShift, Inc. System and methods for processing and interpreting text messages

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on automatic readability assessment of foreign-oriented Chinese reading materials based on a regression model; 曾致中; China Education Informatization (《中国教育信息化》); full text *
Readability assessment of foreign-oriented Chinese texts based on the random forest algorithm; 杨文媞; China Education Informatization (《中国教育信息化》); full text *

Also Published As

Publication number Publication date
CN111078874A (en) 2020-04-28

Similar Documents

Publication Publication Date Title
CN109977416B (en) Multi-level natural language anti-spam text method and system
CN110245229B (en) Deep learning theme emotion classification method based on data enhancement
CN109800310B (en) Electric power operation and maintenance text analysis method based on structured expression
CN108255813B (en) Text matching method based on word frequency-inverse document and CRF
CN111160031A (en) Social media named entity identification method based on affix perception
CN111966917A (en) Event detection and summarization method based on pre-training language model
CN105205124B (en) A kind of semi-supervised text sentiment classification method based on random character subspace
CN108446271A (en) The text emotion analysis method of convolutional neural networks based on Hanzi component feature
CN112395421B (en) Course label generation method and device, computer equipment and medium
CN109101490B (en) Factual implicit emotion recognition method and system based on fusion feature representation
CN111339260A (en) BERT and QA thought-based fine-grained emotion analysis method
CN108108468A (en) A kind of short text sentiment analysis method and apparatus based on concept and text emotion
CN112287100A (en) Text recognition method, spelling error correction method and voice recognition method
CN111078874B (en) Foreign Chinese difficulty assessment method based on decision tree classification of random subspace
CN112905736A (en) Unsupervised text emotion analysis method based on quantum theory
CN112069312A (en) Text classification method based on entity recognition and electronic device
CN113112239A (en) Portable post talent screening method
Fauziah et al. Lexicon Based Sentiment Analysis in Indonesia Languages: A Systematic Literature Review
CN113220964B (en) Viewpoint mining method based on short text in network message field
CN110874408B (en) Model training method, text recognition device and computing equipment
CN113569008A (en) Big data analysis method and system based on community management data
CN113869054A (en) Deep learning-based electric power field project feature identification method
CN113486143A (en) User portrait generation method based on multi-level text representation and model fusion
CN110705306B (en) Evaluation method for consistency of written and written texts
CN116484123A (en) Label recommendation model construction method and label recommendation method for long text

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant