CN111078874B - Foreign Chinese difficulty assessment method based on decision tree classification of random subspace - Google Patents
Foreign Chinese difficulty assessment method based on decision tree classification of random subspace
- Publication number
- CN111078874B CN201911206414.9A
- Authority
- CN
- China
- Prior art keywords
- chinese
- article
- svm
- decision tree
- foreign
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a method for assessing the difficulty of Chinese-as-a-foreign-language texts through decision tree classification based on random subspace feature selection with svm and bert models. From properties of an article such as its length and readability, 86 statistical features are generated and classified with an svm to obtain confidence 1. The encoding features are likewise classified with an svm to obtain confidence 2. The two confidences obtained are then fused as new features and classified with a decision tree. For the encoding-feature data: the output of the last (-1) encoder layer is extracted through a BERT model and then processed by average → max pooling to obtain a 768-dimensional feature vector, without normalization. The method avoids the low efficiency and under-fitting of traditional algorithms and makes full use of all available information, so the added classification evidence has a clear effect. The method reaches an accuracy of 85.6% on foreign Chinese difficulty assessment.
Description
Technical Field
The invention belongs to the field of education informatization, and in particular relates to a foreign Chinese difficulty assessment method of decision tree classification based on random subspace feature selection of svm and bert models.
Background
It is well known that reading should progress gradually from easy to difficult. Material that is too difficult easily frustrates students and destroys their interest in reading. Material that is too simple leads to low-level repetition, which neither keeps improving reading ability nor meets the academic demands of reading complex texts and carrying out related research after entering university. In general, only material whose difficulty fits the learner is best. As China develops and plays an ever more important role on the international stage, more and more people want to learn Chinese. Reading Chinese texts is one of the most effective ways to do so, but a text of a given difficulty requires the learner to have a corresponding level of literacy: if a learner's Chinese has not reached the level a text demands, study yields half the result for twice the effort, and the learner's interest and enthusiasm suffer badly. When developing learners' writing ability, various texts should be provided for reference in a targeted manner, and the texts learners write can themselves be judged and scored. Classifying Chinese texts by difficulty is therefore a key technology for systems that support Chinese learning.
The difficulty of a graded foreign Chinese reader refers to whether a reading at a given level is suitable for learners whose Chinese has reached that level; a reading may turn out to be too difficult or too easy.
Text classification is the automatic labeling of a text collection by computer according to some classification system or standard. It divides into two categories according to whether deep learning is used: the first is text classification based on traditional machine learning, the second is text classification based on deep learning. Of course, deep-learning text classification in the second category can also be combined with traditional machine-learning methods.
In the late 1990s, traditional machine learning developed rapidly and settled into a fixed pattern for text classification problems: feature engineering plus a classifier model. Feature engineering distills the information in the text so that a computer can easily identify and read it, and generally proceeds in three steps: first text preprocessing, second feature extraction, and third text representation. Well-known classifier models include the naive Bayes classification algorithm, KNN, SVM, maximum entropy, and so on.
In NLP methods based on deep neural networks, the characters/words in a text are usually represented by one-dimensional vectors (commonly called "word vectors"). On this basis, the neural network takes the word vector of each character or word in the text as input and, after a series of complex transformations, outputs a one-dimensional vector as the semantic representation of the text. In particular, it is generally desirable that characters/words with similar meanings lie close together in the feature vector space, so that text vectors built from the character/word vectors also carry accurate semantic information. Accordingly, the main input of the BERT model is the raw word vector of each character/word in the text; this vector can be initialized randomly, or pre-trained with algorithms such as Word2Vec as an initial value. The output is a vector representation of each character/word in the text with full-text semantic information fused in.
At present, Chinese text classification is mostly applied to simple, short text collections such as microblogs and news, and existing methods perform poorly when classifying Chinese texts for Chinese learners.
Disclosure of Invention
In view of at least one of the above defects or improvement needs in the prior art, and in particular the complexity of the text classification problem for Chinese learners, where the classification standard changes with learners' different needs, the invention provides a foreign Chinese difficulty assessment method based on fusing Bert-model, svm, and decision-tree features. From properties of an article such as its length and readability, 86 statistical features are generated and classified with an svm to obtain confidence 1. The encoding features are classified with an svm to obtain confidence 2. The two confidences obtained are fused as new features and classified with a decision tree.
To achieve the above object, according to one aspect of the present invention, there is provided a foreign Chinese difficulty assessment method of decision tree classification based on random subspace feature selection of svm and bert models, comprising the following steps:
s1, preprocessing a foreign Chinese text;
s2, generating a plurality of features for the foreign Chinese article preprocessed in step S1 according to the article's length, its readability, and its number of new words;
s3, classifying the articles containing all the features using a random-subspace-based svm combination to obtain confidence 1;
s4, for the foreign Chinese article preprocessed in step S1, extracting the output of the last (-1) encoder layer through a BERT model and applying average → max pooling to obtain the article's multi-dimensional encoding features;
s5, classifying the encoding features using a random-subspace-based svm to obtain confidence 2;
s6, fusing the two confidences obtained as new features and classifying with a decision tree.
Preferably, in step S1, preprocessing the foreign Chinese text includes saving it in txt format.
Preferably, in step S1, preprocessing the foreign Chinese text includes deleting empty lines in the article.
Preferably, in step S1, preprocessing the foreign Chinese text includes splitting the article into sentences.
Preferably, in step S1, sentence splitting cuts each article into sentence units with python, stores them in a list structure, and removes punctuation marks.
Preferably, the plurality of features generated in step S2 includes the total number of words, the total number of strokes, the number of paragraphs, the total number of sentences, and the number of new words.
Preferably, in step S6, a weighted average of confidence 1 and confidence 2 is used as the integrated output of the article. The preferred features above may be combined with one another as long as they do not conflict.
In general, compared with the prior art, the technical scheme conceived by the invention has the following beneficial effects: the foreign Chinese difficulty assessment method of decision tree classification based on random subspace feature selection of svm and Bert models uses the Bert model's strong text-feature extraction capability to obtain a representation of the foreign Chinese article rich in semantic information, and by combining this with the article's traditional statistical word features it can exploit the article's various features fully. The invention avoids the low efficiency and under-fitting of traditional algorithms and makes full use of all available information, so the added classification evidence has a clear effect. The method reaches an accuracy of 85.6% on foreign Chinese difficulty assessment.
Drawings
FIG. 1 is a general schematic diagram of the foreign Chinese difficulty assessment method of decision tree classification based on random subspace feature selection of svm and bert models according to the present invention;
FIG. 2 is a structural diagram of the encoding feature of the article extracted based on the Bert model used in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other. The present invention will be described in further detail with reference to specific embodiments.
As shown in fig. 1, the present invention provides a foreign Chinese difficulty assessment method of decision tree classification based on random subspace feature selection of svm and bert models, comprising the following steps:
s1, preprocessing the foreign Chinese text, including saving it in txt format, deleting empty lines in the article, and splitting the article into sentences. Sentence splitting cuts each article into sentence units using python, stores them in a list structure, and removes punctuation marks;
s2, for the foreign Chinese article preprocessed in step S1, generating a plurality of features, e.g. 86, including the total number of words, the total number of strokes, the number of paragraphs, the total number of sentences, and the number of new words, according to the article's length, readability, and number of new words;
s3, classifying the articles containing all the features using a random-subspace-based svm combination to obtain confidence 1;
s4, extracting the last (-1) encoder-layer output of the foreign Chinese article preprocessed in step S1 through a BERT model and then applying average → max pooling to obtain the article's multi-dimensional encoding features, as shown in figure 2;
s5, classifying the encoding features using a random-subspace-based svm to obtain confidence 2;
s6, fusing the two confidences obtained as new features and classifying with a decision tree. Preferably, in step S6, a weighted average of confidence 1 and confidence 2 is used as the integrated output of the article. The preferred features above may be combined with one another as long as they do not conflict.
A detailed example is used for explanation. The invention provides a foreign Chinese difficulty assessment method of decision tree classification based on random subspace feature selection of svm and bert models, comprising the following steps:
(1) Using crawler technology, compositions are crawled from composition websites by grade (from the first grade of primary school to the third grade of senior high school); the data set is divided correctly with grade as the standard, grade information is written at the front of each file name, and the files are saved in txt format.
(2) For each grade, the single most representative article is selected and set aside as a benchmark article, serving as the standard representative of that class.
(3) Each article is cut into sentence units using python, stored in a list structure, and punctuation is removed.
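The splitting in step (3) can be sketched in Python as follows; the function name, the set of sentence-final marks, and the punctuation list are illustrative assumptions rather than the patent's exact implementation:

```python
import re

def split_sentences(article: str) -> list[str]:
    """Cut an article into sentences, store them in a list, and strip punctuation."""
    # Drop empty lines and rejoin the remaining text.
    text = "".join(line.strip() for line in article.splitlines() if line.strip())
    # Cut on common Chinese sentence-final marks.
    raw = re.split(r"[。？！；]", text)
    # Remove punctuation left inside each sentence.
    punct = r"[，、：“”‘’（）《》,.!?;:\"'()<>]"
    return [re.sub(punct, "", s) for s in raw if s]
```

For example, `split_sentences("你好世界。今天天气，很好！")` yields `["你好世界", "今天天气很好"]`, each sentence a separate list element with its punctuation removed.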
(4) For the preprocessed foreign Chinese articles, a plurality of features, e.g. 86, are generated, including the total number of words, the total number of strokes, the number of paragraphs, the total number of sentences, and the number of new words, according to the articles' length, readability, and number of new words. The invention examines the difficulty of graded Chinese readings from three angles: the length of a reading, i.e. the number of Chinese characters it contains; the readability of a reading, i.e. its average sentence length and an average measure per hundred characters; and the new-word load of a reading, i.e. the number of unfamiliar words appearing in it.
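A hedged sketch of a few of these statistical features in Python (the full method generates 86; the character-level tokenization and the `known_words` vocabulary below are simplifying assumptions, not part of the patent):

```python
def article_features(paragraphs, known_words):
    """Compute a handful of the statistical features described above.

    A sketch only: characters stand in for words here, and
    `known_words` is a hypothetical learner vocabulary used to
    count "new words".
    """
    # Split each paragraph into sentences on the Chinese full stop.
    sentences = [s for p in paragraphs for s in p.split("。") if s]
    chars = "".join(sentences)
    return {
        "total_chars": len(chars),
        "num_paragraphs": len(paragraphs),
        "num_sentences": len(sentences),
        "avg_sentence_len": len(chars) / max(len(sentences), 1),
        "new_words": sum(1 for c in set(chars) if c not in known_words),
    }
```

In the real pipeline each of these scalars becomes one dimension of the 86-dimensional statistical feature vector handed to the svm.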
(5) Articles containing all the above features are then classified using a random-subspace-based svm combination, yielding confidence 1.
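One concrete way to realize the random-subspace svm combination of steps (5) and (7), sketched with scikit-learn's `BaggingClassifier` (an assumed implementation route; the patent names no library): each base SVM is trained on all articles but only a random half of the feature dimensions, and the ensemble's averaged class probabilities serve as the confidence.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

def subspace_svm_confidence(X_train, y_train, X_new, n_estimators=10, seed=0):
    """Random-subspace SVM combination: every base SVM sees all
    samples (max_samples=1.0, no bootstrap) but only a random half
    of the feature dimensions (max_features=0.5). Returns the
    averaged per-class probabilities as confidence scores."""
    clf = BaggingClassifier(
        SVC(kernel="rbf", probability=True),
        n_estimators=n_estimators,
        max_samples=1.0,
        max_features=0.5,
        bootstrap=False,
        bootstrap_features=False,
        random_state=seed,
    )
    clf.fit(X_train, y_train)
    return clf.predict_proba(X_new)
```

The `kernel=` argument corresponds to the kernels compared in Table 1 (linear, polynomial, RBF).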
(6) For the preprocessed foreign Chinese article, the last (-1) encoder-layer output is extracted through a BERT model and average → max pooling is applied to obtain the article's multi-dimensional encoding features, as shown in figure 2. Each input sentence is encoded by a Bert structure, so the label-attention weighting mechanism and the word weights change, and multiple kernels give the label-embedding boundary finer detail and a better fit to the data.
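One reading of the "average → max pooling" over the -1 layer (this interpretation, and the toy dimensions below, are assumptions): average the last-layer token vectors within each sentence, then take an element-wise max over the sentence vectors to get a single article vector. With BERT-base the result would be 768-dimensional.

```python
import numpy as np

def article_encoding(sentence_token_vecs):
    """Pool per-token BERT vectors into one article vector.

    sentence_token_vecs: a list of arrays, one per sentence, each of
    shape (num_tokens, hidden_dim), assumed to be the last (-1)
    encoder-layer outputs. Average within each sentence, then take
    the element-wise max across sentences.
    """
    sent_vecs = [np.mean(tokens, axis=0) for tokens in sentence_token_vecs]
    return np.max(np.stack(sent_vecs), axis=0)
```

For two toy sentences with 2-dimensional token vectors, `article_encoding([[[1, 0], [3, 2]], [[0, 4]]])` pools the per-sentence means `[2, 1]` and `[0, 4]` into `[2, 4]`.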
(7) The encoding features are classified using a random-subspace-based svm, yielding confidence 2.
(8) The two confidences obtained are fused as new features and classified with a decision tree. During training each article is cut into a combination of several sentences, so the sentence is the basic input unit; after every sentence of an article has been classified, a weighted average is taken as the article's integrated output.
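Step (8) can be sketched as stacking the two confidence vectors into a new feature vector and training a decision tree on it (scikit-learn is an assumed implementation route, and the tree depth below is an arbitrary choice):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fuse_confidences(conf1_train, conf2_train, y_train, conf1_new, conf2_new):
    """Fuse the two svm confidence vectors as new features and let a
    decision tree make the final class decision."""
    # New feature vector: confidence 1 and confidence 2 side by side.
    Z = np.hstack([conf1_train, conf2_train])
    tree = DecisionTreeClassifier(max_depth=5, random_state=0)
    tree.fit(Z, y_train)
    return tree.predict(np.hstack([conf1_new, conf2_new]))
```

Keeping the full per-class confidence vectors (rather than only the argmax labels) lets the tree exploit how certain each svm was, which is what makes the fusion more informative than majority voting.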
<Description of the experiment and results>
In the example, 51356 composition texts were crawled from 13 composition websites and sorted into 12 grades from primary school through senior high school; 4000 compositions per grade, 48000 in total, were screened out and saved in txt format. The data were divided into a training set, a test set, and a validation set at a fixed ratio, with the training set by far the largest. Training was carried out according to the specific implementation method above, while accuracy on the validation set was monitored to choose the point at which to stop training.
Each time a model with a fixed kernel was trained, all samples were shuffled and the training, test, and validation sets were re-drawn in turn before training and validating again. Ten such cycles were run in total, and the results in the table below are the averages over the 10 experimental runs.
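The re-shuffling protocol can be sketched as follows; the 7:2:1 split ratio used here is an illustrative assumption:

```python
import numpy as np

def repeated_shuffle_splits(n_samples, n_runs=10, ratios=(0.7, 0.2, 0.1), seed=0):
    """Yield train/test/validation index sets for each of n_runs cycles.

    All samples are shuffled before every run and the three sets are
    re-drawn, matching the evaluation protocol described above.
    """
    rng = np.random.default_rng(seed)
    n_train = int(ratios[0] * n_samples)
    n_test = int(ratios[1] * n_samples)
    for _ in range(n_runs):
        idx = rng.permutation(n_samples)
        yield idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]
```

Averaging the F1-score over the 10 yielded splits then gives a single number per kernel, as reported in Table 1.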
The specific experimental results are shown in table 1.
TABLE 1 Experimental results
Model | svm kernel | Average F1-score
---|---|---
SVM+Bert+DT (Decision Tree) | Linear kernel | 82.32%
SVM+Bert+DT | Polynomial kernel | 82.47%
SVM+Bert+DT | RBF (Gaussian) kernel | 85.6%
In summary, for the text classification problem of foreign Chinese article difficulty assessment, the invention provides a foreign Chinese difficulty assessment and automatic classification method of decision tree classification based on random subspace feature selection of svm and Bert models. The method avoids the low efficiency and under-fitting of traditional algorithms and makes full use of all available information, so the added classification evidence has a clear effect. The method reaches an accuracy of 85.6% on foreign Chinese difficulty assessment.
It will be understood by those skilled in the art that the foregoing is only an exemplary embodiment of the present invention, and is not intended to limit the invention to the particular forms disclosed, since various modifications, substitutions and improvements within the spirit and scope of the invention are possible and within the scope of the appended claims.
Claims (7)
1. A foreign Chinese difficulty assessment method of decision tree classification based on random subspace feature selection of svm and bert models is characterized by comprising the following steps:
s1, preprocessing a foreign Chinese text;
s2, generating a plurality of features for the foreign Chinese article preprocessed in step S1 according to the article's length, its readability, and its number of new words;
s3, classifying the articles containing all the features using a random-subspace-based svm combination to obtain confidence 1;
s4, extracting the last (-1) encoder-layer output of the foreign Chinese article preprocessed in step S1 through a BERT model and then applying average → max pooling to obtain the article's multi-dimensional encoding features;
s5, classifying the encoding features using a random-subspace-based svm to obtain confidence 2;
s6, fusing the two confidences obtained as new features and classifying with a decision tree.
2. The foreign Chinese difficulty assessment method of decision tree classification based on random subspace feature selection of svm and bert models according to claim 1, wherein:
in step S1, preprocessing the foreign Chinese text includes saving it in txt format.
3. The foreign Chinese difficulty assessment method of decision tree classification based on random subspace feature selection of svm and bert models according to claim 2, wherein:
in step S1, preprocessing the foreign Chinese text includes deleting empty lines in the article.
4. The foreign Chinese difficulty assessment method of decision tree classification based on random subspace feature selection of svm and bert models according to claim 3, wherein:
in step S1, preprocessing the foreign Chinese text includes splitting the article into sentences.
5. The foreign Chinese difficulty assessment method of decision tree classification based on random subspace feature selection of svm and bert models according to claim 4, wherein:
in step S1, sentence splitting cuts each article into sentence units with python, stores them in a list structure, and removes punctuation marks.
6. The foreign Chinese difficulty assessment method of decision tree classification based on random subspace feature selection of svm and bert models according to claim 1, wherein:
the plurality of features generated in step S2 includes the total number of words, the total number of strokes, the number of paragraphs, the number of sentences, and the number of new words.
7. The foreign Chinese difficulty assessment method of decision tree classification based on random subspace feature selection of svm and bert models according to claim 1, wherein:
in step S6, a weighted average of confidence 1 and confidence 2 is used as the integrated output of the article.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911206414.9A CN111078874B (en) | 2019-11-29 | 2019-11-29 | Foreign Chinese difficulty assessment method based on decision tree classification of random subspace |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111078874A CN111078874A (en) | 2020-04-28 |
CN111078874B (en) | 2023-04-07
Family
ID=70312204
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911206414.9A Active CN111078874B (en) | 2019-11-29 | 2019-11-29 | Foreign Chinese difficulty assessment method based on decision tree classification of random subspace |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111078874B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111797229A (en) * | 2020-06-10 | 2020-10-20 | 南京擎盾信息科技有限公司 | Text representation method and device and text classification method |
CN112631139B (en) * | 2020-12-14 | 2022-04-22 | 山东大学 | Intelligent household instruction reasonability real-time detection system and method |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TW200521895A (en) * | 2003-12-26 | 2005-07-01 | Inventec Besta Co Ltd | System and method to recognize the degree of mastering difficulty for a language text |
CN101814066A (en) * | 2009-02-23 | 2010-08-25 | 富士通株式会社 | Text reading difficulty judging device and method thereof |
CN103207854A (en) * | 2012-01-11 | 2013-07-17 | 宋曜廷 | Chinese text readability measuring system and method thereof |
CN105068993A (en) * | 2015-07-31 | 2015-11-18 | 成都思戴科科技有限公司 | Method for evaluating text difficulty |
CN105468713A (en) * | 2015-11-19 | 2016-04-06 | 西安交通大学 | Multi-model fused short text classification method |
CN107145514A (en) * | 2017-04-01 | 2017-09-08 | 华南理工大学 | Chinese sentence pattern sorting technique based on decision tree and SVM mixed models |
CN107506346A (en) * | 2017-07-10 | 2017-12-22 | 北京享阅教育科技有限公司 | A kind of Chinese reading grade of difficulty method and system based on machine learning |
CN107977362A (en) * | 2017-12-11 | 2018-05-01 | 中山大学 | A kind of method defined the level for Chinese text and calculate the scoring of Chinese text difficulty |
CN108984531A (en) * | 2018-07-23 | 2018-12-11 | 深圳市悦好教育科技有限公司 | Books reading difficulty method and system based on language teaching material |
CN109977408A (en) * | 2019-03-27 | 2019-07-05 | 西安电子科技大学 | The implementation method of English Reading classification and reading matter recommender system based on deep learning |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007249755A (en) * | 2006-03-17 | 2007-09-27 | Ibm Japan Ltd | System for evaluating difficulty understanding document and method therefor |
US11017180B2 (en) * | 2018-04-18 | 2021-05-25 | HelpShift, Inc. | System and methods for processing and interpreting text messages |
Non-Patent Citations (2)
Title |
---|
Research on automatic readability assessment of reading materials for Chinese as a foreign language based on a regression model; 曾致中; 《中国教育信息化》; full text *
Readability assessment of Chinese-as-a-foreign-language texts based on the random forest algorithm; 杨文媞; 《中国教育信息化》; full text *
Also Published As
Publication number | Publication date |
---|---|
CN111078874A (en) | 2020-04-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||