CN110263166A - Public sentiment file classification method based on deep learning
- Publication number: CN110263166A
- Application number: CN201910525459.6A
- Authority: CN (China)
- Prior art keywords: sample, data, positive, negative, training
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention provides a public sentiment text classification method based on deep learning, comprising the following steps: 1. crawl enterprise public sentiment text from the internet (Baidu) to obtain a small number of positive samples and a large number of unlabeled samples; 2. construct the initial training data set using PU-learning; 3. train three deep models (fastText, CNN and RNN) on the data set from step 2 using multi-model co-training; 4. classify the test data set with the CNN trained on the data set expanded in step 3. By crawling data purposefully to build the positive samples, this patent improves the quality of the positive samples. It obtains from the unlabeled samples negative samples that are farther from the positive samples and therefore more reliable. It identifies the problems and events that business personnel care about in public sentiment data with higher accuracy and pushes timely alerts and early warnings, which greatly improves their working efficiency.
Description
Technical field
The invention relates to a public sentiment text classification method, in particular a deep-learning-based public sentiment text classification method that yields higher-quality positive samples, obtains more reliable negative samples that lie farther from the positive samples, and offers high accuracy and high working efficiency.
Background art
At present, the classification of company-news public sentiment text still relies on manual processing combined with simple rules. This is inefficient, and the classification quality cannot be guaranteed.
Summary of the invention
To solve the above problems, the invention provides a deep-learning-based public sentiment text classification method that yields higher-quality positive samples, obtains more reliable negative samples that lie farther from the positive samples, and offers high accuracy and high working efficiency.
The public sentiment text classification method based on deep learning comprises the following steps:
1. Crawl enterprise public sentiment text from the internet (Baidu) to obtain a small number of positive samples and a large number of unlabeled samples.
2. Construct the initial training data set using PU-learning.
3. Train three deep models (fastText, CNN and RNN) on the data set from step 2 using multi-model co-training, and classify the unlabeled sample data with each of the three models. If all three classifiers judge a sample to be positive and its sentiment is negative, the sample is judged positive and added to the positive sample set. If all three classifiers judge a sample to be negative and its sentiment is positive, the sample is judged negative and added to the negative sample set. Other cases are left unprocessed for the time being.
4. Classify the test data set with the CNN trained on the data expanded in step 3. If the classification accuracy is below the threshold, execute the operations in step 3 again; otherwise the process ends.
The specific method is as follows:
(1) Data crawling and preprocessing
In news public sentiment event classification, data is rarely as ideal as one would hope. Because labeling is expensive, among other reasons, it is hard to accumulate abundant positive and negative samples, so how to obtain a large number of accurately labeled positive and negative samples has a great influence on classification quality.
In this patent, we use keyword combinations (for example LeEco + capital chain, that is, combinations of the names of enterprises that have had funding problems with terms describing funding problems) to crawl news about enterprise funding problems, and we label this news as funding-problem positive samples. At the same time, we use the names of "good" enterprises that have not had funding problems, such as Tencent and Alibaba, as keywords to crawl related news, which serves as unlabeled samples. Some of this news may still involve funding problems, so it cannot be called negative samples; it should be called unlabeled samples, also known as unknown samples. This gives us a small number of positive samples (crawled from the web and partly confirmed by manual labeling) and a large number of unlabeled samples.
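The keyword-combination idea in (1) can be illustrated with a minimal sketch. It only builds the query strings and performs no crawling. The function name `build_queries` and the term "funding shortfall" are assumptions introduced for illustration; only LeEco, "capital chain", Tencent and Alibaba come from the text above.

```python
from itertools import product

# Example lists. LeEco, "capital chain", Tencent and Alibaba are named in the
# patent text; "funding shortfall" is a hypothetical extra descriptor.
TROUBLED_COMPANIES = ["LeEco"]
PROBLEM_TERMS = ["capital chain", "funding shortfall"]
HEALTHY_COMPANIES = ["Tencent", "Alibaba"]


def build_queries(companies, terms=None):
    """Return search queries: company names alone, or every company + term pair."""
    if not terms:
        return list(companies)
    return [f"{company} {term}" for company, term in product(companies, terms)]


# Queries whose results are candidates for funding-problem positive samples.
positive_queries = build_queries(TROUBLED_COMPANIES, PROBLEM_TERMS)
# Queries whose results are kept as unlabeled samples.
unlabeled_queries = build_queries(HEALTHY_COMPANIES)
```

News returned for `positive_queries` would still need the partial manual confirmation described above before it is treated as positive.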
(2) Training set construction
Positive and unlabeled learning (PU-learning) is used to iteratively find, in the large unlabeled sample set from (1), the samples whose cosine distance from the positive samples is as large as possible. These are regarded as relatively reliable negative samples and, together with the positive samples, form the training set.
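As a rough illustration of ranking unlabeled samples by cosine distance from the positive samples, the sketch below compares each unlabeled text with the summed bag-of-words vector of the positives. Whitespace tokenisation, the bag-of-words representation and the helper names are assumptions made for the example; the patent does not specify how the vectors are built, and Chinese text would need a proper word segmenter.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector as a Counter (whitespace tokenisation is an assumption)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def farthest_from_positives(positives, unlabeled, k):
    """Return the k unlabeled texts with the largest cosine distance to the positives."""
    centroid = Counter()
    for text in positives:
        centroid.update(bow(text))  # summed counts; cosine ignores the scale
    ranked = sorted(unlabeled, key=lambda t: cosine_distance(bow(t), centroid), reverse=True)
    return ranked[:k]
```

For example, given positives about a broken capital chain, an unlabeled headline about a game release shares no terms with them and is ranked as the most distant candidate.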
PU-learning applies when positive samples can be identified with certainty but negative samples cannot, because a presumed negative sample may in fact be positive and simply has not been confirmed yet. We call these uncertain samples the unlabeled samples U and build the model from them together with the positive samples P.
The PU-learning computation is divided into two stages:
First stage: select a reliable negative set RN from the unlabeled examples, as follows:
A. Randomly select a subset S of the positive examples in P and add it to U. The two resulting data sets, P-S and U+S, are denoted ps and us respectively. Train a binary classification model on ps and us, with label 0 for the data in us and label 1 for the data in ps.
B. Apply this classifier to the unlabeled sample set U and compute the probability that each sample belongs to the negative class. Set a threshold a; a sample whose negative-class probability is greater than a is regarded as a relatively reliable negative sample.
Second stage: using the positive examples P and the reliable negative examples RN, train a conventional machine learning classification model to predict new samples.
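The two stages can be sketched as follows, under several assumptions that are not in the patent: samples are already numeric feature vectors, the binary model is a small logistic regression written with the standard library, and the parameters `spy_ratio`, `a=0.7`, `epochs` and `lr` are illustrative defaults. The logistic regression also stands in for the "conventional machine learning classification model" of the second stage.

```python
import math
import random


def predict_pos(w, b, x):
    """Probability that x has label 1 (the ps class)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=300, lr=0.5):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_pos(w, b, xi) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def select_reliable_negatives(P, U, spy_ratio=0.2, a=0.7, seed=0):
    """Stage 1: train ps (label 1) against us (label 0), keep U samples with P(negative) > a."""
    rng = random.Random(seed)
    spy_idx = set(rng.sample(range(len(P)), max(1, int(len(P) * spy_ratio))))
    ps = [x for i, x in enumerate(P) if i not in spy_idx]  # P - S
    us = U + [x for i, x in enumerate(P) if i in spy_idx]  # U + S
    w, b = train_logreg(ps + us, [1] * len(ps) + [0] * len(us))
    return [x for x in U if 1.0 - predict_pos(w, b, x) > a]


def train_final_model(P, RN):
    """Stage 2: train on the positives P (label 1) and reliable negatives RN (label 0)."""
    return train_logreg(P + RN, [1] * len(P) + [0] * len(RN))


# Toy data: positives cluster near (2, 2); U holds five clear negatives
# plus one hidden positive that should not be selected as a reliable negative.
P = [[2.0, 2.0], [2.2, 1.8], [1.8, 2.1], [2.1, 2.3], [1.9, 1.7]]
U = [[-2.0, -2.0], [-1.8, -2.2], [-2.1, -1.9], [-2.3, -2.0], [-1.7, -1.8], [2.0, 1.9]]
RN = select_reliable_negatives(P, U)
w, b = train_final_model(P, RN)
```

In this toy run the hidden positive in U stays out of RN because the stage-one classifier cannot give it a high negative-class probability while the remaining positives carry label 1.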
(3) Multi-model co-training
There are three main steps:
A. Classify the unlabeled data with each of the three classifier models (fastText, CNN and RNN). If all three models predict the positive class (a funding problem exists), add the sample directly to the training set as a positive sample. If all three predict the negative class (no funding problem exists), add it as a negative sample. If two classifiers predict the positive class and one predicts the negative class, retain the sample for manual labeling. If two classifiers predict the negative class and one predicts the positive class, take no action and continue to treat the sample as unlabeled data.
B. After the operations in A, update the training set, continue training the three models, and compute the classification accuracy on the test set.
C. Repeat the operations in A and B until the accuracy on the test set reaches the threshold, then stop iterating and save the models.
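The routing rule in step A reduces to counting positive votes. The sketch below assumes each classifier emits 1 for the positive class (funding problem) and 0 for the negative class; the function name and the returned action strings are hypothetical labels chosen for the example.

```python
def route_by_votes(votes):
    """Map three 0/1 votes (1 = funding problem) to the action described in step A."""
    if len(votes) != 3:
        raise ValueError("expected exactly three classifier votes")
    positives = sum(votes)
    if positives == 3:
        return "add_positive"    # unanimous positive: add to the training set
    if positives == 0:
        return "add_negative"    # unanimous negative: add to the training set
    if positives == 2:
        return "manual_review"   # two positive, one negative: label by hand
    return "keep_unlabeled"      # two negative, one positive: leave as unlabeled
```

A call such as `route_by_votes([1, 1, 0])` returns `"manual_review"`, so only the disagreements that lean positive consume manual labeling effort.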
(4) Model classification
Using the updated training data from (3), the deep convolutional neural network (CNN) trained in (3) classifies the test data set. If the classification accuracy is below the threshold (0.8), the operations in (3) are executed again; otherwise the process ends.
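The stopping condition can be written as a short loop. In this sketch `train_round` and `evaluate` are hypothetical callables supplied by the caller (one round of the operations in (3), and the test-set accuracy of the CNN), and `max_rounds` is a safety cap added here that the patent does not mention. Only the 0.8 threshold comes from the text.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that equal the reference labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def cotrain_until_accurate(train_round, evaluate, threshold=0.8, max_rounds=10):
    """Repeat the operations in (3) until the test accuracy reaches the threshold.

    train_round() expands the training set and retrains the models;
    evaluate() returns the current test-set accuracy. Both are hypothetical
    hooks, and max_rounds is an added cap against endless iteration.
    """
    history = []
    for _ in range(max_rounds):
        train_round()
        history.append(evaluate())
        if history[-1] >= threshold:
            break
    return history
```

The returned list records the accuracy after each round, which makes it easy to see whether the loop stopped because the threshold was reached or because the cap was hit.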
By crawling data purposefully to build the positive samples, this patent improves the quality of the positive samples. Combined with PU-learning, it obtains from the unlabeled samples negative samples that are farther from the positive samples and therefore more reliable. Combining PU-learning with multi-model co-training gives good results in the situation, common in industry, where there are few positive samples and many unlabeled samples. The problems and events that business personnel care about are identified in public sentiment data with higher accuracy and are pushed as timely alerts and early warnings, which greatly improves their working efficiency. Analysis of the recognition results also helps business personnel take risk management measures.
Description of the drawings
Fig. 1 is a schematic diagram of the workflow of this patent.
Fig. 2 is the model architecture diagram of the character-level convolutional neural network (char-CNN) of this patent.
Specific embodiment
As shown in Fig. 1 and Fig. 2, the public sentiment text classification method based on deep learning comprises the following steps:
1. Crawl enterprise public sentiment text from the internet (Baidu) to obtain a small number of positive samples and a large number of unlabeled samples.
2. Construct the initial training data set using PU-learning.
3. Train three deep models (fastText, CNN and RNN) on the data set from step 2 using multi-model co-training, and classify the unlabeled sample data with each of the three models. If all three classifiers judge a sample to be positive and its sentiment is negative, the sample is judged positive and added to the positive sample set. If all three classifiers judge a sample to be negative and its sentiment is positive, the sample is judged negative and added to the negative sample set. Other cases are left unprocessed for the time being.
4. Classify the test data set with the CNN trained on the data set expanded in step 3. If the classification accuracy is below the threshold, execute the operations in step 3 again; otherwise the process ends.
The embodiments described above merely illustrate preferred embodiments of the invention and do not limit its scope. Without departing from the spirit of the design of the invention, all modifications and improvements that a person of ordinary skill in the art makes to the technical solution of the invention shall fall within the scope of protection defined by the claims of the invention.
Claims (1)
1. A public sentiment text classification method based on deep learning, comprising the following steps:
1) crawling enterprise public sentiment text from the internet (Baidu) to obtain a small number of positive samples and a large number of unlabeled samples;
2) constructing an initial training data set using PU-learning;
3) training three deep models, fastText, CNN and RNN, on the data set from step 2) using multi-model co-training, and classifying the unlabeled sample data with each of the three models: if all three classifiers judge a sample to be positive and its sentiment is negative, the sample is judged positive and added to the positive sample set; if all three classifiers judge a sample to be negative and its sentiment is positive, the sample is judged negative and added to the negative sample set; other cases are left unprocessed for the time being;
4) classifying the test data set with the CNN trained on the data set expanded in step 3); if the classification accuracy is below the threshold, executing the operations in step 3) again; otherwise the process ends;
The specific method is as follows:
(1) Data crawling and preprocessing
In news public sentiment event classification, data is rarely as ideal as one would hope; because labeling is expensive, among other reasons, it is hard to accumulate abundant positive and negative samples, so how to obtain a large number of accurately labeled positive and negative samples has a great influence on classification quality;
In this patent, keyword combinations are used to crawl news about enterprise funding problems, which is labeled as funding-problem positive samples; at the same time, the names of "good" enterprises that have not had funding problems are used as keywords to crawl related news, which serves as unlabeled samples; this gives a small number of positive samples and a large number of unlabeled samples;
(2) Training set construction
Positive and unlabeled learning is used to iteratively find, in the large unlabeled sample set from (1), the samples whose cosine distance from the positive samples is as large as possible; these are regarded as relatively reliable negative samples and, together with the positive samples, form the training set;
PU-learning applies when positive samples can be identified with certainty but negative samples cannot, because a presumed negative sample may in fact be positive and simply has not been confirmed yet; these uncertain samples are called the unlabeled samples U, and the model is built from them together with the positive samples P;
The PU-learning computation is divided into two stages:
First stage: select a reliable negative set RN from the unlabeled examples, as follows:
A. randomly select a subset S of the positive examples in P and add it to U; the two resulting data sets, P-S and U+S, are denoted ps and us respectively; train a binary classification model on ps and us, with label 0 for the data in us and label 1 for the data in ps;
B. apply this classifier to the unlabeled sample set U and compute the probability that each sample belongs to the negative class; set a threshold a; a sample whose negative-class probability is greater than a is regarded as a relatively reliable negative sample;
Second stage: using the positive examples P and the reliable negative examples RN, train a conventional machine learning classification model to predict new samples;
(3) Multi-model co-training
There are three main steps:
A. classify the unlabeled data with each of the three classifier models (fastText, CNN and RNN); if all three models predict the positive class, add the sample directly to the training set as a positive sample; if all three predict the negative class, add it as a negative sample; if two classifiers predict the positive class and one predicts the negative class, retain the sample for manual labeling; if two classifiers predict the negative class and one predicts the positive class, take no action and continue to treat the sample as unlabeled data;
B. after the operations in A, update the training set, continue training the three models, and compute the classification accuracy on the test set;
C. repeat the operations in A and B until the accuracy on the test set reaches the threshold, then stop iterating and save the models;
(4) Model classification
Using the updated training data from (3), the deep convolutional neural network (CNN) trained in (3) classifies the test data set; if the classification accuracy is below the threshold (0.8), the operations in (3) are executed again; otherwise the process ends.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910525459.6A CN110263166A (en) | 2019-06-18 | 2019-06-18 | Public sentiment file classification method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910525459.6A CN110263166A (en) | 2019-06-18 | 2019-06-18 | Public sentiment file classification method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110263166A true CN110263166A (en) | 2019-09-20 |
Family
ID=67919008
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910525459.6A Pending CN110263166A (en) | 2019-06-18 | 2019-06-18 | Public sentiment file classification method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110263166A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101980202A (en) * | 2010-11-04 | 2011-02-23 | 西安电子科技大学 | Semi-supervised classification method of unbalance data |
CN105468713A (en) * | 2015-11-19 | 2016-04-06 | 西安交通大学 | Multi-model fused short text classification method |
CN107239529A (en) * | 2017-05-27 | 2017-10-10 | 中国矿业大学 | A kind of public sentiment hot category classification method based on deep learning |
CN109299162A (en) * | 2018-11-08 | 2019-02-01 | 南京航空航天大学 | A kind of Active Learning Method classified for positive class and data untagged |
Non-Patent Citations (2)
Title |
---|
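何远生 (He Yuansheng): "基于深度学习多模型融合的中文短文本情感分类算法研究与实现" (Research and implementation of a Chinese short-text sentiment classification algorithm based on deep learning multi-model fusion), 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology) * |
张璞, 刘畅, 李逍 (Zhang Pu, Liu Chang, Li Xiao): "基于PU学习的建议语句分类方法" (Suggestion sentence classification method based on PU learning), 《计算机应用》 (Journal of Computer Applications) * |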
CN113361585A (en) * | 2021-06-02 | 2021-09-07 | 浪潮软件科技有限公司 | Method for optimizing and screening clues based on supervised learning algorithm |
CN113609298A (en) * | 2021-08-23 | 2021-11-05 | 南京擎盾信息科技有限公司 | Data processing method and device for court public opinion corpus extraction |
CN113849645B (en) * | 2021-09-28 | 2024-06-04 | 平安科技(深圳)有限公司 | Mail classification model training method, device, equipment and storage medium |
CN113849645A (en) * | 2021-09-28 | 2021-12-28 | 平安科技(深圳)有限公司 | Mail classification model training method, device, equipment and storage medium |
CN114254588A (en) * | 2021-12-16 | 2022-03-29 | 马上消费金融股份有限公司 | Data tag processing method and device |
CN114254588B (en) * | 2021-12-16 | 2023-10-13 | 马上消费金融股份有限公司 | Data tag processing method and device |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110263166A (en) | Public sentiment file classification method based on deep learning | |
WO2016033907A1 (en) | Statistical machine learning-based internet hidden link detection method | |
CN107092596A (en) | Text emotion analysis method based on attention CNNs and CCR | |
CN110134849A (en) | Network public-opinion monitoring method and system | |
CN106776538A (en) | Information extraction method for enterprise documents in non-standard formats | |
CN106709754A (en) | Power user grouping method based on text mining | |
CN108563638B (en) | Microblog emotion analysis method based on topic identification and integrated learning | |
CN112214610A (en) | Entity relation joint extraction method based on span and knowledge enhancement | |
CN106528528A (en) | A text emotion analysis method and device | |
CN107239439A (en) | Public sentiment sentiment classification method based on word2vec | |
CN109918505B (en) | Network security event visualization method based on text processing | |
CN108614855A (en) | Rumour recognition method | |
CN103984943A (en) | Scene text identification method based on Bayesian probability frame | |
CN110909542B (en) | Intelligent semantic serial-parallel analysis method and system | |
CN110851593B (en) | Complex value word vector construction method based on position and semantics | |
CN113742733B (en) | Method and device for extracting trigger words of reading and understanding vulnerability event and identifying vulnerability type | |
CN107885849A (en) | Mood index analysis system based on text classification | |
CN111984790B (en) | Entity relation extraction method | |
KR102387887B1 (en) | Apparatus for refining clean labeled data for artificial intelligence training | |
CN107368526A (en) | Data processing method and device | |
CN112434163A (en) | Risk identification method, model construction method, risk identification device, electronic equipment and medium | |
CN115545437A (en) | Financial enterprise operation risk early warning method based on multi-source heterogeneous data fusion | |
CN103020286A (en) | Internet ranking list grasping system based on ranking website | |
CN114265931A (en) | Big data text mining-based consumer policy perception analysis method and system | |
CN116962089A (en) | Network monitoring method and system for information security |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 2019-09-20 |