CN106776740A - Social network text clustering method based on convolutional neural networks - Google Patents
Social network text clustering method based on convolutional neural networks
- Publication number
- CN106776740A (application CN201611027489.7A)
- Authority
- CN
- China
- Prior art keywords
- convolutional neural
- neural networks
- vector
- text
- networks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Abstract
The invention discloses a social network text clustering method based on convolutional neural networks, comprising the following steps. Text preprocessing: useless characters are filtered out and the text is converted to word vectors. Feature mapping: a locality-preserving algorithm maps the word vectors to binary feature vectors usable by the convolutional neural network model; these serve as the target features for training the network. Convolutional neural network: the network is trained with the word vectors as input and the binary feature vectors as target features. K-means clustering: the binary feature vectors output by the convolutional neural network are clustered with K-means, an unsupervised machine learning algorithm, to obtain the clustering result.
Description
Technical field
The present invention relates to a social network text clustering method based on convolutional neural networks.
Background art
With the rapid development of the Internet, more and more users like to publish their views online and share updates about their personal lives while strengthening communication with friends, so social media has also developed rapidly. Microblogging, a platform for sharing, propagating, and obtaining information based on user relationships, has changed the communication mode of traditional network media and started a new interactive model of social media. It provides users with richer information content and more convenient ways to communicate, and has rapidly become the most popular social network medium.
On microblog platforms, users express their emotions by posting personal views on trending social events, their experience of purchasing a product, and so on, which produces a massive volume of topic text. Fully processing and analysing this text has significant social, commercial, and user value. The basis for processing massive microblog information effectively is clustering microblog text, so microblog text clustering is of great significance.
Summary of the invention
The present invention addresses the microblog text clustering problem. Considering the informal and sparse nature of microblog text, it provides a clustering method suited to microblog short text, laying a foundation for public opinion analysis on social networks. The technical solution is as follows:
A social network text clustering method based on convolutional neural networks, comprising the following steps:
1) Text preprocessing: filter out useless characters and convert the text to word vectors.
2) Feature mapping: use a locality-preserving algorithm to map the word vectors to binary feature vectors usable by the convolutional neural network model; these serve as the target features for training the network.
3) Convolutional neural network: train the network with the word vectors as input and the binary feature vectors as target features.
4) K-means clustering: cluster the binary feature vectors output by the convolutional neural network with K-means, an unsupervised machine learning algorithm, to obtain the clustering result.
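As a rough illustration of step 1, the sketch below filters noise from a post and looks up a vector for each remaining token. It is not the patent's implementation. The noise patterns (a "forwarded microblog" marker written here as 转发微博, URLs, and @mentions), the whitespace tokenisation, and the `embeddings` dictionary standing in for a trained word vector model are all assumptions made for the example.

```python
import re
import numpy as np

# Hypothetical noise patterns: the repost marker string, URLs, and @mentions
# are assumed examples of "useless characters", not a list from the patent.
NOISE_PATTERNS = [r"转发微博", r"https?://\S+", r"@\S+"]


def clean_text(text):
    """Remove every noise pattern and collapse repeated whitespace."""
    for pattern in NOISE_PATTERNS:
        text = re.sub(pattern, " ", text)
    return " ".join(text.split())


def to_word_vectors(text, embeddings, dim):
    """Return a (num_tokens, dim) array of word vectors for one post.

    `embeddings` is a hypothetical dict mapping token -> vector; unknown
    tokens map to a zero vector. Tokens are split on whitespace, which is
    an assumption (Chinese text would need a word segmenter first).
    """
    tokens = clean_text(text).split()
    rows = [embeddings.get(tok, np.zeros(dim)) for tok in tokens]
    return np.array(rows, dtype=float).reshape(len(tokens), dim)
```

In practice the lookup table would come from a trained word embedding model rather than a hand-written dictionary.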
The social network text clustering method of the invention uses the strong ability of convolutional neural networks to form abstract representations of local features in order to learn the features of microblog text, and then processes these features with a clustering algorithm from machine learning to cluster the microblog text. Because manually labelling dataset categories is costly, the method instead processes the original features with a locality-preserving algorithm from natural language processing, mapping the text information to binary numeric vectors that serve as the abstract representation of the microblog text.
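The text does not name the specific locality-preserving algorithm, so the following is only a sketch under stated assumptions. It uses Laplacian eigenmaps on a k-nearest-neighbour graph as a stand-in, then thresholds each embedding dimension at its median to obtain binary codes. The function name `binary_codes`, the parameters `n_bits` and `n_neighbors`, and the idea of passing one fixed-length vector per post (for example, the mean of its word vectors) are all hypothetical.

```python
import numpy as np


def binary_codes(doc_vectors, n_bits=4, n_neighbors=3):
    """Map one vector per post to an n_bits binary code (assumed scheme)."""
    X = np.asarray(doc_vectors, dtype=float)
    n = X.shape[0]

    # Pairwise squared Euclidean distances; exclude self-matches.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)

    # Symmetric k-nearest-neighbour adjacency matrix.
    nearest = np.argsort(d2, axis=1)[:, :n_neighbors]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), n_neighbors), nearest.ravel()] = 1.0
    A = np.maximum(A, A.T)

    # Normalised graph Laplacian L = I - D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

    # Eigenvectors for the smallest eigenvalues, skipping the first one,
    # rescaled to solve the generalised problem L y = lambda D y.
    _, eigvecs = np.linalg.eigh(L)
    Y = d_inv_sqrt[:, None] * eigvecs[:, 1:n_bits + 1]

    # Median thresholding gives roughly balanced 0/1 bits per dimension.
    return (Y > np.median(Y, axis=0)).astype(np.int8)
```

Posts that are close in the original feature space tend to receive similar codes, which is the property the feature mapping step relies on; no manual labels are involved.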
The beneficial effects of the invention are as follows:
1. The ability of the convolutional neural network model to form abstract representations of local features is used to learn abstract features of short text and thereby cluster short text.
2. A feature mapping algorithm from natural language processing maps the original features of short text to abstract features usable by the convolutional neural network model, which avoids the need for costly manually labelled datasets and has practical engineering value.
Brief description of the drawings
Fig. 1: Overall architecture of microblog text clustering
Fig. 2: The convolutional neural network architecture used by this method
Detailed description
Embodiments are described below with reference to the drawings.
The overall architecture of this method is shown in Fig. 1 and is described below:
1) Text preprocessing: filter out useless characters, such as the "forwarded microblog" marker; then use the Word2Vec tool to convert the text to word vectors.
2) Feature mapping: use a locality-preserving algorithm to map the word vectors to binary feature vectors usable by the convolutional neural network model; these serve as the target features for training the network.
3) Convolutional neural network: train the network with the word vectors as input and the binary feature vectors as target features, as described in detail below.
4) K-means clustering: cluster with K-means, an unsupervised machine learning algorithm, to obtain the clustering result.
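The final clustering step can be sketched with a plain NumPy version of Lloyd's K-means algorithm. This is a minimal illustration, not the patent's code; the function name `kmeans`, the random initialisation from data points, and the `max_iter` and `seed` parameters are assumptions.

```python
import numpy as np


def kmeans(X, k, max_iter=50, seed=0):
    """Cluster the rows of X into k groups; returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct data points (assumed choice).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels, centers
```

Here the rows of `X` would be the feature vectors produced by the trained network, one per post, and `k` is the number of clusters chosen in advance.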
The training architecture of the convolutional neural network is shown in Fig. 2 and is described below:
1) The model consists of several groups (typically 3 to 5) of convolutional layers, each connected to a pooling layer.
2) The double-headed arrows in the model represent the training flow: the upward arrow represents the learning process (forward propagation) and the downward arrow represents the fine-tuning process (error backpropagation). The two processes alternate until the error falls below a threshold, at which point model training is complete.
3) The deep feature representation is finally output for clustering.
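The training cycle described above (forward propagation, error backpropagation, repeat until the error is below a threshold) can be sketched in NumPy as follows. This is a simplified illustration under several assumptions: it uses a single convolution and max-over-time pooling group rather than the 3 to 5 groups mentioned, tanh activations, a sigmoid output layer with cross-entropy error, and plain gradient descent. All names and hyperparameters (`forward`, `train_cnn`, `n_filters`, `width`, `lr`, `threshold`) are hypothetical.

```python
import numpy as np


def forward(X, W, b, V, c):
    """Forward pass. X: (N, T, D) word vectors; returns intermediates and outputs."""
    width = W.shape[1]
    steps = X.shape[1] - width + 1
    # windows[n, t, j, d] = X[n, t + j, d]
    windows = np.stack([X[:, j:j + steps, :] for j in range(width)], axis=2)
    a = np.tanh(np.einsum("ntjd,fjd->ntf", windows, W) + b)  # convolution
    t_max = a.argmax(axis=1)                                  # max-over-time pooling
    n_idx, f_idx = np.indices(t_max.shape)
    p = a[n_idx, t_max, f_idx]
    y = 1.0 / (1.0 + np.exp(-(p @ V + c)))                    # predicted bits
    return windows, a, t_max, p, y


def train_cnn(X, targets, n_filters=8, width=3, lr=0.2,
              threshold=0.05, max_epochs=500, seed=0):
    """Alternate forward and backward passes until the error is below threshold."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((n_filters, width, X.shape[2]))
    b = np.zeros(n_filters)
    V = 0.1 * rng.standard_normal((n_filters, targets.shape[1]))
    c = np.zeros(targets.shape[1])
    history = []
    for _ in range(max_epochs):
        windows, a, t_max, p, y = forward(X, W, b, V, c)
        eps = 1e-12
        loss = -np.mean(np.sum(targets * np.log(y + eps)
                               + (1 - targets) * np.log(1 - y + eps), axis=1))
        history.append(loss)
        if loss < threshold:
            break
        # Backward pass: gradients of the mean cross-entropy error.
        dz = (y - targets) / len(X)
        dp = dz @ V.T
        da = np.zeros_like(a)
        n_idx, f_idx = np.indices(t_max.shape)
        da[n_idx, t_max, f_idx] = dp        # gradient flows only through the max
        dh = da * (1.0 - a ** 2)            # derivative of tanh
        W -= lr * np.einsum("ntf,ntjd->fjd", dh, windows)
        b -= lr * dh.sum(axis=(0, 1))
        V -= lr * (p.T @ dz)
        c -= lr * dz.sum(axis=0)
    return (W, b, V, c), history
```

After training, `forward(X, *params)[-1]` returns the network outputs for each post, which can then be passed to the clustering step.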
Claims (1)
1. A social network text clustering method based on convolutional neural networks, comprising the following steps:
1) Text preprocessing: filtering out useless characters and converting the text to word vectors;
2) Feature mapping: mapping the word vectors, by a locality-preserving algorithm, to binary feature vectors usable by the convolutional neural network model, as the target features for training the convolutional neural network;
3) Convolutional neural network: training the convolutional neural network with the word vectors as input and the binary feature vectors as target features;
4) K-means clustering: clustering the binary feature vectors output by the convolutional neural network using the unsupervised machine learning algorithm K-means to obtain the clustering result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611027489.7A CN106776740A (en) | 2016-11-17 | 2016-11-17 | Social network text clustering method based on convolutional neural networks |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106776740A (en) | 2017-05-31 |
Family
ID=58970278
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611027489.7A (Pending; CN106776740A (en)) | Social network text clustering method based on convolutional neural networks | 2016-11-17 | 2016-11-17 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106776740A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103729652A (en) * | 2014-01-17 | 2014-04-16 | 重庆大学 | Sparsity preserving manifold embedding based hyperspectral remote sensing image classification method |
CN104915386A (en) * | 2015-05-25 | 2015-09-16 | 中国科学院自动化研究所 | Short text clustering method based on deep semantic feature learning |
CN106096642A (en) * | 2016-06-07 | 2016-11-09 | 南京邮电大学 | Based on the multi-modal affective characteristics fusion method differentiating locality preserving projections |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107247703A (en) * | 2017-06-08 | 2017-10-13 | 天津大学 | Microblog emotional analysis method based on convolutional neural networks and integrated study |
CN107392392A (en) * | 2017-08-17 | 2017-11-24 | 中国科学技术大学苏州研究院 | Microblogging forwarding Forecasting Methodology based on deep learning |
CN107958259A (en) * | 2017-10-24 | 2018-04-24 | 哈尔滨理工大学 | A kind of image classification method based on convolutional neural networks |
CN107766585A (en) * | 2017-12-07 | 2018-03-06 | 中国科学院电子学研究所苏州研究院 | A kind of particular event abstracting method towards social networks |
CN107766585B (en) * | 2017-12-07 | 2020-04-03 | 中国科学院电子学研究所苏州研究院 | Social network-oriented specific event extraction method |
CN108009647A (en) * | 2017-12-21 | 2018-05-08 | 东软集团股份有限公司 | Equipment record processing method, device, computer equipment and storage medium |
CN108009647B (en) * | 2017-12-21 | 2020-10-30 | 东软集团股份有限公司 | Device record processing method and device, computer device and storage medium |
CN112487406A (en) * | 2020-12-02 | 2021-03-12 | 中国电子科技集团公司第三十研究所 | Network behavior analysis method based on machine learning |
CN112487406B (en) * | 2020-12-02 | 2022-05-31 | 中国电子科技集团公司第三十研究所 | Network behavior analysis method based on machine learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20170531 |