CN106776740A - A social network text clustering method based on convolutional neural networks - Google Patents

Info

Publication number
CN106776740A
Authority
CN
China
Prior art keywords
convolutional neural
neural networks
vector
text
networks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611027489.7A
Other languages
Chinese (zh)
Inventor
金志刚
胡博宏
罗咏梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-11-17
Filing date
2016-11-17
Publication date
2017-05-31
Application filed by Tianjin University
Priority to CN201611027489.7A
Publication of CN106776740A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The present invention discloses a social network text clustering method based on convolutional neural networks, comprising the following steps. Text preprocessing: useless characters are filtered out and the text is converted into word vectors. Feature mapping: a local-feature-preserving algorithm maps the word vectors into binary feature vectors that the convolutional neural network model can use; these serve as the target features for training the network. Convolutional neural network: the network is trained with the word vectors as input and the binary feature vectors as target features. K-means clustering: the binary feature vectors output by the convolutional neural network are clustered with K-means, an unsupervised machine learning algorithm, to obtain the clustering result.

Description

A social network text clustering method based on convolutional neural networks
Technical field
The present invention relates to a social network text clustering method based on convolutional neural networks.
Background
With the rapid development of the Internet, more and more users like to publish their opinions online and share updates from their personal lives while keeping in closer touch with friends, so social media has grown rapidly as well. Microblogging, a platform for sharing, spreading, and obtaining information based on user relationships, has changed the way traditional online media communicate and started a new mode of social media interaction. It offers users richer content and more convenient ways to communicate, and has quickly become the most popular form of social network media.
On microblog platforms, users express their feelings by posting personal views on trending social events, their experiences of buying a particular product, and so on, producing a massive volume of topical text. Thoroughly processing and analyzing this text has significant social, commercial, and user value. The basis for processing massive microblog data effectively is clustering the microblog text, so achieving microblog text clustering is of great significance.
Summary of the invention
The present invention addresses the problem of microblog text clustering. Taking into account the informality and sparsity of microblog text, it provides a clustering method suited to short microblog texts, laying a foundation for public opinion analysis on social networks. The technical solution is as follows:
A social network text clustering method based on convolutional neural networks, comprising the following steps:
1) Text preprocessing: filter out useless characters and convert the text into word vectors.
2) Feature mapping: use a local-feature-preserving algorithm to map the word vectors into binary feature vectors that the convolutional neural network model can use, as the target features for training the network.
3) Convolutional neural network: train the network with the word vectors as input and the binary feature vectors as target features.
4) K-means clustering: cluster the binary feature vectors output by the convolutional neural network with K-means, an unsupervised machine learning algorithm, to obtain the clustering result.
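As an illustration of the text preprocessing step, the following is a minimal Python sketch of character filtering. The patent states only that useless characters are filtered, so every pattern here is an assumption made for illustration: the repost marker string, the URL and @-mention rules, and the function name `clean_post` are all hypothetical. Conversion to word vectors is not shown.

```python
import re

# Hypothetical noise patterns; the patent does not enumerate what counts as "useless".
REPOST_MARKER = "转发微博"  # assumed Chinese original of the "repost microblog" marker
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@[^\s:：,，]+[:：]?")  # "@name" with an optional trailing colon
SPACE_RE = re.compile(r"\s+")


def clean_post(text: str) -> str:
    """Remove the repost marker, URLs, and @-mentions, then collapse whitespace."""
    text = text.replace(REPOST_MARKER, " ")
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()
```

In a full pipeline the cleaned text would then be tokenized and passed to a word-embedding tool to obtain the word vectors used as network input.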
The social network text clustering method of the present invention uses the strong ability of convolutional neural networks to form abstract representations of local features in order to learn the features of microblog text, and then processes those features with a machine learning clustering algorithm to cluster the text. Because manually labeling dataset categories is costly, the method instead applies a locality-preserving constraint algorithm from natural language processing to the original features, mapping the text information into a binary numerical vector that serves as the abstract representation of the microblog text.
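The patent does not say how the mapping produces binary values. One common way to turn real-valued embeddings into binary codes is to threshold each dimension at its median, and the sketch below shows only that binarization step. It assumes the low-dimensional real-valued embeddings have already been computed by some locality-preserving method; the function name `binarize_by_median` and the thresholding rule are illustrative assumptions, not the patent's stated procedure.

```python
from statistics import median


def binarize_by_median(embeddings):
    """Map each real-valued row to a 0/1 code by thresholding every column at its median.

    `embeddings` is a list of equal-length lists of floats (one row per text).
    Values strictly above the column median become 1, all others become 0.
    """
    columns = list(zip(*embeddings))
    thresholds = [median(col) for col in columns]
    return [
        [1 if value > t else 0 for value, t in zip(row, thresholds)]
        for row in embeddings
    ]
```

The resulting codes can stand in for class labels as training targets, which is how the method avoids a manually labeled dataset.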
The beneficial effects of the present invention are as follows:
1. The ability of the convolutional neural network model to form abstract representations of local features is used to learn abstract features of short texts, which in turn enables short text clustering.
2. A feature mapping algorithm from natural language processing maps the original features of short texts into abstract features that the convolutional neural network model can use, which avoids the need for a costly manually labeled dataset and gives the method practical engineering value.
Brief description of the drawings
Fig. 1: Overall architecture of microblog text clustering
Fig. 2: The convolutional neural network architecture used in this method
Detailed description
The embodiment is described below with reference to the drawings.
The overall architecture of the method is shown in Fig. 1 and is described in detail below:
1) Text preprocessing: filter out useless characters, such as the "repost microblog" marker; at the same time, convert the text into word vectors using the Word2Vec tool.
2) Feature mapping: use a local-feature-preserving algorithm to map the word vectors into binary feature vectors that the convolutional neural network model can use, as the target features for training the network.
3) Convolutional neural network: train the network with the word vectors as input and the binary feature vectors as target features, as described in detail below.
4) K-means clustering: cluster using K-means, an unsupervised machine learning algorithm, to obtain the clustering result.
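The K-means clustering step can be sketched in plain Python as follows. This is a minimal version of the standard algorithm (Lloyd's iteration) under stated assumptions: the number of clusters `k`, the iteration cap, the seeded random initialization, and squared Euclidean distance are all choices made here for illustration, since the patent does not specify them.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Cluster `points` (a list of equal-length numeric lists) into k groups.

    Returns (assignments, centroids), where assignments[i] is the cluster
    index of points[i].
    """
    rng = random.Random(seed)
    # Initialize centroids from k randomly chosen input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids
```

Called on the feature vectors produced by the trained network, for example `labels, _ = kmeans(vectors, k=2)`, it returns one cluster index per text. Cluster numbering is arbitrary and depends on the initialization.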
The training model architecture of the convolutional neural network is shown in Fig. 2 and is described in detail below:
1) The model consists of several groups (typically 3 to 5) of convolutional layers, each connected to a pooling layer.
2) The bidirectional arrows in the model represent the training flow: the upward arrow represents the learning process (forward propagation) and the downward arrow represents the fine-tuning process (error backpropagation). The two processes alternate until the error falls below a threshold, at which point model training is complete.
3) Finally, the deep feature representation is output and used for clustering.
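To make the convolution and pooling groups concrete, the following plain-Python sketch runs the forward pass of a single group over a sequence of word vectors. It is not the patent's network: the ReLU activation, the non-overlapping max pooling, the kernel shapes, and all function names are assumptions for illustration, and training by backpropagation is omitted.

```python
def conv1d(sequence, kernel, bias=0.0):
    """Slide `kernel` over `sequence` along the time axis (no padding).

    `sequence` is a list of word vectors; `kernel` is a list of weight vectors
    with the same dimensionality, one per position in the window. As in most
    CNN libraries, this is a cross-correlation. Each window yields one
    ReLU-activated value (ReLU is an assumed choice).
    """
    width = len(kernel)
    outputs = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        total = bias + sum(
            w * x
            for w_vec, x_vec in zip(kernel, window)
            for w, x in zip(w_vec, x_vec)
        )
        outputs.append(max(0.0, total))
    return outputs


def max_pool(values, size):
    """Non-overlapping max pooling; a trailing partial window is pooled as-is."""
    return [max(values[i:i + size]) for i in range(0, len(values), size)]


def conv_pool_group(sequence, kernels, pool_size):
    """One convolution + pooling group: one pooled feature map per kernel."""
    return [max_pool(conv1d(sequence, k), pool_size) for k in kernels]
```

Stacking several such groups would require transposing the per-kernel feature maps back into a time-major sequence before the next convolution. In practice a deep learning framework would handle this together with the backpropagation loop that runs until the error falls below the threshold.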

Claims (1)

1. A social network text clustering method based on convolutional neural networks, comprising the following steps:
1) Text preprocessing: filter out useless characters and convert the text into word vectors;
2) Feature mapping: use a local-feature-preserving algorithm to map the word vectors into binary feature vectors that the convolutional neural network model can use, as the target features for training the network;
3) Convolutional neural network: train the network with the word vectors as input and the binary feature vectors as target features;
4) K-means clustering: cluster the binary feature vectors output by the convolutional neural network with K-means, an unsupervised machine learning algorithm, to obtain the clustering result.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611027489.7A CN106776740A (en) 2016-11-17 2016-11-17 A social network text clustering method based on convolutional neural networks

Publications (1)

Publication Number Publication Date
CN106776740A true CN106776740A (en) 2017-05-31

Family

ID=58970278

Family Applications (1)

Application Number Status Publication Priority Date Filing Date Title
CN201611027489.7A Pending CN106776740A (en) 2016-11-17 2016-11-17 A social network text clustering method based on convolutional neural networks

Country Status (1)

Country Link
CN (1) CN106776740A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103729652A (en) * 2014-01-17 2014-04-16 重庆大学 Sparsity preserving manifold embedding based hyperspectral remote sensing image classification method
CN104915386A (en) * 2015-05-25 2015-09-16 中国科学院自动化研究所 Short text clustering method based on deep semantic feature learning
CN106096642A (en) * 2016-06-07 2016-11-09 南京邮电大学 Based on the multi-modal affective characteristics fusion method differentiating locality preserving projections

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107247703A (en) * 2017-06-08 2017-10-13 天津大学 Microblog emotional analysis method based on convolutional neural networks and integrated study
CN107392392A (en) * 2017-08-17 2017-11-24 中国科学技术大学苏州研究院 Microblogging forwarding Forecasting Methodology based on deep learning
CN107958259A (en) * 2017-10-24 2018-04-24 哈尔滨理工大学 A kind of image classification method based on convolutional neural networks
CN107766585A (en) * 2017-12-07 2018-03-06 中国科学院电子学研究所苏州研究院 A kind of particular event abstracting method towards social networks
CN107766585B (en) * 2017-12-07 2020-04-03 中国科学院电子学研究所苏州研究院 Social network-oriented specific event extraction method
CN108009647A (en) * 2017-12-21 2018-05-08 东软集团股份有限公司 Equipment record processing method, device, computer equipment and storage medium
CN108009647B (en) * 2017-12-21 2020-10-30 东软集团股份有限公司 Device record processing method and device, computer device and storage medium
CN112487406A (en) * 2020-12-02 2021-03-12 中国电子科技集团公司第三十研究所 Network behavior analysis method based on machine learning
CN112487406B (en) * 2020-12-02 2022-05-31 中国电子科技集团公司第三十研究所 Network behavior analysis method based on machine learning

Similar Documents

Publication Publication Date Title
CN106776740A (en) A social network text clustering method based on convolutional neural networks
CN105512289B (en) Image search method based on deep learning and Hash
CN107122455B (en) Network user enhanced representation method based on microblog
CN109766432B (en) Chinese abstract generation method and device based on generation countermeasure network
CN110717334A (en) Text emotion analysis method based on BERT model and double-channel attention
CN109740148A (en) A kind of text emotion analysis method of BiLSTM combination Attention mechanism
CN108763216A (en) A kind of text emotion analysis method based on Chinese data collection
CN107832353A (en) A kind of social media platform deceptive information recognition methods
CN109753602B (en) Cross-social network user identity recognition method and system based on machine learning
CN108763326A (en) A kind of sentiment analysis model building method of the diversified convolutional neural networks of feature based
WO2023065859A1 (en) Item recommendation method and apparatus, and storage medium
CN106202053B (en) A kind of microblogging theme sentiment analysis method of social networks driving
CN107247703A (en) Microblog emotional analysis method based on convolutional neural networks and integrated study
CN109388731A (en) A kind of music recommended method based on deep neural network
CN107688576A (en) The structure and tendentiousness sorting technique of a kind of CNN SVM models
CN109558935A (en) Emotion recognition and exchange method and system based on deep learning
CN110096587A (en) The fine granularity sentiment classification model of LSTM-CNN word insertion based on attention mechanism
CN108564479A (en) A kind of system and method for propagating trend based on hidden link analysis much-talked-about topic
CN112199606A (en) Social media-oriented rumor detection system based on hierarchical user representation
Skorniakov et al. Make Social Networks Clean Again: Graph Embedding and Stacking Classifiers for Bot Detection.
Sadr et al. Improving the performance of text sentiment analysis using deep convolutional neural network integrated with hierarchical attention layer
Gupta et al. Sentiment analysis of the demonitization of economy 2016 India, Regionwise
Kundu et al. Deep multi-modal networks for book genre classification based on its cover
Chen et al. Understanding emojis for financial sentiment analysis
Liu et al. Semi-supervised sentiment classification method based on weibo social relationship

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2017-05-31)