CN111125486A - Microblog user attribute analysis method based on multiple features - Google Patents

Microblog user attribute analysis method based on multiple features

Info

Publication number
CN111125486A
Authority
CN
China
Prior art keywords
user
microblog
text
representing
attribute analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911340531.4A
Other languages
Chinese (zh)
Other versions
CN111125486B (en)
Inventor
程克非
单凤池
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN201911340531.4A
Publication of CN111125486A
Application granted
Publication of CN111125486B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a microblog user attribute analysis method based on multiple features, and belongs to the technical field of intelligent media computing and big data analysis. The method comprises the following steps: S1, crawling users' microblog posts with crawler software, then cleaning and labeling the data; S2, constructing word vectors for the microblog posts with a word2vec model and, on the basis of the word vectors, obtaining the users' microblog text features according to an ensemble learning combination strategy; S3, constructing a multi-feature system for microblog attribute analysis from the users' microblog data, and constructing composite features suited to user attribute analysis from the basic features; S4, fusing multiple base classifiers with the Stacking model fusion technique to construct a microblog user attribute analysis model, and inputting the data to be analyzed to obtain the final microblog user attribute analysis result. The invention improves the accuracy of microblog user attribute classification and provides technical support for merchants to offer users more efficient personalized recommendations.

Description

Microblog user attribute analysis method based on multiple features
Technical Field
The invention belongs to the technical field of intelligent media computing and big data analysis, and relates to a microblog user attribute analysis method based on multiple features.
Background
With the increasing popularity of online social media, network information has become voluminous and disordered. Using computer technology to understand the basic information of individuals and groups in depth, to mine social psychology and behavior patterns, to provide personalized, multi-faceted decision support quickly and accurately, and to help solve practical social problems has become an important topic of common concern in academia and industry. A deep understanding of user information and user behavior is one of its core elements. Because personal attribute data often involve privacy, users frequently hide their personal information by leaving fields blank or filling in false information, so basic information about users often cannot be obtained directly. User attribute analysis can address this problem.
At present, research on user attribute analysis at home and abroad usually proceeds from three directions: supervised learning, semi-supervised learning and unsupervised learning. Semi-supervised learning suffers from data sparseness and unsupervised learning from lower accuracy; by comparison, supervised learning over a multi-feature system combined with novel composite features is better suited to analyzing user attributes. Moreover, the features considered by existing microblog user attribute analysis methods are incomplete, so the accuracy of their analysis results is not high.
Disclosure of Invention
In view of this, the invention aims to provide a multi-feature-based microblog user attribute analysis method that improves the accuracy of microblog user attribute classification, so that merchants can offer users more efficient personalized recommendations.
To achieve this purpose, the invention provides the following technical scheme:
A microblog user attribute analysis method based on multiple features, comprising the following steps:
S1: crawling users' microblog posts with crawler software, then cleaning and labeling the data;
S2: constructing word vectors for the microblog posts with a word2vec model and, on that basis, obtaining the users' microblog text features according to an ensemble learning combination strategy;
S3: constructing a multi-feature system for microblog attribute analysis from the users' microblog data, and constructing composite features suited to user attribute analysis from the basic features;
S4: fusing multiple base classifiers with the Stacking model fusion technique to construct a microblog user attribute analysis model, and inputting the data to be analyzed to obtain the final microblog user attribute analysis result.
Further, in step S2, the user microblog text features are constructed as follows:
S21: segmenting the samples into words with the Jieba word segmentation tool, removing stop words, and merging the microblogs of each user to obtain the user post collection
Figure BDA0002332136690000028
where $m_i$ denotes the set of microblogs of the user with ID $i$;
Figure BDA0002332136690000029
denotes the set of microblogs of a single user; and
Figure BDA00023321366900000210
where $w_t$ denotes a word in a single microblog;
S22: training a Skip-Gram model on the users' microblogs to obtain 300-dimensional word vectors, and calculating the microblog vector of each user with the following formula:
Figure BDA0002332136690000021
where $u_i$ denotes the user with ID $i$, $K$ denotes the number of words in the microblogs of user $u_i$, and $Wvec_k$ denotes the word vector of the $k$-th word;
S23: using a Stacking model as the ensemble learning combination strategy, with a support vector machine (SVM), a decision tree, logistic regression, a light gradient boosting machine (LightGBM) and extreme gradient boosting (XGBoost) as the first-layer (primary) classifiers and logistic regression as the second-layer classifier that combines the predictions of the primary classifiers, finally obtaining the user's microblog text features.
Further, in step S3, the constructed composite features comprise: user activity, user microblog time distribution and user behavior habits;
the user activity feature $f_{useractive}(u_i)$ is calculated as follows:
Figure BDA0002332136690000022
where $u_i$ denotes the user with ID $i$, $f_{sum}(u_i)$ denotes the total number of microblogs of user $u_i$, $f_{transpond}(u_i)$ denotes the number of microblogs forwarded by user $u_i$, and $f_{time}(u_i)$ denotes the time interval between the first and the last microblog of user $u_i$;
the user microblog time distribution
Figure BDA0002332136690000023
is calculated as follows:
Figure BDA0002332136690000024
where
Figure BDA0002332136690000025
denotes the user with ID $i$ in time period $j$,
Figure BDA0002332136690000026
denotes the number of microblogs posted by user $u_i$ in time period $j$, and
Figure BDA0002332136690000027
denotes the number of microblogs forwarded by user $u_i$ in time period $j$;
the user behavior habit $f_{userBehavior}(u_i)$ is calculated as follows:
$$f_{userBehavior}(u_i) = f_{textBehavior}(u_i) + f_{textSource}(u_i) + f_{inforIntegrity}(u_i)$$
where $f_{textBehavior}(u_i)$ denotes the text behavior habit of user $u_i$, $f_{textSource}(u_i)$ denotes the post source information of user $u_i$, and $f_{inforIntegrity}(u_i)$ denotes the basic information integrity of user $u_i$.
Further, the user's text behavior habit is obtained by calculating the proportion of emoticons and pictures in the user's microblogs, with the following formula:
Figure BDA0002332136690000031
where $f_{emoticons}(text_n)$ denotes the number of emoticons in the $n$-th microblog, $f_{picture}(text_n)$ denotes the number of pictures in the $n$-th microblog, and $N$ denotes the number of microblogs of user $u_i$.
Further, the user's post source information is calculated from the male habitual text source $f_{mSource}(u_i)$ and the female habitual text source $f_{fSource}(u_i)$ as follows: $f_{textSource}(u_i) = f_{mSource}(u_i) - f_{fSource}(u_i)$.
Further, the male habitual text source $f_{mSource}(u_i)$ is calculated as follows:
Figure BDA0002332136690000032
where $N$ denotes the number of microblogs of user $u_i$, $f_{mSourceNum}(text_j)$ indicates whether the source of the $j$-th microblog is a male text source, and $sourceNum$ is the total number of text sources.
Further, the female habitual text source $f_{fSource}(u_i)$ is calculated as follows:
Figure BDA0002332136690000033
where $N$ denotes the number of microblogs of user $u_i$, $f_{fSourceNum}(text_j)$ indicates whether the source of the $j$-th microblog is a female text source, and $sourceNum$ is the total number of text sources.
Further, the user information integrity $f_{inforIntegrity}$ represents the completeness of the user's basic information, which comprises the user's nickname, registered location, gender, birthday, profile, education information and avatar information; it is calculated as follows:
Figure BDA0002332136690000034
where $f_{name}$ indicates whether a nickname is present, $f_{location}$ whether a registered location is present, $f_{birthday}$ whether birthday information is present, $f_{introduction}$ whether a profile is present, $f_{education}$ whether education information is present, $f_{headPhoto}$ whether avatar information is present, and $m$ denotes the total number of basic information items.
Further, in step S4, fusing the multiple base classifiers with the Stacking model fusion technique to construct the microblog user attribute analysis model specifically comprises: using a support vector machine (SVM), a decision tree, logistic regression, a light gradient boosting machine (LightGBM) and extreme gradient boosting (XGBoost) as the first-layer (primary) classifiers, and logistic regression as the second-layer classifier.
The beneficial effects of the invention are as follows: the invention takes full account of the various features of a user's microblogs and trains the constructed microblog user attribute analysis model on them to obtain personalized data about microblog users, which improves the accuracy of microblog user attribute classification and provides technical support for merchants to offer users more efficient personalized recommendations.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a general flow chart of microblog user attribute analysis according to the present invention;
FIG. 2 is a flow chart of construction and extraction of microblog user attribute analysis text features in the invention;
FIG. 3 is a flow chart of microblog user attribute analysis non-text feature construction and extraction in the present invention;
FIG. 4 is a flowchart of microblog user attribute analysis model construction in the invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
Referring to fig. 1 to 4, fig. 1 is a general flowchart of a microblog user attribute analysis method according to a preferred embodiment of the present invention, where the microblog user attribute analysis method according to this embodiment may be executed as a computer program or as a plug-in executed in other programs, and the specific execution process includes:
Step S1: data preprocessing.
Data preprocessing comprises two stages: data cleaning and labeling. In the cleaning stage, abnormal values and null values in the data are handled to ensure the integrity of the sample data. In the labeling stage, the collected data are labeled manually according to prior knowledge and divided into a male class and a female class, where 0 denotes male and 1 denotes female.
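The two preprocessing stages can be sketched roughly as follows. This is a minimal illustration only: the record fields (`user_id`, `text`, `gender`) are hypothetical, and the `gender` field stands in for the annotator's manual label. Only the 0 = male, 1 = female encoding comes from the text.

```python
# Label encoding stated in the text: 0 = male, 1 = female.
LABELS = {"male": 0, "female": 1}

def clean_and_label(records):
    """Drop records with empty text or a missing/unknown label, then attach a numeric label."""
    cleaned = []
    for rec in records:
        text = (rec.get("text") or "").strip()
        gender = rec.get("gender")  # hypothetical field holding the manual annotation
        if not text or gender not in LABELS:
            continue  # null or abnormal value: discard the record
        cleaned.append({"user_id": rec["user_id"], "text": text, "label": LABELS[gender]})
    return cleaned

sample = [
    {"user_id": 1, "text": "hello", "gender": "male"},
    {"user_id": 2, "text": None, "gender": "female"},      # null text, removed
    {"user_id": 3, "text": "hi there", "gender": "female"},
    {"user_id": 4, "text": "no label", "gender": None},    # unlabeled, removed
]
labeled = clean_and_label(sample)
```

Discarding incomplete records is only one possible cleaning policy; the text does not say whether abnormal values are dropped or repaired.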
Step S2: construct word vectors for the microblog posts with word2vec and, on that basis, obtain the microblog text features according to an ensemble learning combination strategy.
Step S3: construct a multi-feature system for microblog attribute analysis from the user microblog data, and construct composite features suited to user attribute analysis from the basic features.
Step S4: fuse multiple base classifiers with the Stacking model fusion technique to obtain the final microblog user attribute analysis result.
Specifically, as shown in fig. 2, step S2 comprises the following steps:
Step S21: segment each microblog of the users into words and remove stop words, then merge the microblogs of each user to obtain the user post collection
Figure BDA0002332136690000051
where $m_i$ denotes the set of microblogs of the user with ID $i$;
Figure BDA0002332136690000052
denotes the set of microblogs of a single user; and
Figure BDA0002332136690000053
where $w_t$ denotes a word in a single microblog.
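Step S21 can be sketched as below, as a dependency-free illustration. The function name `build_user_corpus` and the sample data are hypothetical. The default tokenizer is a plain whitespace split; for Chinese text a segmenter such as Jieba's `jieba.lcut` could be passed in as `tokenize` instead.

```python
from collections import defaultdict

def build_user_corpus(posts, stop_words, tokenize=str.split):
    """Segment each post, drop stop words, and group the token lists by user ID."""
    corpus = defaultdict(list)
    for user_id, text in posts:
        tokens = [w for w in tokenize(text) if w not in stop_words]
        corpus[user_id].append(tokens)  # one token list per microblog
    return dict(corpus)

# (user ID, microblog text) pairs; toy English data for readability.
posts = [(1, "the cat sat"), (2, "a dog"), (1, "the end")]
corpus = build_user_corpus(posts, stop_words={"the", "a"})
```

Keeping one token list per microblog preserves the nesting described above: a collection of users, each with a set of microblogs, each made of words.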
Step S22: train a Skip-Gram model on the crawled microblogs of the users to obtain 300-dimensional word vectors, and calculate the microblog vector of each user with the following formula:
Figure BDA0002332136690000054
where $u_i$ denotes the user with ID $i$, $K$ denotes the number of words in the microblogs of user $u_i$, and $Wvec_k$ denotes the word vector of the $k$-th word.
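The formula image is not available here, so the sketch below assumes the user vector is the arithmetic mean of the user's $K$ word vectors, which is one common way to pool word2vec vectors and matches the symbols $K$ and $Wvec_k$ defined in this step. The function name `user_vector` is hypothetical, and the toy vectors are 3-dimensional rather than 300-dimensional.

```python
def user_vector(word_vectors):
    """Average K word vectors (lists of floats) into one user vector (assumed pooling)."""
    if not word_vectors:
        raise ValueError("user has no words")
    k = len(word_vectors)
    dim = len(word_vectors[0])
    return [sum(vec[d] for vec in word_vectors) / k for d in range(dim)]

vecs = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]  # K = 2 toy word vectors
u = user_vector(vecs)
```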
Step S23: adopt a Stacking model as the ensemble learning strategy, with a support vector machine (SVM), a decision tree, logistic regression, a light gradient boosting machine (LightGBM) and extreme gradient boosting (XGBoost) as the base classifiers and logistic regression as the meta classifier, to construct the microblog user attribute analysis model.
Step S24: input the training set into the model for fitting, and tune the parameters by grid search to obtain the optimal model.
Step S25: input the training set into the model obtained in step S24 to obtain the text features.
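The grid search in step S24 amounts to scoring every combination of candidate parameter values and keeping the best one. A minimal sketch, in which the parameter names `C` and `depth` and the scoring function are purely illustrative stand-ins for a real validation score:

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Try every parameter combination and return the best (params, score) pair."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    """Stand-in for a validation score; peaks at C=1.0 and depth=3."""
    return -abs(params["C"] - 1.0) - abs(params["depth"] - 3)

best, score = grid_search({"C": [0.1, 1.0, 10.0], "depth": [2, 3, 4]}, toy_score)
```

In practice `score_fn` would fit the stacked model with the given parameters and return a cross-validated accuracy; the text does not specify which parameters are tuned or over what ranges.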
As shown in fig. 3, step S3 comprises the following steps:
Step S31: construct a multi-feature system for microblog attribute analysis from the user microblog data, comprising text features, time features, statistical features, numerical features and content features, as shown in Table 1:
Table 1. Multi-feature system
Figure BDA0002332136690000055
Step S32: on the basis of the extracted multi-feature system, construct three composite features: user activity, microblog time distribution and user behavior habits.
Specifically, the user activity feature is calculated as follows:
Figure BDA0002332136690000061
where $u_i$ denotes the user with ID $i$, $f_{sum}(u_i)$ denotes the total number of microblogs of user $u_i$, $f_{transpond}(u_i)$ denotes the number of microblogs forwarded by user $u_i$, and $f_{time}(u_i)$ denotes the time interval between the first and the last microblog posted by user $u_i$.
The user's microblog time distribution feature is calculated as follows:
Figure BDA0002332136690000062
where
Figure BDA0002332136690000063
denotes the user with ID $i$ in time period $j$ ($0 \le j \le 23$),
Figure BDA0002332136690000064
denotes the number of microblogs posted by the user with ID $i$ in time period $j$, and
Figure BDA0002332136690000065
denotes the number of microblogs forwarded by the user with ID $i$ in time period $j$.
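Since the formula itself survives only as an image, the following is a sketch under a stated assumption: for each hour $j$ from 0 to 23, the posted and forwarded counts are added and divided by the user's total, giving a 24-bin distribution. The normalization and the function name `hourly_distribution` are assumptions, not taken from the patent.

```python
from datetime import datetime

def hourly_distribution(posted, forwarded):
    """Return 24 per-hour shares of a user's posted plus forwarded microblogs (assumed form)."""
    counts = [0] * 24
    for ts in list(posted) + list(forwarded):
        counts[ts.hour] += 1  # time period j is the hour of day, 0..23
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 24

posted = [datetime(2019, 12, 1, 9, 30), datetime(2019, 12, 2, 9, 5)]
forwarded = [datetime(2019, 12, 1, 22, 0), datetime(2019, 12, 3, 22, 45)]
dist = hourly_distribution(posted, forwarded)
```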
The user behavior habit feature is calculated from the user's text behavior habit $f_{textBehavior}$, the user's post source information $f_{textSource}$ and the user information integrity $f_{inforIntegrity}$, with the following formula:
$$f_{userBehavior}(u_i) = f_{textBehavior}(u_i) + f_{textSource}(u_i) + f_{inforIntegrity}(u_i)$$
The user's text behavior habit is obtained by calculating the proportion of emoticons and pictures in the user's microblogs, with the following formula:
Figure BDA0002332136690000066
where $f_{textBehavior}(u_i)$ denotes the posting habit of user $u_i$, $u_i$ denotes the user with ID $i$, $N$ denotes the number of microblogs of user $u_i$, $f_{emoticons}(text_n)$ denotes the number of emoticons in the $n$-th microblog, and $f_{picture}(text_n)$ denotes the number of pictures in the $n$-th microblog.
User post source information: the user's post source information is calculated from the male habitual text source $f_{mSource}(u_i)$ and the female habitual text source $f_{fSource}(u_i)$ as follows:
$$f_{textSource}(u_i) = f_{mSource}(u_i) - f_{fSource}(u_i)$$
Male habitual text source: the male habitual text source $f_{mSource}(u_i)$ is obtained from whether each of the user's microblog sources is a male text source and from the number of text sources, as follows:
Figure BDA0002332136690000067
where $f_{mSourceNum}(text_j)$ indicates whether the source of the $j$-th microblog is a male text source, and $sourceNum$ is the total number of text sources.
Female habitual text source: the female habitual text source $f_{fSource}(u_i)$ is obtained from whether each of the user's microblog sources is a female text source and from the number of text sources, as follows:
Figure BDA0002332136690000068
where $f_{fSourceNum}(text_j)$ indicates whether the source of the $j$-th microblog is a female text source, and $sourceNum$ is the total number of text sources.
User information integrity: $f_{inforIntegrity}$ represents the completeness of the user's basic information, which comprises the user's nickname, location, gender, birthday, profile, education information and avatar information; it is calculated as follows:
Figure BDA0002332136690000071
where $f_{name}$ indicates whether a nickname is present, $f_{location}$ whether a registered location is present, $f_{birthday}$ whether birthday information is present, $f_{introduction}$ whether a profile is present, $f_{education}$ whether education information is present, $f_{headPhoto}$ whether avatar information is present, and $m$ denotes the total number of basic information items.
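How the behavior habit components combine can be sketched as follows. The sum $f_{userBehavior} = f_{textBehavior} + f_{textSource} + f_{inforIntegrity}$ and the difference $f_{textSource} = f_{mSource} - f_{fSource}$ are taken from the text. The integrity formula is only available as an image, so treating it as the fraction of present fields (sum of indicators divided by $m$) is an assumption. The field list uses the six indicators named above, and all input values are illustrative.

```python
# The six presence indicators named in the text; m is their count here (assumption).
BASIC_FIELDS = ("name", "location", "birthday", "introduction", "education", "headPhoto")

def info_integrity(profile):
    """Fraction of basic-information fields that are filled in (assumed form)."""
    present = sum(1 for field in BASIC_FIELDS if profile.get(field))
    return present / len(BASIC_FIELDS)

def text_source(male_source, female_source):
    """f_textSource = f_mSource - f_fSource, as stated in the text."""
    return male_source - female_source

def user_behavior(text_behavior, male_source, female_source, profile):
    """f_userBehavior = f_textBehavior + f_textSource + f_inforIntegrity."""
    return text_behavior + text_source(male_source, female_source) + info_integrity(profile)

profile = {"name": "alice", "location": "Chongqing", "birthday": None,
           "introduction": "", "education": "CQUPT", "headPhoto": "a.png"}
score = user_behavior(0.5, 0.25, 0.75, profile)  # illustrative component values
```

With these inputs four of the six fields are present, so the integrity term is 4/6, and the text behavior and source terms cancel.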
As shown in fig. 4, step S4 comprises:
Step S41: adopt the Stacking method as the ensemble learning combination strategy to construct the microblog user attribute analysis model, with a support vector machine (SVM), a decision tree, logistic regression, a light gradient boosting machine (LightGBM) and extreme gradient boosting (XGBoost) as the first-layer (primary) classifiers of the Stacking model and a logistic regression model as the second-layer classifier.
Step S42: input the training set into the model for fitting, and tune the parameters by grid search to obtain the optimal model.
Step S43: input the test set into the fitted model to obtain the final user attribute analysis result.
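The mechanics of Stacking can be illustrated with a small dependency-free sketch. It is not the patent's model: the toy `Stump` learners stand in for the SVM, decision tree, LightGBM and XGBoost base classifiers, `TinyLogisticRegression` stands in for the second-layer logistic regression, and the data are invented. What it shows is the usual stacking recipe, assumed here: base learners produce out-of-fold probabilities, and the second layer is trained on those probabilities.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class Stump:
    """Toy base learner: soft threshold on one feature, midway between the class means."""
    def __init__(self, feature, steepness=20.0):
        self.feature, self.steepness = feature, steepness

    def fit(self, X, y):
        pos = [x[self.feature] for x, t in zip(X, y) if t == 1]
        neg = [x[self.feature] for x, t in zip(X, y) if t == 0]
        mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        self.mid = (mean_pos + mean_neg) / 2
        self.sign = 1.0 if mean_pos >= mean_neg else -1.0
        return self

    def predict_proba(self, X):
        return [sigmoid(self.sign * self.steepness * (x[self.feature] - self.mid)) for x in X]

class TinyLogisticRegression:
    """Second-layer classifier trained by plain stochastic gradient descent."""
    def __init__(self, lr=0.5, epochs=500):
        self.lr, self.epochs = lr, epochs

    def _prob(self, x):
        return sigmoid(self.b + sum(w * xi for w, xi in zip(self.w, x)))

    def fit(self, X, y):
        self.w, self.b = [0.0] * len(X[0]), 0.0
        for _ in range(self.epochs):
            for x, t in zip(X, y):
                g = self._prob(x) - t  # gradient of the log-loss w.r.t. the logit
                self.w = [w - self.lr * g * xi for w, xi in zip(self.w, x)]
                self.b -= self.lr * g
        return self

    def predict(self, X):
        return [1 if self._prob(x) >= 0.5 else 0 for x in X]

def out_of_fold(make_model, X, y, k):
    """Score every sample with a model that was fitted without that sample's fold."""
    oof = [0.0] * len(X)
    for fold in range(k):
        test_idx = [i for i in range(len(X)) if i % k == fold]
        train_idx = [i for i in range(len(X)) if i % k != fold]
        model = make_model().fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i, p in zip(test_idx, model.predict_proba([X[i] for i in test_idx])):
            oof[i] = p
    return oof

def fit_stacking(base_factories, X, y, k=2):
    """Train the meta classifier on out-of-fold outputs, then refit the bases on all data."""
    meta_X = list(zip(*(out_of_fold(f, X, y, k) for f in base_factories)))
    meta = TinyLogisticRegression().fit(meta_X, y)
    bases = [f().fit(X, y) for f in base_factories]
    return bases, meta

def predict_stacking(bases, meta, X):
    meta_X = list(zip(*(b.predict_proba(X) for b in bases)))
    return meta.predict(meta_X)

# Invented two-feature data; labels follow the text's encoding (0 = male, 1 = female).
X = [[0.1, 1.0], [0.2, 0.9], [0.3, 0.8], [0.4, 0.7],
     [0.6, 0.3], [0.7, 0.2], [0.8, 0.1], [0.9, 0.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
bases, meta = fit_stacking([lambda: Stump(0), lambda: Stump(1)], X, y)
preds = predict_stacking(bases, meta, [[0.15, 0.95], [0.85, 0.05]])
```

Training the second layer on out-of-fold outputs keeps it from seeing predictions the base learners made on their own training samples. With real classifiers, scikit-learn's `StackingClassifier` packages the same idea.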
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims (9)

1. A microblog user attribute analysis method based on multiple features, characterized by comprising the following steps:
S1: crawling users' microblog posts with crawler software, then cleaning and labeling the data;
S2: constructing word vectors for the microblog posts with a word2vec model and, on that basis, obtaining the users' microblog text features according to an ensemble learning combination strategy;
S3: constructing a multi-feature system for microblog attribute analysis from the users' microblog data, and constructing composite features suited to user attribute analysis from the basic features;
S4: fusing multiple base classifiers with the Stacking model fusion technique to construct a microblog user attribute analysis model, and inputting the data to be analyzed to obtain the final microblog user attribute analysis result.
2. The multi-feature-based microblog user attribute analysis method according to claim 1, wherein in step S2 the user microblog text features are constructed as follows:
S21: segmenting the samples into words with the Jieba word segmentation tool, removing stop words, and merging the microblogs of each user to obtain the user post collection
Figure FDA0002332136680000011
where $m_i$ denotes the set of microblogs of the user with ID $i$;
Figure FDA0002332136680000012
denotes the set of microblogs of a single user; and
Figure FDA0002332136680000013
where $w_t$ denotes a word in a single microblog;
S22: training a Skip-Gram model on the users' microblogs to obtain 300-dimensional word vectors, and calculating the microblog vector of each user with the following formula:
Figure FDA0002332136680000014
where $u_i$ denotes the user with ID $i$, $K$ denotes the number of words in the microblogs of user $u_i$, and $Wvec_k$ denotes the word vector of the $k$-th word;
S23: using a Stacking model as the ensemble learning combination strategy, with a support vector machine, a decision tree, logistic regression, a light gradient boosting machine and extreme gradient boosting as the first-layer (primary) classifiers and logistic regression as the second-layer classifier that combines their predictions, finally obtaining the user's microblog text features.
3. The multi-feature-based microblog user attribute analysis method according to claim 1, wherein in step S3 the constructed composite features comprise: user activity, user microblog time distribution and user behavior habits;
the user activity feature $f_{useractive}(u_i)$ is calculated as follows:
Figure FDA0002332136680000015
where $u_i$ denotes the user with ID $i$, $f_{sum}(u_i)$ denotes the total number of microblogs of user $u_i$, $f_{transpond}(u_i)$ denotes the number of microblogs forwarded by user $u_i$, and $f_{time}(u_i)$ denotes the time interval between the first and the last microblog of user $u_i$;
the user microblog time distribution
Figure FDA0002332136680000016
is calculated as follows:
Figure FDA0002332136680000017
where
Figure FDA0002332136680000021
denotes the user with ID $i$ in time period $j$,
Figure FDA0002332136680000022
denotes the number of microblogs posted by user $u_i$ in time period $j$, and
Figure FDA0002332136680000023
denotes the number of microblogs forwarded by user $u_i$ in time period $j$;
the user behavior habit $f_{userBehavior}(u_i)$ is calculated as follows:
$$f_{userBehavior}(u_i) = f_{textBehavior}(u_i) + f_{textSource}(u_i) + f_{inforIntegrity}(u_i)$$
where $f_{textBehavior}(u_i)$ denotes the text behavior habit of user $u_i$, $f_{textSource}(u_i)$ denotes the post source information of user $u_i$, and $f_{inforIntegrity}(u_i)$ denotes the basic information integrity of user $u_i$.
4. The multi-feature-based microblog user attribute analysis method according to claim 3, wherein the user's text behavior habit is obtained by calculating the proportion of emoticons and pictures in the user's microblogs, with the following formula:
Figure FDA0002332136680000024
where $f_{emoticons}(text_n)$ denotes the number of emoticons in the $n$-th microblog, $f_{picture}(text_n)$ denotes the number of pictures in the $n$-th microblog, and $N$ denotes the number of microblogs of user $u_i$.
5. The multi-feature-based microblog user attribute analysis method according to claim 3, wherein the user's post source information is calculated from the male habitual text source $f_{mSource}(u_i)$ and the female habitual text source $f_{fSource}(u_i)$ as follows: $f_{textSource}(u_i) = f_{mSource}(u_i) - f_{fSource}(u_i)$.
6. The multi-feature-based microblog user attribute analysis method according to claim 5, wherein the male habitual text source $f_{mSource}(u_i)$ is calculated as follows:
Figure FDA0002332136680000025
where $N$ denotes the number of microblogs of user $u_i$, $f_{mSourceNum}(text_j)$ indicates whether the source of the $j$-th microblog is a male text source, and $sourceNum$ is the total number of text sources.
7. The multi-feature-based microblog user attribute analysis method according to claim 5, wherein the female habitual text source $f_{fSource}(u_i)$ is calculated as follows:
Figure FDA0002332136680000026
where $N$ denotes the number of microblogs of user $u_i$, $f_{fSourceNum}(text_j)$ indicates whether the source of the $j$-th microblog is a female text source, and $sourceNum$ is the total number of text sources.
8. The multi-feature-based microblog user attribute analysis method according to claim 3, wherein the user information integrity $f_{inforIntegrity}$ represents the completeness of the user's basic information, which comprises the user's nickname, registered location, gender, birthday, profile, education information and avatar information, and is calculated as follows:
Figure FDA0002332136680000027
where $f_{name}$ indicates whether a nickname is present, $f_{location}$ whether a registered location is present, $f_{birthday}$ whether birthday information is present, $f_{introduction}$ whether a profile is present, $f_{education}$ whether education information is present, $f_{headPhoto}$ whether avatar information is present, and $m$ denotes the total number of basic information items.
9. The multi-feature-based microblog user attribute analysis method according to claim 1, wherein in step S4, fusing the multiple base classifiers with the Stacking model fusion technique to construct the microblog user attribute analysis model specifically comprises: constructing the microblog user attribute analysis model with a support vector machine, a decision tree, logistic regression, a light gradient boosting machine and extreme gradient boosting as the first-layer (primary) classifiers and logistic regression as the second-layer classifier.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911340531.4A CN111125486B (en) 2019-12-23 2019-12-23 Microblog user attribute analysis method based on multiple features

Publications (2)

Publication Number Publication Date
CN111125486A (en) 2020-05-08
CN111125486B (en) 2022-11-25

Family

ID=70501405

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911340531.4A Active CN111125486B (en) 2019-12-23 2019-12-23 Microblog user attribute analysis method based on multiple features

Country Status (1)

Country Link
CN (1) CN111125486B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160164866A1 (en) * 2014-12-09 2016-06-09 Duo Security, Inc. System and method for applying digital fingerprints in multi-factor authentication
CN106202211A (en) * 2016-06-27 2016-12-07 Sichuan University Integrated microblog rumor identification method based on microblog type
CN106296422A (en) * 2016-07-29 2017-01-04 Chongqing University of Posts and Telecommunications Social network spam user detection method fusing multiple algorithms
CN106649515A (en) * 2016-10-17 2017-05-10 China Electronics Standardization Institute Real-time micro-blog classifier based on multiple search models
CN108090607A (en) * 2017-12-13 2018-05-29 Sun Yat-sen University Social media user demographic attribute prediction method based on multi-model stacking fusion
CN108710609A (en) * 2018-05-07 2018-10-26 Nanjing University of Posts and Telecommunications Analysis method for social platform user information based on multi-feature fusion

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
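D. VARSHNEYA et al.: "Restaurant attribute classification using deep learning", 《2016 IEEE ANNUAL INDIA CONFERENCE》 *
PENNACCHIOTTI M et al.: "Democrats, republicans and starbucks afficionados", 《PROCEEDINGS OF THE 17TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING》 *
LIU Yun et al.: "Attribute inference oriented to social media users' commenting behavior", 《Chinese Journal of Computers》 *
LIU Jing: "Research on spam microblog detection fusing multi-feature clustering", 《China Master's Theses Full-text Database, Information Science and Technology》 *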

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111984872A (en) * 2020-09-09 2020-11-24 Beijing Zhongke Research Institute Multi-modal information social media popularity prediction method based on iterative optimization strategy
CN111984872B (en) * 2020-09-09 2021-03-16 Beijing Zhongke Research Institute Multi-modal information social media popularity prediction method based on iterative optimization strategy

Also Published As

Publication number Publication date
CN111125486B (en) 2022-11-25

Similar Documents

Publication Publication Date Title
CN109635291B (en) Recommendation method for fusing scoring information and article content based on collaborative training
US20180173788A1 (en) System And Method For Providing Inclusion-Based Electronically Stored Information Item Classification Suggestions With The Aid Of A Digital Computer
CN110162591B (en) Entity alignment method and system for digital education resources
CN108268600B (en) AI-based unstructured data management method and device
CN108733798A (en) Personalized recommendation method based on knowledge graph
CN103150333B (en) Opinion leader identification method in microblog media
CN110990683B (en) Microblog rumor integrated identification method and device based on region and emotional characteristics
CN109933664A (en) Improved fine-grained sentiment analysis method based on emotion word embedding
CN107705066A (en) Information input method and electronic device for commodity warehousing
CN104572982B (en) Personalized recommendation method and system based on problem guiding
CN104778283B (en) User occupation classification method and system based on microblog
CN110956210A (en) Semi-supervised network water force identification method and system based on AP clustering
CN111144831B (en) Accurate selection screening system and method suitable for recruitment
CN111460145A (en) Learning resource recommendation method, device and storage medium
CN106202391A (en) Automatic classification method and device for user communities
CN106776695A (en) The method for realizing the automatic identification of secretarial document value
CN117474507A (en) Intelligent recruitment matching method and system based on big data application technology
CN111898038A (en) Social media false news detection method based on man-machine cooperation
CN111125486B (en) Microblog user attribute analysis method based on multiple features
CN106919647B (en) Clustering-based network structure similarity recommendation method
CN109508557A (en) File path keyword recognition method associated with user privacy
Sheeba et al. A fuzzy logic based on sentiment classification
JP5929532B2 (en) Event detection apparatus, event detection method, and event detection program
CN112565903A (en) Video recommendation method and device, server and storage medium
CN109254993B (en) Text-based character data analysis method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant