CN111125486A - Microblog user attribute analysis method based on multiple features - Google Patents
- Publication number
- CN111125486A (application CN201911340531.4A)
- Authority
- CN
- China
- Prior art keywords
- user
- microblog
- text
- representing
- attribute analysis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
The invention relates to a multi-feature-based method for analyzing microblog user attributes, and belongs to the technical field of intelligent media computing and big data analysis. The method comprises the following steps: S1, crawling users' microblog posts with crawler software, then cleaning and labeling the data; S2, constructing word vectors for the microblog posts with a word2vec model and, on that basis, obtaining the users' microblog text features through an ensemble-learning combination strategy; S3, constructing a multi-feature system for microblog attribute analysis from the users' microblog data, and building composite features suited to user attribute analysis from the basic features; S4, fusing multiple base classifiers with the Stacking model-fusion technique to construct a microblog user attribute analysis model, and inputting the data to be analyzed to obtain the final attribute analysis result. The invention improves the accuracy of microblog user attribute classification and provides technical support for merchants to offer users more efficient personalized recommendations.
Description
Technical Field
The invention belongs to the technical field of intelligent media computing and big data analysis, and relates to a multi-feature-based method for analyzing microblog user attributes.
Background
With the growing popularity of online social media, network information has become voluminous and disordered. Using computer technology to understand the basic information of individuals and groups in depth, to mine social psychology and behavior patterns, to provide personalized, multi-faceted decision support quickly and accurately, and to help solve real social problems has become an important topic of shared interest in academia and industry. A deep understanding of user information and user behavior is one of its core components. Because personal attribute data often involves privacy, users frequently hide their personal information by leaving fields blank or entering false information, so basic information about users often cannot be acquired directly. User attribute analysis can address this problem.
At present, research on user attribute analysis at home and abroad generally follows one of three approaches: supervised learning, semi-supervised learning, or unsupervised learning. Semi-supervised learning suffers from data sparseness and unsupervised learning has lower accuracy; by comparison, supervised learning that builds a multi-feature system and combines it with novel composite features is better suited to analyzing user attributes. Moreover, existing microblog user attribute analysis methods do not consider a sufficiently complete set of features, so the accuracy of their results is limited.
Disclosure of Invention
In view of this, the invention aims to provide a multi-feature-based microblog user attribute analysis method that improves the accuracy of microblog user attribute classification, so that merchants can provide users with more efficient personalized recommendations.
To achieve this purpose, the invention provides the following technical solution:
A microblog user attribute analysis method based on multiple features, specifically comprising the following steps:
S1: crawling users' microblog posts with crawler software, then cleaning and labeling the data;
S2: constructing word vectors for the microblog posts with a word2vec model and, on that basis, obtaining the users' microblog text features through an ensemble-learning combination strategy;
S3: constructing a multi-feature system for microblog attribute analysis from the users' microblog data, and building composite features suited to user attribute analysis from the basic features;
S4: fusing multiple base classifiers with the Stacking model-fusion technique to construct a microblog user attribute analysis model, and inputting the data to be analyzed to obtain the final microblog user attribute analysis result.
Further, in step S2, the user microblog text features are constructed by the following steps:
S21: performing word segmentation on the samples with the Jieba word segmentation tool, removing stop words, and merging the microblogs of each user to obtain the user post collection, in which $m_i$ denotes the set of microblogs of the user with ID $i$ (one merged set per user) and $w_t$ denotes a word of a single microblog;
S22: training a Skip-Gram model on the users' microblogs to obtain 300-dimensional word vectors for the words in the microblogs, and calculating the microblog vector of each user, wherein the calculation formula is as follows:
where $u_i$ denotes the user with ID $i$, $K$ denotes the number of words in the microblogs of user $u_i$, and $Wvec_k$ denotes the word vector of the $k$-th word;
S23: using a Stacking model as the ensemble-learning combination strategy, with a support vector machine (SVM), a decision tree, logistic regression, a light gradient boosting machine (LightGBM) and extreme gradient boosting (XGBoost) as first-layer classifiers, and logistic regression as the second-layer classifier that combines the predictions of the first-layer classifiers, finally obtaining the user's microblog text features.
Further, in step S3, the constructed composite features include user activity, user microblog time distribution, and user behavior habits;
the user activity feature $f_{useractive}(u_i)$ is calculated as follows:
where $u_i$ denotes the user with ID $i$, $f_{sum}(u_i)$ denotes the total number of microblogs of user $u_i$, $f_{transpond}(u_i)$ denotes the number of microblogs forwarded by user $u_i$, and $f_{time}(u_i)$ denotes the time interval between the first and the last microblog of user $u_i$;
where, for the user microblog time distribution, the three terms denote, respectively, the user with ID $i$ in time period $j$, the number of microblogs posted by user $u_i$ at time $j$, and the number of microblogs forwarded by user $u_i$ at time $j$;
the user behavior habit feature $f_{userBehavior}(u_i)$ is calculated as follows:
$f_{userBehavior}(u_i) = f_{textBehavior}(u_i) + f_{textSource}(u_i) + f_{inforIntegrity}(u_i)$
where $f_{textBehavior}(u_i)$ denotes the text behavior habit of user $u_i$, $f_{textSource}(u_i)$ denotes the post source information of user $u_i$, and $f_{inforIntegrity}(u_i)$ denotes the basic information integrity of user $u_i$.
Further, the text behavior habit of the user is obtained by calculating the proportion of emoticons and pictures in the user's microblogs, according to the following formula:
where $f_{emoticons}(text_n)$ denotes the number of emoticons in the $n$-th microblog, $f_{picture}(text_n)$ denotes the number of pictures in the $n$-th microblog, and $N$ denotes the number of microblogs of user $u_i$.
Further, the user's post source information is calculated from the male habitual text source $f_{mSource}(u_i)$ and the female habitual text source $f_{fSource}(u_i)$, as follows: $f_{textSource}(u_i) = f_{mSource}(u_i) - f_{fSource}(u_i)$.
Further, the male habitual text source $f_{mSource}(u_i)$ is calculated as follows:
where $N$ denotes the number of microblogs of user $u_i$, $f_{mSourceNum}(text_j)$ indicates whether the source of the $j$-th microblog is a male text source, and $sourceNum$ is the total number of text sources.
Further, the female habitual text source $f_{fSource}(u_i)$ is calculated as follows:
where $N$ denotes the number of microblogs of user $u_i$, $f_{fSourceNum}(text_j)$ indicates whether the source of the $j$-th microblog is a female text source, and $sourceNum$ is the total number of text sources.
Further, the user information integrity is defined as follows: $f_{inforIntegrity}$ denotes the integrity of the user's basic information, which comprises the user's nickname, registered location, gender, birthday, profile introduction, education information and avatar information, and is calculated as follows:
where $f_{name}$ indicates whether there is a nickname, $f_{location}$ indicates whether there is a registered location, $f_{birthday}$ indicates whether there is birthday information, $f_{introduction}$ indicates whether there is a profile introduction, $f_{education}$ indicates whether there is education information, $f_{headPhoto}$ indicates whether there is avatar information, and $m$ denotes the total number of basic information items.
Further, in step S4, fusing the multiple base classifiers with the Stacking model-fusion technique to construct the microblog user attribute analysis model specifically includes: using a support vector machine (SVM), a decision tree, logistic regression, a light gradient boosting machine (LightGBM) and extreme gradient boosting (XGBoost) as first-layer classifiers, and logistic regression as the second-layer classifier, to construct the microblog user attribute analysis model.
The beneficial effects of the invention are as follows: the invention takes full account of the various features of users' microblogs and trains the constructed microblog user attribute analysis model to obtain personalized data about microblog users, which improves the accuracy of microblog user attribute classification and provides technical support for merchants to offer users more efficient personalized recommendations.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a general flow chart of microblog user attribute analysis according to the present invention;
FIG. 2 is a flow chart of construction and extraction of microblog user attribute analysis text features in the invention;
FIG. 3 is a flow chart of microblog user attribute analysis non-text feature construction and extraction in the present invention;
FIG. 4 is a flowchart of microblog user attribute analysis model construction in the invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
Referring to fig. 1 to 4, fig. 1 is a general flowchart of a microblog user attribute analysis method according to a preferred embodiment of the present invention, where the microblog user attribute analysis method according to this embodiment may be executed as a computer program or as a plug-in executed in other programs, and the specific execution process includes:
Step S1: Data preprocessing.
Data preprocessing comprises two stages: data cleaning and labeling. In the cleaning stage, abnormal values and null values in the data are processed to ensure the integrity of the sample data. In the labeling stage, the acquired data are labeled manually according to prior knowledge and divided into a male class and a female class, where 0 denotes male and 1 denotes female.
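The cleaning and labeling stages can be illustrated with a small sketch. The record layout, the field names (`user_id`, `text`, `gender`) and the rule of dropping records with empty text are assumptions made for illustration only; the text states only that abnormal and null values are handled and that labels are 0 for male and 1 for female.

```python
# Hypothetical raw records; the field names are assumptions, not taken from the patent.
RAW = [
    {"user_id": "u1", "text": "good morning", "gender": "male"},
    {"user_id": "u2", "text": None, "gender": "female"},      # null text -> dropped
    {"user_id": "u3", "text": "   ", "gender": "female"},     # blank text -> dropped
    {"user_id": "u4", "text": "new phone!", "gender": "female"},
    {"user_id": "u5", "text": "hello", "gender": "unknown"},  # unusable label -> dropped
]

LABELS = {"male": 0, "female": 1}  # encoding stated in the text: 0 = male, 1 = female


def clean_and_label(records):
    """Drop records with null/blank text or an unknown label, then attach a 0/1 label."""
    cleaned = []
    for rec in records:
        text = rec.get("text")
        label = LABELS.get(rec.get("gender"))
        if text is None or not text.strip() or label is None:
            continue
        cleaned.append({"user_id": rec["user_id"], "text": text.strip(), "label": label})
    return cleaned


samples = clean_and_label(RAW)
```

In this sketch the manual annotation is represented by the `gender` string already attached to each record; in practice the annotators' decisions would be the input to the label mapping.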
Step S2: Word vectors of the microblog posts are constructed with word2vec and, on that basis, the microblog text features are obtained through an ensemble-learning combination strategy.
Step S3: A multi-feature system for microblog attribute analysis is constructed from the users' microblog data, and composite features suited to user attribute analysis are built from the basic features.
Step S4: The multiple base classifiers are fused with the Stacking model-fusion technique to obtain the final microblog user attribute analysis result.
Specifically, as shown in fig. 2, step S2 includes the following steps:
Step S21: Word segmentation is performed on each of the users' microblogs and stop words are removed; the microblogs of each user are then merged to obtain the user post collection, in which $m_i$ denotes the set of microblogs of the user with ID $i$ (one merged set per user) and $w_t$ denotes a word of a single microblog.
Step S22: A Skip-Gram model is trained on the crawled users' microblogs to obtain 300-dimensional word vectors for the words in the microblogs, and the microblog vector of each user is calculated by the following formula:
where $u_i$ denotes the user with ID $i$, $K$ denotes the number of words in the microblogs of user $u_i$, and $Wvec_k$ denotes the word vector of the $k$-th word.
Step S23: A Stacking model is adopted as the ensemble-learning strategy, with a support vector machine (SVM), a decision tree, logistic regression, a light gradient boosting machine (LightGBM) and extreme gradient boosting (XGBoost) as base classifiers and logistic regression as the meta classifier, to construct the microblog user attribute analysis model.
Step S24: The training set is input into the model for fitting, and the parameters are tuned by grid search to obtain the optimal model.
Step S25: The training set is input into the model obtained in step S24 to obtain the text features.
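As a rough illustration of step S22, the sketch below aggregates word vectors into one vector per user. It assumes the user vector is the arithmetic mean of the vectors of the user's words; the text names $K$ and $Wvec_k$, but the aggregation used here is an assumption. The function name `user_vector`, the toy 3-dimensional vectors and the handling of out-of-vocabulary words are likewise hypothetical.

```python
def user_vector(words, word_vectors, dim):
    """Average the vectors of all in-vocabulary words of one user (assumed aggregation)."""
    total = [0.0] * dim
    count = 0
    for w in words:
        vec = word_vectors.get(w)
        if vec is None:  # skip words the embedding model has never seen
            continue
        for d in range(dim):
            total[d] += vec[d]
        count += 1
    if count == 0:
        return total  # all-zero vector for a user with no known words
    return [x / count for x in total]


# Toy 3-dimensional vectors standing in for the 300-dimensional Skip-Gram vectors.
toy_vectors = {"weibo": [1.0, 0.0, 2.0], "music": [3.0, 2.0, 0.0]}
merged_words = ["weibo", "music", "unseen"]  # one user's merged, segmented posts
vec = user_vector(merged_words, toy_vectors, dim=3)
```

Skipping unknown words means the divisor is the number of words that have a vector rather than the raw word count; whether that matches the patent's $K$ depends on how the vocabulary was built.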
As shown in fig. 3, step S3 specifically includes the following steps:
Step S31: A multi-feature system for microblog attribute analysis is constructed from the users' microblog data. It comprises text features, time features, statistical features, numerical features and content features, as shown in Table 1:
Table 1. Multi-feature system
Step S32: Three composite features (user activity, microblog time distribution, and user behavior habits) are constructed on the basis of the extracted multi-feature system.
Specifically, the user activity feature is calculated as follows:
where $u_i$ denotes the user with ID $i$, $f_{sum}(u_i)$ denotes the total number of microblogs of user $u_i$, $f_{transpond}(u_i)$ denotes the number of microblogs forwarded by user $u_i$, and $f_{time}(u_i)$ denotes the time interval between the first and the last microblog posted by user $u_i$.
The user's microblog time distribution feature is calculated as follows:
where the terms denote, respectively, the user with ID $i$ in time period $j$ ($0 \le j \le 23$), the number of microblogs posted by that user at time $j$, and the number of microblogs forwarded by that user at time $j$.
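The per-hour counts that the time distribution feature is built from can be sketched as follows. This is a minimal illustration under assumptions: each post is represented as a hypothetical `(created_at, is_forward)` pair, original posts and forwards are counted separately, and the sketch stops at the raw counts rather than the final feature value.

```python
from datetime import datetime


def hourly_distribution(posts):
    """Return two 24-slot lists: posted and forwarded microblog counts per hour (0-23)."""
    posted = [0] * 24
    forwarded = [0] * 24
    for created_at, is_forward in posts:
        hour = created_at.hour  # time period j, 0 <= j <= 23
        if is_forward:
            forwarded[hour] += 1
        else:
            posted[hour] += 1
    return posted, forwarded


# Hypothetical posts of one user: (timestamp, whether the post is a forward).
posts = [
    (datetime(2019, 12, 1, 8, 15), False),
    (datetime(2019, 12, 1, 8, 40), True),
    (datetime(2019, 12, 2, 23, 5), False),
]
posted, forwarded = hourly_distribution(posts)
```

Bucketing by hour of day regardless of date captures a user's daily rhythm; a different choice, such as separating weekdays from weekends, would need its own bucketing rule.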
The user behavior habit feature is calculated from the user's text behavior habit $f_{textBehavior}$, the user's post source information $f_{textSource}$, and the user's information integrity $f_{inforIntegrity}$, as follows:
$f_{userBehavior}(u_i) = f_{textBehavior}(u_i) + f_{textSource}(u_i) + f_{inforIntegrity}(u_i)$
The user's text behavior habit is obtained by calculating the proportion of emoticons and pictures in the user's microblogs, as follows:
where $f_{textBehavior}(u_i)$ denotes the posting habit of user $u_i$, $u_i$ denotes the user with ID $i$, $N$ denotes the number of microblogs of user $u_i$, $f_{emoticons}(text_n)$ denotes the number of emoticons in the $n$-th microblog, and $f_{picture}(text_n)$ denotes the number of pictures in the $n$-th microblog.
User post source information: the user's post source information is calculated from the male habitual text source $f_{mSource}(u_i)$ and the female habitual text source $f_{fSource}(u_i)$, as follows:
$f_{textSource}(u_i) = f_{mSource}(u_i) - f_{fSource}(u_i)$
Male habitual text source: $f_{mSource}(u_i)$ is obtained from whether each of the user's microblog sources is a male text source and from the number of text sources, as follows:
where $f_{mSourceNum}(text_j)$ indicates whether the source of the $j$-th microblog is a male text source, and $sourceNum$ is the total number of text sources.
Female habitual text source: $f_{fSource}(u_i)$ is obtained from whether each of the user's microblog sources is a female text source and from the number of text sources, as follows:
where $f_{fSourceNum}(text_j)$ indicates whether the source of the $j$-th microblog is a female text source, and $sourceNum$ is the total number of text sources.
User information integrity: $f_{inforIntegrity}$ denotes the integrity of the user's basic information, which comprises the user's nickname, location, gender, birthday, profile introduction, education information and avatar information, and is calculated as follows:
where $f_{name}$ indicates whether there is a nickname, $f_{location}$ indicates whether there is a registered location, $f_{birthday}$ indicates whether there is birthday information, $f_{introduction}$ indicates whether there is a profile introduction, $f_{education}$ indicates whether there is education information, $f_{headPhoto}$ indicates whether there is avatar information, and $m$ denotes the total number of basic information items.
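The way the three components combine into the behavior habit feature can be sketched in Python. Only the outer sum and the male-minus-female difference are taken from the text. The inner forms are assumptions for illustration: the text behavior term as the mean number of emoticons plus pictures per post, each source term as the share of posts from that group's typical sources (ignoring $sourceNum$), and integrity as the fraction of filled profile fields. All function names, dictionary keys and sample values are hypothetical.

```python
PROFILE_FIELDS = ["name", "location", "gender", "birthday",
                  "introduction", "education", "headPhoto"]


def text_behavior(posts):
    """Assumed form: mean number of emoticons plus pictures per microblog."""
    if not posts:
        return 0.0
    return sum(p["emoticons"] + p["pictures"] for p in posts) / len(posts)


def text_source(posts, male_sources, female_sources):
    """f_textSource = f_mSource - f_fSource (the difference is stated in the text).

    Each term is assumed here to be the share of posts sent from that group's sources.
    """
    if not posts:
        return 0.0
    n = len(posts)
    f_m = sum(p["source"] in male_sources for p in posts) / n
    f_f = sum(p["source"] in female_sources for p in posts) / n
    return f_m - f_f


def info_integrity(profile):
    """Assumed form: fraction of the m basic-information fields that are filled in."""
    present = sum(1 for field in PROFILE_FIELDS if profile.get(field))
    return present / len(PROFILE_FIELDS)


def user_behavior(posts, profile, male_sources, female_sources):
    """f_userBehavior = f_textBehavior + f_textSource + f_inforIntegrity (stated sum)."""
    return (text_behavior(posts)
            + text_source(posts, male_sources, female_sources)
            + info_integrity(profile))


posts = [
    {"emoticons": 2, "pictures": 1, "source": "client A"},
    {"emoticons": 0, "pictures": 1, "source": "client B"},
]
profile = {"name": "x", "location": "Chongqing", "gender": "female",
           "introduction": "hi", "headPhoto": "avatar.png"}  # birthday, education missing
score = user_behavior(posts, profile, {"client A"}, {"client B"})
```

Because the three terms are simply added, their scales matter: an unbounded count-based term can dominate the two bounded ones, so some normalization may be worth considering before the sum is used as a classifier input.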
As shown in fig. 4, step S4 includes:
Step S41: The Stacking method is adopted as the ensemble-learning combination strategy to construct the microblog user attribute analysis model, with a support vector machine (SVM), a decision tree, logistic regression, a light gradient boosting machine (LightGBM) and extreme gradient boosting (XGBoost) as the first-layer classifiers of the Stacking model and a logistic regression model as the second-layer classifier.
Step S42: The training set is input into the model for fitting, and the parameters are tuned by grid search to obtain the optimal model.
Step S43: The test set is input into the fitted model to obtain the final user attribute analysis result.
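Steps S41 to S43 can be approximated with scikit-learn's `StackingClassifier` and `GridSearchCV`. The sketch below is not the patent's configuration: the data are synthetic, the parameter grid is arbitrary, and only three of the five first-layer classifiers are included so that the example depends on scikit-learn alone. `LGBMClassifier` from the lightgbm package and `XGBClassifier` from the xgboost package could be appended to the list of base learners in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the fused feature matrix; labels follow 0 = male, 1 = female.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# First-layer (base) classifiers of the stack.
base_learners = [
    ("svm", SVC()),
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("logit", LogisticRegression(max_iter=1000)),
]

# Second-layer (meta) classifier: logistic regression over the base predictions.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)

# Step S42: grid search over an illustrative (assumed) parameter grid.
param_grid = {
    "tree__max_depth": [3, 5],
    "final_estimator__C": [0.1, 1.0],
}
search = GridSearchCV(stack, param_grid, cv=3)
search.fit(X_train, y_train)

# Step S43: predict attributes for held-out users with the best model found.
predictions = search.predict(X_test)
```

`StackingClassifier` trains the meta classifier on cross-validated predictions of the base learners, which keeps the second layer from simply memorizing first-layer outputs on the training data.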
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.
Claims (9)
1. A microblog user attribute analysis method based on multiple features is characterized by specifically comprising the following steps of:
S1: crawling users' microblog posts with crawler software, then cleaning and labeling the data;
S2: constructing word vectors for the microblog posts with a word2vec model and, on that basis, obtaining the users' microblog text features through an ensemble-learning combination strategy;
S3: constructing a multi-feature system for microblog attribute analysis from the users' microblog data, and building composite features suited to user attribute analysis from the basic features;
S4: fusing multiple base classifiers with the Stacking model-fusion technique to construct a microblog user attribute analysis model, and inputting the data to be analyzed to obtain the final microblog user attribute analysis result.
2. The multi-feature-based microblog user attribute analysis method according to claim 1, wherein in step S2 the construction of the user microblog text features specifically comprises the following steps:
S21: performing word segmentation on the samples with the Jieba word segmentation tool, removing stop words, and merging the microblogs of each user to obtain the user post collection, in which $m_i$ denotes the set of microblogs of the user with ID $i$ (one merged set per user) and $w_t$ denotes a word of a single microblog;
S22: training a Skip-Gram model on the users' microblogs to obtain 300-dimensional word vectors for the words in the microblogs, and calculating the microblog vector of each user, wherein the calculation formula is as follows:
where $u_i$ denotes the user with ID $i$, $K$ denotes the number of words in the microblogs of user $u_i$, and $Wvec_k$ denotes the word vector of the $k$-th word;
S23: using a Stacking model as the ensemble-learning combination strategy, with a support vector machine, a decision tree, logistic regression, a light gradient boosting machine and extreme gradient boosting as first-layer classifiers, and logistic regression as the second-layer classifier that combines their predictions, finally obtaining the user's microblog text features.
3. The multi-feature-based microblog user attribute analysis method according to claim 1, wherein in step S3 the constructed composite features comprise user activity, user microblog time distribution, and user behavior habits;
the user activity feature $f_{useractive}(u_i)$ is calculated as follows:
where $u_i$ denotes the user with ID $i$, $f_{sum}(u_i)$ denotes the total number of microblogs of user $u_i$, $f_{transpond}(u_i)$ denotes the number of microblogs forwarded by user $u_i$, and $f_{time}(u_i)$ denotes the time interval between the first and the last microblog of user $u_i$;
where, for the user microblog time distribution, the three terms denote, respectively, the user with ID $i$ in time period $j$, the number of microblogs posted by user $u_i$ at time $j$, and the number of microblogs forwarded by user $u_i$ at time $j$;
the user behavior habit feature $f_{userBehavior}(u_i)$ is calculated as follows:
$f_{userBehavior}(u_i) = f_{textBehavior}(u_i) + f_{textSource}(u_i) + f_{inforIntegrity}(u_i)$
where $f_{textBehavior}(u_i)$ denotes the text behavior habit of user $u_i$, $f_{textSource}(u_i)$ denotes the post source information of user $u_i$, and $f_{inforIntegrity}(u_i)$ denotes the basic information integrity of user $u_i$.
4. The multi-feature-based microblog user attribute analysis method according to claim 3, wherein the text behavior habit of the user is calculated from the proportion of emoticons and pictures in the user's microblogs, according to the following formula:
where $f_{emoticons}(text_n)$ denotes the number of emoticons in the $n$-th microblog, $f_{picture}(text_n)$ denotes the number of pictures in the $n$-th microblog, and $N$ denotes the number of microblogs of user $u_i$.
5. The multi-feature-based microblog user attribute analysis method according to claim 3, wherein the user's post source information is calculated from the male habitual text source $f_{mSource}(u_i)$ and the female habitual text source $f_{fSource}(u_i)$, as follows: $f_{textSource}(u_i) = f_{mSource}(u_i) - f_{fSource}(u_i)$.
6. The multi-feature-based microblog user attribute analysis method according to claim 5, wherein the male habitual text source $f_{mSource}(u_i)$ is calculated as follows:
where $N$ denotes the number of microblogs of user $u_i$, $f_{mSourceNum}(text_j)$ indicates whether the source of the $j$-th microblog is a male text source, and $sourceNum$ is the total number of text sources.
7. The multi-feature-based microblog user attribute analysis method according to claim 5, wherein the female habitual text source $f_{fSource}(u_i)$ is calculated as follows:
where $N$ denotes the number of microblogs of user $u_i$, $f_{fSourceNum}(text_j)$ indicates whether the source of the $j$-th microblog is a female text source, and $sourceNum$ is the total number of text sources.
8. The multi-feature-based microblog user attribute analysis method according to claim 3, wherein the user information integrity is defined as follows: $f_{inforIntegrity}$ denotes the integrity of the user's basic information, which comprises the user's nickname, registered location, gender, birthday, profile introduction, education information and avatar information, and is calculated as follows:
where $f_{name}$ indicates whether there is a nickname, $f_{location}$ indicates whether there is a registered location, $f_{birthday}$ indicates whether there is birthday information, $f_{introduction}$ indicates whether there is a profile introduction, $f_{education}$ indicates whether there is education information, $f_{headPhoto}$ indicates whether there is avatar information, and $m$ denotes the total number of basic information items.
9. The multi-feature-based microblog user attribute analysis method according to claim 1, wherein in step S4, fusing the multiple base classifiers with the Stacking model-fusion technique to construct the microblog user attribute analysis model specifically comprises: constructing the microblog user attribute analysis model with a support vector machine, a decision tree, logistic regression, a light gradient boosting machine and extreme gradient boosting as first-layer classifiers and logistic regression as the second-layer classifier.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911340531.4A CN111125486B (en) | 2019-12-23 | 2019-12-23 | Microblog user attribute analysis method based on multiple features |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911340531.4A CN111125486B (en) | 2019-12-23 | 2019-12-23 | Microblog user attribute analysis method based on multiple features |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111125486A (en) | 2020-05-08 |
CN111125486B (en) | 2022-11-25 |
Family
ID=70501405
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201911340531.4A | Active | CN111125486B (en) | 2019-12-23 | 2019-12-23 | Microblog user attribute analysis method based on multiple features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111125486B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111984872A (en) * | 2020-09-09 | 2020-11-24 | Beijing Zhongke Research Institute | Multi-modal information social media popularity prediction method based on iterative optimization strategy |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160164866A1 (en) * | 2014-12-09 | 2016-06-09 | Duo Security, Inc. | System and method for applying digital fingerprints in multi-factor authentication |
CN106202211A (en) * | 2016-06-27 | 2016-12-07 | Sichuan University | Integrated microblog rumor identification method based on microblog type |
CN106296422A (en) * | 2016-07-29 | 2017-01-04 | Chongqing University of Posts and Telecommunications | Social network spam user detection method fusing multiple algorithms |
CN106649515A (en) * | 2016-10-17 | 2017-05-10 | China Electronics Standardization Institute | Real-time micro-blog classifier based on multiple search models |
CN108090607A (en) * | 2017-12-13 | 2018-05-29 | Sun Yat-sen University | Social media user demographic attribute prediction method based on multi-model stacking fusion |
CN108710609A (en) * | 2018-05-07 | 2018-10-26 | Nanjing University of Posts and Telecommunications | Social platform user information analysis method based on multi-feature fusion |
Non-Patent Citations (4)
Title |
---|
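D. Varshneya et al.: "Restaurant attribute classification using deep learning", 2016 IEEE Annual India Conference * |
Pennacchiotti M. et al.: "Democrats, republicans and starbucks afficionados", Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining * |
Liu Yun et al.: "Attribute inference oriented to the commenting behavior of social media users", Chinese Journal of Computers * |
Liu Jing: "Research on spam microblog detection fusing multi-feature clustering", China Master's Theses Full-text Database, Information Science and Technology * |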
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111984872A (en) * | 2020-09-09 | 2020-11-24 | Beijing Zhongke Research Institute | Multi-modal information social media popularity prediction method based on iterative optimization strategy |
CN111984872B (en) * | 2020-09-09 | 2021-03-16 | Beijing Zhongke Research Institute | Multi-modal information social media popularity prediction method based on iterative optimization strategy |
Also Published As
Publication number | Publication date |
---|---|
CN111125486B (en) | 2022-11-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109635291B (en) | Recommendation method for fusing scoring information and article content based on collaborative training | |
US20180173788A1 (en) | System And Method For Providing Inclusion-Based Electronically Stored Information Item Classification Suggestions With The Aid Of A Digital Computer | |
CN110162591B (en) | Entity alignment method and system for digital education resources | |
CN108268600B (en) | AI-based unstructured data management method and device | |
CN108733798A (en) | Personalized recommendation method based on knowledge graph | |
CN103150333B (en) | Opinion leader identification method in microblog media | |
CN110990683B (en) | Microblog rumor integrated identification method and device based on region and emotional characteristics | |
CN109933664A (en) | Improved fine-grained sentiment analysis method based on emotion word embedding | |
CN107705066A (en) | Information input method and electronic device for commodity warehousing | |
CN104572982B (en) | Personalized recommendation method and system based on problem guiding | |
CN104778283B (en) | Microblog-based user occupation classification method and system | |
CN110956210A (en) | Semi-supervised network water army identification method and system based on AP clustering | |
CN111144831B (en) | Accurate selection screening system and method suitable for recruitment | |
CN111460145A (en) | Learning resource recommendation method, device and storage medium | |
CN106202391A (en) | Automatic user community classification method and device | |
CN106776695A (en) | Method for automatically identifying the value of secretarial documents | |
CN117474507A (en) | Intelligent recruitment matching method and system based on big data application technology | |
CN111898038A (en) | Social media false news detection method based on man-machine cooperation | |
CN111125486B (en) | Microblog user attribute analysis method based on multiple features | |
CN106919647B (en) | Clustering-based network structure similarity recommendation method | |
CN109508557A (en) | File path keyword recognition method associated with user privacy | |
Sheeba et al. | A fuzzy logic based on sentiment classification | |
JP5929532B2 (en) | Event detection apparatus, event detection method, and event detection program | |
CN112565903A (en) | Video recommendation method and device, server and storage medium | |
CN109254993B (en) | Text-based character data analysis method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||