CN103617230A - Method and system for microblog-based advertisement recommendation - Google Patents

Publication number
CN103617230A
Authority
CN
China
Prior art keywords
lexical item
microblogging
data
feature
dictionary
Legal status
Granted
Application number
CN201310608335.7A
Other languages
Chinese (zh)
Other versions
CN103617230B (en)
Inventor
章昉
刘明君
赵中英
Current Assignee
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes

Abstract

The invention belongs to the field of data mining and provides a microblog-based method and system for advertisement recommendation. The method comprises the following steps: microblog data are read; the microblog data are initialized to obtain a microblog text lexical item set; stop words are deleted from the microblog text lexical item set to obtain a microblog text original feature lexical item set; the microblog text original feature lexical item set is mapped against a feature lexical item dictionary, it is determined whether each lexical item in the set appears in the dictionary, and the tf-idf values of the lexical items that do appear are calculated and used as their feature values; it is determined whether each lexical item of the feature lexical item dictionary appears in the microblog text original feature lexical item set, and the feature values of the dictionary lexical items that do not appear are set to zero; the feature vectors formed from the calculated feature values are automatically classified into predefined categories; and advertisements are recommended to the user according to the classification result. The advertisements recommended by the method and system are accurate and effective.

Description

A microblog-based advertisement recommendation method and system
Technical field
The invention belongs to the field of data mining, and in particular relates to a microblog-based advertisement recommendation method and system.
Background
With the popularity in China of social networking sites such as Sina Weibo and Tencent Weibo, social media such as microblogs have become not only platforms on which internet users publish, share and spread information, but also stores of large-scale user behavior data. In May 2012, Lu Yi, deputy general manager of Sina's microblog division, stated that Sina Weibo had more than 300 million registered users, who published more than 100 million microblog posts per day on average. The user base is large and the data volume is large. If a microblog operating system can analyze and mine this existing mass of data, it can judge the interests of microblog users fairly accurately from the analysis results and target advertisements to them according to those interests. Advertisements pushed in this way benefit all three parties: microblog users, merchants and microblog operators.
Existing microblog advertisement recommendation methods mainly use the tags in a user's profile or the user's search history to judge the user's interests, and then push advertisements that may interest the user. Because many users' profiles contain no tags, or the tags filled in when the profile was created are inaccurate, recommending advertisements by user tags does not achieve good results. Judging a user's interests from search history also has limitations: it reflects only the user's current needs and cannot judge the user's interests accurately.
Summary of the invention
An embodiment of the present invention provides a microblog-based advertisement recommendation method, intended to solve the problem that existing methods mine user information with low accuracy and therefore recommend advertisements poorly.
The embodiment of the present invention is realized as a microblog-based advertisement recommendation method comprising the following steps:
reading a user's microblog data;
initializing the microblog data that were read to obtain a microblog text lexical item set, wherein the initialization comprises removing special symbols and non-Chinese characters from the microblog data and performing word segmentation;
deleting the stop words from the microblog text lexical item set to obtain a microblog text original feature lexical item set;
mapping the microblog text original feature lexical item set against a feature lexical item dictionary generated in advance, determining whether each lexical item in the microblog text original feature lexical item set appears in the dictionary, and calculating the term frequency-inverse document frequency (tf-idf) value of each lexical item that does appear, the tf-idf value serving as the feature value of that lexical item in the microblog post;
determining whether each lexical item of the feature lexical item dictionary generated in advance appears in the microblog text original feature lexical item set, and setting to 0 the feature value of every dictionary lexical item that does not appear in the set;
using a classification model obtained in advance to automatically classify the user's microblog data into predefined categories;
recommending advertisements, on the basis of the automatic classification result, to the user whose microblog data were read.
Another object of the embodiments of the present invention is to provide a microblog-based advertisement recommendation system, the system comprising:
a first data reading module, configured to read a user's microblog data;
a first data initialization module, configured to initialize the microblog data that were read to obtain a microblog text lexical item set, wherein the initialization comprises removing special symbols and non-Chinese characters from the microblog data and performing word segmentation;
a first feature extraction module, configured to delete the stop words from the microblog text lexical item set to obtain a microblog text original feature lexical item set;
a first feature vectorization module, configured to map the microblog text original feature lexical item set against a feature lexical item dictionary generated in advance, determine whether each lexical item in the set appears in the dictionary, and calculate the tf-idf value of each lexical item that does appear, the tf-idf value serving as the feature value of that lexical item in the microblog post; and further configured to determine whether each lexical item of the dictionary appears in the microblog text original feature lexical item set, and to set to 0 the feature value of every dictionary lexical item that does not appear in the set;
a classification module, configured to use a classification model obtained in advance to automatically classify the user's microblog data into predefined categories;
a recommendation module, configured to recommend advertisements, on the basis of the automatic classification result, to the user whose microblog data were read.
In the embodiments of the present invention, the microblog data a user publishes are more up to date than the information contained in user tags and better represent the user's interest preferences. The judgment obtained by analyzing the user's microblog data is therefore more accurate, so the recommended advertisements are more accurate and more effective.
Description of the drawings
Fig. 1 is a flowchart of a microblog-based advertisement recommendation method provided by the first embodiment of the present invention;
Fig. 2 is a structural diagram of a microblog-based advertisement recommendation system provided by the second embodiment of the present invention;
Fig. 3 is a structural diagram of another microblog-based advertisement recommendation system provided by the second embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
The embodiments of the present invention mine and classify the microblog data a user publishes, judge the user's interest preferences, and then recommend corresponding advertisements to the user.
To illustrate the technical solutions of the present invention, specific embodiments are described below.
Embodiment 1:
Fig. 1 shows a microblog-based advertisement recommendation method provided by the first embodiment of the present invention, detailed as follows:
Step S11: read a user's microblog data.
In this step, users' microblog data can be obtained in advance and stored in a database; when a particular user's microblog data need to be analyzed, that user's data are read from the database.
Step S12: initialize the microblog data that were read to obtain a microblog text lexical item set, wherein the initialization comprises removing special symbols and non-Chinese characters from the microblog data and performing word segmentation.
In this step, each microblog post is initialized, for example by removing special symbols such as punctuation marks, removing non-Chinese characters, and segmenting the text into words. The initialization yields one microblog text lexical item set.
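As an illustration only, the initialization of step S12 might be sketched in Python as follows. The source does not name a segmentation algorithm; the forward maximum matching segmenter, the tiny vocabulary `VOCAB` and all function names are assumptions made for this sketch, and a practical system would use a full segmentation lexicon or a dedicated segmenter.

```python
import re

# Hypothetical mini vocabulary; a real system would use a full segmentation lexicon.
VOCAB = {"今天", "天气", "足球", "比赛", "很", "精彩"}
MAX_WORD_LEN = max(len(word) for word in VOCAB)


def clean(text: str) -> str:
    """Drop everything except CJK Unified Ideographs (U+4E00..U+9FFF).

    This removes punctuation, other special symbols and non-Chinese characters.
    """
    return re.sub(r"[^\u4e00-\u9fff]", "", text)


def segment(text: str) -> list[str]:
    """Forward maximum matching: take the longest vocabulary word at each position.

    A single character that starts no vocabulary word becomes its own lexical item.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in VOCAB:
                tokens.append(piece)
                i += size
                break
    return tokens


def initialize(post: str) -> list[str]:
    """Clean one microblog post, then segment it into a lexical item list."""
    return segment(clean(post))
```

Calling `initialize` on a post that mixes Chinese text with a URL, a mention and ASCII punctuation returns only the Chinese lexical items, in their original order.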
Step S13: delete the stop words from the microblog text lexical item set to obtain a microblog text original feature lexical item set.
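Step S13 amounts to a filter over the lexical items. A minimal Python sketch follows; the stop-word list `STOP_WORDS` and the function name are hypothetical, since the source does not specify which stop words are used.

```python
# Hypothetical stop-word list; real lists contain hundreds of function words.
STOP_WORDS = {"的", "了", "很", "是", "在"}


def remove_stop_words(terms: list[str]) -> list[str]:
    """Return the lexical items that are not stop words, keeping their order."""
    return [term for term in terms if term not in STOP_WORDS]
```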
Step S14: map the microblog text original feature lexical item set against the feature lexical item dictionary generated in advance, determine whether each lexical item in the set appears in the dictionary, and calculate the term frequency-inverse document frequency (tf-idf) value of each lexical item that does appear, the tf-idf value serving as the feature value of that lexical item in the microblog post.
In this step, the microblog text original feature lexical item set of each post is mapped against the feature lexical item dictionary. If a lexical item of the set is in the dictionary, its tf-idf value is calculated and used as the feature value of that lexical item in the post.
Step S15: determine whether each lexical item of the feature lexical item dictionary generated in advance appears in the microblog text original feature lexical item set, and set to 0 the feature value of every dictionary lexical item that does not appear in the set.
In this step, a lexical item of the microblog text original feature lexical item set that is not in the feature lexical item dictionary is ignored, and a dictionary lexical item that does not appear in the set receives the feature value 0. The text of each post is thus finally converted into a feature vector of dimension 5000.
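Steps S14 and S15 together turn one post into a fixed-length vector with one entry per dictionary lexical item. The sketch below is one possible reading in Python. The source does not give the tf-idf formula, so the common form tf = count / post length and idf = log(N / df) is assumed here; `doc_freq` (number of training posts containing each lexical item) and `n_docs` (number of training posts) are hypothetical inputs.

```python
import math
from collections import Counter


def tfidf_vector(terms, dictionary, doc_freq, n_docs):
    """Map one post's lexical items onto the feature lexical item dictionary.

    terms      -- original feature lexical items of one post
    dictionary -- ordered list of dictionary lexical items (5000 in the source)
    doc_freq   -- assumed mapping: lexical item -> number of training posts containing it
    n_docs     -- assumed total number of training posts
    """
    counts = Counter(terms)
    total = len(terms)
    vector = []
    for item in dictionary:
        if counts[item] > 0:
            # Lexical item appears in both the post and the dictionary: tf-idf value.
            tf = counts[item] / total
            idf = math.log(n_docs / max(1, doc_freq.get(item, 0)))
            vector.append(tf * idf)
        else:
            # Dictionary lexical item absent from the post: feature value 0.
            vector.append(0.0)
    return vector
```

Lexical items of the post that are not in `dictionary` never enter the loop, which matches the rule that such items are ignored.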
Step S16: use the classification model obtained in advance to automatically classify the user's microblog data into the predefined categories.
In this step, several categories can be defined in advance according to actual needs. For example, 12 categories are defined: sports; health; education; travel; technology; automotive; games; beauty, hairdressing and body care; food; clothing, shoes and bags; entertainment; and other.
The sports category covers content such as sports events, sports newspapers and magazines, and sports stars;
the health category covers content such as health knowledge, medicine, and physical condition;
the education category covers content such as training institutions like New Channel and New Oriental, personal study status, study intentions, and studying abroad;
the travel category covers content such as scenic spots, amusement parks, overseas travel, independent travel, and hotels;
the technology category covers content such as mobile phones, computers, and digital products;
the automotive category covers content such as automobiles and automobile magazines;
the games category covers content such as mobile games, web games, and online games;
the beauty, hairdressing and body care category covers content such as skincare products, cosmetics, manicure, slimming, and toiletries;
the food category covers content such as food, foodies, and recipes;
the entertainment category covers content such as show business, concerts, stage plays, and exhibitions;
the other category covers content such as personal matters, personal emotions, views on society, and views on life.
Step S17: on the basis of the automatic classification result, recommend advertisements to the user whose microblog data were read.
In this step, if the automatic classification assigns the user's microblog data to a certain category, advertisements corresponding to that category are recommended to the user. Advertisements here include news, music, films, microblog posts and the like.
In the embodiments of the present invention, the microblog data a user publishes are mined and classified, the user's interest preferences are judged, and corresponding advertisements are then recommended to the user. The microblog data a user publishes are more up to date than the information contained in user tags and better represent the user's interest preferences. The judgment obtained by analyzing the user's microblog data is therefore more accurate, so the recommended advertisements are more accurate and more effective.
As one embodiment of the present invention, the following steps are performed before step S16, in which the classification model obtained in advance is used to automatically classify the user's microblog data into the predefined categories:
Step A: read training microblog data.
In this step, the microblog data of multiple users are read as training data, to improve the accuracy of subsequent mining as far as possible.
Step B: manually label the training microblog data that were read with the predefined categories.
In this step, several annotators label each microblog post that was read with one of the predefined categories, and the category of each post is decided by majority vote among the annotators.
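The majority rule of step B can be sketched in a few lines of Python. The function name and the tie-breaking behavior (the label that reached the top count first wins) are assumptions; the source does not say how ties are resolved.

```python
from collections import Counter


def majority_label(labels):
    """Return the category chosen by most annotators for one post.

    On a tie, Counter.most_common keeps first-seen order, so the label
    that appears first among the tied labels is returned (an assumption).
    """
    return Counter(labels).most_common(1)[0][0]
```

For example, three annotators voting sports, health, sports yield the label sports for that post.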
Step C: initialize the training microblog data that were read to obtain a microblog text lexical item set, wherein the initialization comprises removing special symbols and non-Chinese characters from the training microblog data and performing word segmentation.
Step D: delete the stop words from the microblog text lexical item set to obtain a microblog text original feature lexical item set.
Step E: generate the feature lexical item dictionary.
In this step, generating the feature lexical item dictionary comprises: calculating the mutual information value of each lexical item in the microblog text original feature lexical item set; and selecting the N lexical items with the highest mutual information values as the lexical items of the feature lexical item dictionary, where N is an integer greater than 0. For example, the 5000 lexical items with the highest mutual information values are selected as the lexical items of the dictionary, and the generated dictionary can be ordered by mutual information value.
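The dictionary generation of step E might look like the following Python sketch. The source does not state which mutual information formula is used; the sketch assumes the mutual information between the presence of a lexical item in a post and the post's category, I(T;C) = Σ P(t,c) log(P(t,c) / (P(t)P(c))), estimated from counts over the labelled training posts. All function and parameter names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(docs, labels):
    """Return {lexical item: I(presence; category)} over labelled posts.

    docs   -- list of lexical item lists, one per training post
    labels -- category label of each training post
    """
    n = len(docs)
    class_count = Counter(labels)
    present = Counter()      # (item, category) -> posts of that category containing the item
    item_count = Counter()   # item -> posts containing the item
    for terms, label in zip(docs, labels):
        for item in set(terms):
            present[(item, label)] += 1
            item_count[item] += 1
    scores = {}
    for item, n_item in item_count.items():
        score = 0.0
        for label, n_class in class_count.items():
            n11 = present[(item, label)]   # item present, this category
            n01 = n_class - n11            # item absent, this category
            for joint, marginal in ((n11, n_item), (n01, n - n_item)):
                if joint:
                    score += joint / n * math.log(joint * n / (marginal * n_class))
        scores[item] = score
    return scores


def build_dictionary(docs, labels, top_n):
    """Return the top_n lexical items ordered by descending mutual information."""
    scores = mutual_information(docs, labels)
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]
```

A lexical item that occurs equally often in every category scores 0 and is unlikely to enter the dictionary, while an item concentrated in one category scores high.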
Step F: map the microblog text original feature lexical item set against the feature lexical item dictionary, determine whether each lexical item in the set appears in the dictionary, and calculate the tf-idf value of each lexical item that does appear, the tf-idf value serving as the feature value of that lexical item in the microblog post.
Step G: determine whether each lexical item of the feature lexical item dictionary appears in the microblog text original feature lexical item set, and set to 0 the feature value of every dictionary lexical item that does not appear in the set.
Step H: train on the feature vectors formed from all the calculated feature values using a preset algorithm, to obtain the classification model.
In this step, the feature vector matrix corresponding to all the training microblog data is trained, so that the trained result can be used directly when a particular user's microblog data are mined later.
The preset algorithm is any one of the following: support vector machine (SVM), naive Bayes classification, neural network, k-nearest neighbors classification, or genetic algorithm.
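Of the algorithms listed, k-nearest neighbors is the simplest to illustrate. The Python sketch below is a hedged example, not the patent's implementation: it assumes cosine similarity between feature vectors and a default of k = 3, neither of which is stated in the source, and the function names are hypothetical. For k-nearest neighbors the "trained" model is simply the stored labelled vectors.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_classify(vector, train_vectors, train_labels, k=3):
    """Label a feature vector with the majority category of its k most similar training vectors."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: cosine(vector, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```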
In this embodiment, the feature lexical item dictionary is generated by analyzing the microblog data of a large number of users; the dictionary provides a reference standard for later mining a particular user's interest preferences.
As one embodiment of the present invention, step S17, recommending advertisements on the basis of the automatic classification result to the user whose microblog data were read, specifically comprises: calculating the percentage of the user's microblog posts that falls in each category; matching the categories against the user's tags in the microblog data, and doubling the percentage of each category that matches; and recommending to the user the advertisements of the M categories with the highest percentages, where M is an integer greater than 0.
In this embodiment, the user's historical microblog posts are classified, and the percentage of posts in each category is calculated and matched against the tags in the user's profile. If the tags contain a category, that category's percentage is doubled. Finally the M categories with the highest percentages, for example three categories, are selected as the user's advertisement recommendation categories. Preferably, the calculation is repeated after a period of time to obtain the user's latest advertisement recommendation categories.
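The percentage-and-doubling rule described above can be sketched in Python as follows. The function and parameter names are hypothetical, and the sketch assumes that a tag matches a category when the tag text equals the category name; the source does not define the matching rule in more detail.

```python
from collections import Counter


def recommend_categories(post_categories, user_tags, top_m=3):
    """Rank categories by their share of the user's posts.

    post_categories -- predicted category of each of the user's historical posts
    user_tags       -- set of tags from the user's profile (assumed to be category names)
    top_m           -- number of categories to recommend (M in the source)
    """
    counts = Counter(post_categories)
    total = sum(counts.values())
    shares = {category: n / total for category, n in counts.items()}
    for category in shares:
        if category in user_tags:
            shares[category] *= 2  # a matching profile tag doubles the share
    ranked = sorted(shares, key=lambda category: (-shares[category], category))
    return ranked[:top_m]
```

With ten posts split 5 sports, 3 technology and 2 food, a profile tag of food lifts food from 20% to 40%, so food overtakes technology in the ranking.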
Embodiment 2:
Fig. 2 shows the structure of a microblog-based advertisement recommendation system provided by the second embodiment of the present invention. For ease of explanation, only the parts relevant to the embodiment are shown.
The microblog-based advertisement recommendation system can be used in various information processing terminals that connect to a server over a wired or wireless network, for example a mobile phone, a Pocket Personal Computer (PPC), a palmtop computer, a computer, a notebook computer or a Personal Digital Assistant (PDA). It can be a software unit, a hardware unit, or a combined software and hardware unit running in such a terminal, or it can be integrated into the terminal, or run in the terminal's application system, as an independent plug-in, wherein:
A first data reading module 201, configured to read a user's microblog data.
A first data initialization module 202, configured to initialize the microblog data that were read to obtain a microblog text lexical item set, wherein the initialization comprises removing special symbols and non-Chinese characters from the microblog data and performing word segmentation.
A first feature extraction module 203, configured to delete the stop words from the microblog text lexical item set to obtain a microblog text original feature lexical item set.
A first feature vectorization module 204, configured to map the microblog text original feature lexical item set against the feature lexical item dictionary generated in advance, determine whether each lexical item in the set appears in the dictionary, and calculate the tf-idf value of each lexical item that does appear, the tf-idf value serving as the feature value of that lexical item in the microblog post; and further configured to determine whether each lexical item of the dictionary appears in the microblog text original feature lexical item set, and to set to 0 the feature value of every dictionary lexical item that does not appear in the set.
Through the calculation of the first feature vectorization module 204, the microblog data of each post are finally converted into a feature vector of dimension 5000.
A classification module 205, configured to use the classification model obtained in advance to automatically classify the user's microblog data into the predefined categories.
The predefined categories can be 12 categories, as described in step S16, and are not repeated here.
A recommendation module 206, configured to recommend advertisements, on the basis of the automatic classification result, to the user whose microblog data were read.
The advertisements here include content such as news, music, films, and microblog posts.
In the embodiments of the present invention, the microblog data that were read are mined and assigned to categories, and advertisements related to the assigned categories are recommended to the user. Because microblog data reflect a user's interest preferences in a timely manner, the judgment obtained by analyzing the user's microblog data is more accurate, so the recommended advertisements are more accurate and more effective.
Fig. 3 shows another structure of the microblog-based advertisement recommendation system. As another preferred embodiment of the present invention, the microblog-based advertisement recommendation system further comprises:
Second data reading module 301, configured to read training microblog data.
Wherein, the microblog data read are the microblog data of a plurality of users.
Manual classification module 302, configured to manually label the read training microblog data with the pre-divided categories.
Second data initialization module 303, configured to initialize the read training microblog data to obtain a microblog text lexical item set, where initializing the read training microblog data comprises removing special symbols and non-Chinese characters from the read training microblog data and performing word segmentation.
Second feature extraction module 304, configured to delete the stop words from the microblog text lexical item set to obtain a microblog text original feature lexical item set.
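The initialization and stop-word steps can be sketched as below. This is a minimal sketch under stated assumptions: the patent does not name a word segmenter in this passage, so a simple forward maximum matching segmenter over a hypothetical `vocabulary` is swapped in, and the CJK range `\u4e00-\u9fff` is an assumed definition of "Chinese character". The names `segment`, `preprocess`, `vocabulary`, and `stop_words` are illustrative.

```python
import re

# Assumed definition of "non-Chinese": anything outside the basic CJK block
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]+")

def segment(text, vocabulary, max_len=4):
    """Forward maximum matching: take the longest vocabulary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no vocabulary word matches
            if size == 1 or piece in vocabulary:
                words.append(piece)
                i += size
                break
    return words

def preprocess(raw, vocabulary, stop_words):
    # Split on runs of non-Chinese characters so words never span removed symbols
    chunks = [c for c in NON_CHINESE.split(raw) if c]
    terms = [w for chunk in chunks for w in segment(chunk, vocabulary)]
    # Deleting stop words yields the original feature lexical item set
    return [w for w in terms if w not in stop_words]

# Hypothetical toy vocabulary and stop-word list
vocabulary = {"今天", "电影", "好看"}
stop_words = {"的", "很"}
terms = preprocess("今天的电影很好看!! http://t.cn/abc #movie#", vocabulary, stop_words)
```

In practice a dedicated Chinese segmentation tool would replace `segment`; the sketch only shows the order of the steps (symbol removal, segmentation, stop-word deletion).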
Feature lexical item dictionary generation module 305, configured to generate the feature lexical item dictionary.
Wherein, the feature lexical item dictionary generation module 305 comprises:
Mutual information value calculation module, configured to calculate the mutual information value of each lexical item in the microblog text original feature lexical item set.
Feature lexical item dictionary lexical item selection module, configured to select the N lexical items with the highest mutual information values as the lexical items of the feature lexical item dictionary, where N is an integer greater than 0.
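Dictionary generation by mutual information can be sketched as follows. The scoring formula is an assumption: this passage does not restate the formula, so the common pointwise form log(P(t, c) / (P(t) P(c))), maximized over categories, is used for illustration. The function name `select_features` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def select_features(docs, labels, n):
    """Return the n terms with the highest mutual-information score."""
    total = len(docs)
    class_count = Counter(labels)      # number of microblogs per category
    term_count = Counter()             # number of microblogs containing each term
    joint = defaultdict(Counter)       # joint[term][label]: microblogs with both
    for terms, label in zip(docs, labels):
        for term in set(terms):
            term_count[term] += 1
            joint[term][label] += 1

    def score(term):
        # Assumed variant: max over categories of log(P(t, c) / (P(t) * P(c)))
        return max(
            math.log(joint[term][c] * total / (term_count[term] * class_count[c]))
            for c in joint[term]
        )

    # Highest score first; ties broken by the term itself for determinism
    ranked = sorted(term_count, key=lambda t: (-score(t), t))
    return ranked[:n]

# Hypothetical labelled training microblogs, already segmented
docs = [["电影", "好看"], ["电影", "导演"], ["足球", "比赛"], ["足球", "好看"]]
labels = ["film", "film", "sports", "sports"]
feature_dictionary = select_features(docs, labels, n=4)
```

In this toy data "好看" occurs equally often in both categories, so its score is 0 and it is the one term left out of the four-term dictionary.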
Second feature vectorization module 306, configured to map the microblog text original feature lexical item set onto the feature lexical item dictionary: to determine whether each lexical item in the microblog text original feature lexical item set appears in the feature lexical item dictionary, and to calculate the tf-idf value of each lexical item of the set that does appear in the dictionary, as the feature value of that lexical item in the microblog. The module is further configured to determine whether each lexical item of the feature lexical item dictionary appears in the microblog text original feature lexical item set, and to set to 0 the feature value of every dictionary lexical item that does not appear in the set.
Training module 307, configured to train, with a preset algorithm, on the feature vectors formed from all the calculated feature values, to obtain the classification model.
Wherein, the preset algorithm comprises any one of the following algorithms:
Support vector machine (SVM), naive Bayes classification algorithm, neural network, K-nearest-neighbor classification algorithm, genetic algorithm.
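As an illustration of one of the listed options, a K-nearest-neighbor classifier over the feature vectors can be sketched as below. The choice of cosine similarity, the value of `k`, and the names `cosine` and `knn_classify` are assumptions for the example; for K-nearest-neighbor the "model" is simply the stored labelled training vectors.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def knn_classify(vector, train_vectors, train_labels, k=3):
    """Label a feature vector by majority vote among its k most similar training vectors."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: cosine(vector, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical manually labelled training vectors (3 dimensions instead of 5000)
train_vectors = [[0.9, 0.0, 0.1], [0.8, 0.1, 0.0], [0.0, 0.9, 0.2], [0.1, 0.7, 0.0]]
train_labels = ["film", "film", "sports", "sports"]
predicted = knn_classify([0.7, 0.0, 0.0], train_vectors, train_labels, k=3)
```

Any of the other listed algorithms could be substituted, provided it consumes the same fixed-length feature vectors and the manually assigned category labels.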
In this embodiment, the feature lexical item dictionary is generated by analyzing the microblog data of a large number of users; this dictionary provides a reference standard for later mining the interest preferences of a particular user.
As one embodiment of the present invention, the recommendation module 206 comprises:
Data statistics module, configured to count the percentage of the user's microblogs that falls into each category.
Data matching module, configured to match the counted percentage of each category against the user's tags in the microblog data, and to double the percentage of each successfully matched category.
Advertisement recommendation module, configured to recommend, to the user whose microblog data were read, the advertisements of the M top-ranked categories, where M is an integer greater than 0.
In this embodiment, only the advertisements of the top M categories are recommended to the user, which makes advertisement delivery more accurate without increasing the user's browsing burden.
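The statistics, tag matching, and top-M selection performed by the recommendation module can be sketched as follows. The sketch assumes that the user's tags have already been mapped to the same category names used by the classifier; the function name `top_categories` and the sample data are hypothetical.

```python
from collections import Counter

def top_categories(predicted_labels, user_tags, m):
    """Rank categories by their share of a user's microblogs, doubling tag matches."""
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    scores = {}
    for category, count in counts.items():
        share = 100.0 * count / total
        if category in user_tags:
            # Category matches one of the user's tags: double its percentage
            share *= 2
        scores[category] = share
    # Highest score first; ties broken by category name for determinism
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return ranked[:m]

# Hypothetical classification results for one user's ten microblogs
labels = ["film"] * 5 + ["sports"] * 3 + ["music"] * 2
chosen = top_categories(labels, user_tags={"music"}, m=2)
```

Here "music" accounts for only 20% of the microblogs, but the tag match doubles it to 40%, which moves it ahead of "sports" at 30%.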
In the embodiments of the present invention, the microblog data published by the user are mined and classified, and the user's interest preferences are determined in combination with the user's tag information on the microblog platform, so that corresponding advertisements are recommended to the user. Because the microblog data published by a user are more up to date than the information contained in the user's tags and better represent the user's interest preferences, the result obtained by analyzing both the user's microblog data and tag information is more accurate than that obtained by analyzing the tag information alone; the recommended advertisements are therefore more accurate and more effective.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A microblog-based advertisement recommendation method, characterized in that the method comprises the following steps:
reading a user's microblog data;
initializing the read microblog data to obtain a microblog text lexical item set, wherein initializing the read microblog data comprises removing special symbols and non-Chinese characters from the read microblog data and performing word segmentation;
deleting the stop words from the microblog text lexical item set to obtain a microblog text original feature lexical item set;
mapping the microblog text original feature lexical item set onto a feature lexical item dictionary generated in advance, determining whether each lexical item in the microblog text original feature lexical item set appears in the pre-generated feature lexical item dictionary, and calculating the term frequency-inverse document frequency (tf-idf) value of each lexical item of the set that appears in the pre-generated feature lexical item dictionary, as the feature value of that lexical item in the microblog;
determining whether each lexical item of the pre-generated feature lexical item dictionary appears in the microblog text original feature lexical item set, and setting to 0 the feature value of every lexical item of the pre-generated feature lexical item dictionary that does not appear in the set;
using a classification model obtained in advance to automatically classify the user's microblog data into pre-divided categories;
recommending advertisements, on the basis of the automatic classification result, to the user whose microblog data were read.
2. the method for claim 1, is characterized in that, the disaggregated model obtaining in advance in described use comprised the steps: user's microblogging data automatic classification before in the classification of dividing in advance
Read training microblogging;
Classification by the described training microblogging data handmarking who reads for dividing in advance;
The training microblogging data that initialization is read, to obtain the set of microblogging text lexical item, the training microblogging data that described initialization is read comprise special symbol, non-Chinese character, the participle of removing in the training microblogging data that read;
Delete the stop words of described microblogging text lexical item set, to obtain the set of microblogging text primitive character lexical item;
Generating feature lexical item dictionary;
The lexical item set of described microblogging text primitive character and described feature lexical item dictionary are shone upon, judge whether the lexical item in the set of described microblogging text primitive character lexical item appears in described feature lexical item dictionary, and calculate the tf-idf value of the lexical item in the described microblogging text primitive character lexical item set in present described feature lexical item dictionary, using as described in appear at lexical item in the described microblogging text primitive character lexical item set in described feature lexical item dictionary in the eigenwert of microblogging;
Whether the lexical item that judges described feature lexical item dictionary appears in the set of described microblogging text primitive character lexical item, and the eigenwert that does not appear at the lexical item of the described feature lexical item dictionary in the set of described microblogging text primitive character lexical item is labeled as to 0;
The proper vector that all eigenwerts that adopt default Algorithm for Training to calculate form, to obtain disaggregated model.
3. The method of claim 2, characterized in that the step of generating the feature lexical item dictionary specifically comprises:
calculating the mutual information value of each lexical item in the microblog text original feature lexical item set;
selecting the N lexical items with the highest mutual information values as the lexical items of the feature lexical item dictionary, wherein N is an integer greater than 0.
4. The method of claim 2, characterized in that the preset algorithm comprises any one of the following algorithms:
support vector machine (SVM), naive Bayes classification algorithm, neural network, K-nearest-neighbor classification algorithm, genetic algorithm.
5. the method for claim 1, is characterized in that, described result of take automatic classification is foundation, to the step that reads user's recommended advertisements of microblogging data, specifically comprises:
The shared number percent of every class microblogging in the microblogging of counting user;
The shared number percent of every class microblogging of statistics is mated with the label of user in microblogging data, and the shared number percent of the classification that the match is successful is double;
To the user who reads microblogging data, recommend rank in the advertisement of M the classification of front M, described M is integer, and M is greater than 0.
6. A microblog-based advertisement recommendation system, characterized in that the system comprises:
a first data reading module, configured to read a user's microblog data;
a first data initialization module, configured to initialize the read microblog data to obtain a microblog text lexical item set, wherein initializing the read microblog data comprises removing special symbols and non-Chinese characters from the read microblog data and performing word segmentation;
a first feature extraction module, configured to delete the stop words from the microblog text lexical item set to obtain a microblog text original feature lexical item set;
a first feature vectorization module, configured to map the microblog text original feature lexical item set onto a feature lexical item dictionary generated in advance, to determine whether each lexical item in the microblog text original feature lexical item set appears in the pre-generated feature lexical item dictionary, and to calculate the tf-idf value of each lexical item of the set that appears in the pre-generated feature lexical item dictionary, as the feature value of that lexical item in the microblog; and further configured to determine whether each lexical item of the pre-generated feature lexical item dictionary appears in the microblog text original feature lexical item set, and to set to 0 the feature value of every lexical item of the pre-generated feature lexical item dictionary that does not appear in the set;
a classification module, configured to use a classification model obtained in advance to automatically classify the user's microblog data into pre-divided categories;
a recommendation module, configured to recommend advertisements, on the basis of the automatic classification result, to the user whose microblog data were read.
7. The system of claim 6, characterized in that the system further comprises:
a second data reading module, configured to read training microblog data;
a manual classification module, configured to manually label the read training microblog data with the pre-divided categories;
a second data initialization module, configured to initialize the read training microblog data to obtain a microblog text lexical item set, wherein initializing the read training microblog data comprises removing special symbols and non-Chinese characters from the read training microblog data and performing word segmentation;
a second feature extraction module, configured to delete the stop words from the microblog text lexical item set to obtain a microblog text original feature lexical item set;
a feature lexical item dictionary generation module, configured to generate a feature lexical item dictionary;
a second feature vectorization module, configured to map the microblog text original feature lexical item set onto the feature lexical item dictionary, to determine whether each lexical item in the microblog text original feature lexical item set appears in the feature lexical item dictionary, and to calculate the tf-idf value of each lexical item of the set that appears in the feature lexical item dictionary, as the feature value of that lexical item in the microblog; and further configured to determine whether each lexical item of the feature lexical item dictionary appears in the microblog text original feature lexical item set, and to set to 0 the feature value of every lexical item of the feature lexical item dictionary that does not appear in the set;
a training module, configured to train, with a preset algorithm, on the feature vectors formed from all the calculated feature values, to obtain the classification model.
8. The system of claim 7, characterized in that the feature lexical item dictionary generation module comprises:
a mutual information value calculation module, configured to calculate the mutual information value of each lexical item in the microblog text original feature lexical item set;
a feature lexical item dictionary lexical item selection module, configured to select the N lexical items with the highest mutual information values as the lexical items of the feature lexical item dictionary, wherein N is an integer greater than 0.
9. The system of claim 7, characterized in that the preset algorithm comprises any one of the following algorithms:
support vector machine (SVM), naive Bayes classification algorithm, neural network, K-nearest-neighbor classification algorithm, genetic algorithm.
10. The system of claim 6, characterized in that the recommendation module comprises:
a data statistics module, configured to count the percentage of the user's microblogs that falls into each category;
a data matching module, configured to match the counted percentage of each category against the user's tags in the microblog data, and to double the percentage of each successfully matched category;
an advertisement recommendation module, configured to recommend, to the user whose microblog data were read, the advertisements of the M top-ranked categories, wherein M is an integer greater than 0.
CN201310608335.7A 2013-11-26 2013-11-26 Method and system for advertisement recommendation based microblog Active CN103617230B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310608335.7A CN103617230B (en) 2013-11-26 2013-11-26 Method and system for advertisement recommendation based microblog

Publications (2)

Publication Number Publication Date
CN103617230A true CN103617230A (en) 2014-03-05
CN103617230B CN103617230B (en) 2017-02-15

Family

ID=50167933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310608335.7A Active CN103617230B (en) 2013-11-26 2013-11-26 Method and system for advertisement recommendation based microblog

Country Status (1)

Country Link
CN (1) CN103617230B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101634983A (en) * 2008-07-21 2010-01-27 华为技术有限公司 Method and device for text classification
US8027977B2 (en) * 2007-06-20 2011-09-27 Microsoft Corporation Recommending content using discriminatively trained document similarity
CN103324708A (en) * 2013-06-18 2013-09-25 哈尔滨工程大学 Method of transfer learning from long text to short text
CN103389981A (en) * 2012-05-08 2013-11-13 腾讯科技(深圳)有限公司 Network label automatic identification method and system thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
麦艺华: ""面向中文微博的社会网络分析及应用"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104851026B (en) * 2015-05-21 2018-07-17 上海宾谷网络科技有限公司 Position the primary advertisement reward system and method for bidding of user in real time based on big data
CN104851026A (en) * 2015-05-21 2015-08-19 上海宾谷网络科技有限公司 Big data based bid native advertisement reward system for positioning user in real time, and method
CN104915386A (en) * 2015-05-25 2015-09-16 中国科学院自动化研究所 Short text clustering method based on deep semantic feature learning
CN104915386B (en) * 2015-05-25 2018-04-27 中国科学院自动化研究所 A kind of short text clustering method based on deep semantic feature learning
CN104992347A (en) * 2015-06-17 2015-10-21 北京奇艺世纪科技有限公司 Video matching advertisement method and device
CN104992347B (en) * 2015-06-17 2018-12-14 北京奇艺世纪科技有限公司 A kind of method and device of video matching advertisement
CN106339402A (en) * 2015-07-16 2017-01-18 腾讯科技(深圳)有限公司 Method, device and system for pushing recommended contents
US10885142B2 (en) 2015-07-16 2021-01-05 Tencent Technology (Shenzhen) Company Limited Recommended content pushing method, apparatus, terminal, server, and system
CN105389345A (en) * 2015-10-26 2016-03-09 天津大学 Short message text content classification method
CN105975497A (en) * 2016-04-27 2016-09-28 清华大学 Automatic microblog topic recommendation method and device
WO2018023656A1 (en) * 2016-08-05 2018-02-08 汤隆初 Method for adjusting advertisement push according to usage conditions of other users, and push system
WO2018023658A1 (en) * 2016-08-05 2018-02-08 汤隆初 Method for pushing advertisement according to followed public account, and push system
WO2018023657A1 (en) * 2016-08-05 2018-02-08 汤隆初 Method for adjusting wechat public account-based advertisement push technique, and push system
CN106886579A (en) * 2017-01-23 2017-06-23 北京航空航天大学 Real-time streaming textual hierarchy monitoring method and device
CN106886579B (en) * 2017-01-23 2020-01-14 北京航空航天大学 Real-time streaming text grading monitoring method and device
CN107086925B (en) * 2017-03-07 2020-04-07 珠海城市职业技术学院 Deep learning-based internet traffic big data analysis method
CN107086925A (en) * 2017-03-07 2017-08-22 珠海城市职业技术学院 A kind of internet traffic big data analysis method based on deep learning
CN107169799A (en) * 2017-05-17 2017-09-15 微梦创科网络科技(中国)有限公司 In a kind of primary information flow generation based on social networks, throws advertisement implementation method and system
CN107169799B (en) * 2017-05-17 2020-10-27 微梦创科网络科技(中国)有限公司 Method and system for realizing advertisement delivery instead of native information stream based on social relationship
CN109145280A (en) * 2017-06-15 2019-01-04 北京京东尚科信息技术有限公司 The method and apparatus of information push
CN107590195A (en) * 2017-08-14 2018-01-16 百度在线网络技术(北京)有限公司 Textual classification model training method, file classification method and its device
CN108399194A (en) * 2018-01-29 2018-08-14 中国科学院信息工程研究所 A kind of Cyberthreat information generation method and system
CN109214893A (en) * 2018-08-31 2019-01-15 深圳春沐源控股有限公司 Method of Commodity Recommendation, recommender system and computer installation
CN110781303A (en) * 2019-10-28 2020-02-11 佰聆数据股份有限公司 Short text classification method and system
CN111369298A (en) * 2020-03-09 2020-07-03 成都欧魅时尚科技有限责任公司 Method for automatically adjusting advertisement budget based on Internet hotspot event

Also Published As

Publication number Publication date
CN103617230B (en) 2017-02-15

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20140305

Assignee: Dongguan Shengnuolin Sports Products Co.,Ltd.

Assignor: SHENZHEN INSTITUTES OF ADVANCED TECHNOLOGY CHINESE ACADEMY OF SCIENCES

Contract record no.: X2023980037877

Denomination of invention: A Weibo based advertising recommendation method and system

Granted publication date: 20170215

License type: Common License

Record date: 20230712

EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20140305

Assignee: Shenzhen Huayun Xingchuang Cultural Technology Co.,Ltd.

Assignor: SHENZHEN INSTITUTES OF ADVANCED TECHNOLOGY CHINESE ACADEMY OF SCIENCES

Contract record no.: X2023980043804

Denomination of invention: A Weibo based advertising recommendation method and system

Granted publication date: 20170215

License type: Common License

Record date: 20231018

Application publication date: 20140305

Assignee: Shenzhen Xingfei Software Technology Co.,Ltd.

Assignor: SHENZHEN INSTITUTES OF ADVANCED TECHNOLOGY CHINESE ACADEMY OF SCIENCES

Contract record no.: X2023980043566

Denomination of invention: A Weibo based advertising recommendation method and system

Granted publication date: 20170215

License type: Common License

Record date: 20231016

EE01 Entry into force of recordation of patent licensing contract