CN103617230B - Method and system for advertisement recommendation based microblog - Google Patents

Info

Publication number
CN103617230B
Authority
CN
China
Prior art keywords
lexical item
feature
microblogging
microblog
dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310608335.7A
Other languages
Chinese (zh)
Other versions
CN103617230A (en)
Inventor
章昉
刘明君
赵中英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes

Abstract

The invention belongs to the field of data mining and provides a method and system for advertisement recommendation based on a microblog. The method comprises the following steps: microblog data are read; the microblog data are initialized to obtain a microblog text lexical item set; stop words are deleted from the microblog text lexical item set to obtain a microblog text original feature lexical item set; the microblog text original feature lexical item set is mapped against a feature lexical item dictionary, it is judged whether each lexical item in the microblog text original feature lexical item set exists in the feature lexical item dictionary, and the tf-idf values of the lexical items that do appear are calculated and serve as their feature values; it is judged whether each lexical item of the feature lexical item dictionary exists in the microblog text original feature lexical item set, and the feature values of the lexical items that do not appear are set to zero; the feature vector formed from the calculated feature values is automatically classified into categories divided in advance; and advertisements are recommended to the user according to the automatic classification result. The advertisements recommended by the method and system are accurate and effective.

Description

A method and system for advertisement recommendation based on microblogs
Technical field
The invention belongs to the field of data mining, and more particularly relates to a method and system for advertisement recommendation based on microblogs.
Background
As social networking sites such as Sina Weibo and Tencent Weibo have become popular in China, social media such as microblogs have not only become platforms on which netizens publish, share, and spread information, but have also accumulated large-scale behavioral data about netizens. In May 2012, Lu Yi, deputy general manager of Sina's microblog division, pointed out that Sina Weibo had more than 300 million registered users, who posted more than 100 million microblogs per day on average. The microblog user base is large and so is the data volume. If a microblog operating system can analyze and mine the existing massive data, judge the interests of microblog users more accurately from the analysis results, and deliver advertisements to them according to those interests, then the advertisements pushed to microblog users will benefit all three parties: microblog users, merchants, and microblog operators.
Existing microblog advertisement recommendation methods mainly use the labels in a user's personal profile, or the user's search records, to judge the interests of the microblog user, and then push advertisements that may interest the user. Because many users' personal profiles contain no labels, or the labels users filled in when creating their profiles are inaccurate, recommending advertisements based on user labels cannot achieve good results. Judging a user's interests from search records also has limitations: search records only represent the user's current needs and cannot accurately determine the user's interests.
Summary of the invention
An embodiment of the present invention provides a microblog-based advertisement recommendation method, intended to solve the problem that existing methods mine user information with low accuracy and therefore recommend advertisements poorly.
The embodiment of the present invention is realized as a microblog-based advertisement recommendation method comprising the following steps:
reading the microblog data of a user;
initializing the read microblog data to obtain a microblog text lexical item set, wherein initializing the read microblog data includes removing special symbols and non-Chinese characters from the read microblog data and performing word segmentation;
deleting the stop words from the microblog text lexical item set to obtain a microblog text original feature lexical item set;
mapping the microblog text original feature lexical item set against a pre-generated feature lexical item dictionary, judging whether each lexical item in the microblog text original feature lexical item set appears in the pre-generated feature lexical item dictionary, and calculating the term frequency-inverse document frequency (tf-idf) value of each lexical item of the set that does appear in the dictionary, the tf-idf value serving as the feature value of that lexical item in the microblog;
judging whether each lexical item of the pre-generated feature lexical item dictionary appears in the microblog text original feature lexical item set, and marking as 0 the feature value of every dictionary lexical item that does not appear in the set;
automatically classifying the microblog data of the user into pre-divided categories using a classification model obtained in advance;
recommending advertisements to the user whose microblog data was read, based on the result of the automatic classification.
Another object of the embodiments of the present invention is to provide a microblog-based advertisement recommendation system, the system comprising:
a first data reading module, configured to read the microblog data of a user;
a first data initialization module, configured to initialize the read microblog data to obtain a microblog text lexical item set, wherein initializing the read microblog data includes removing special symbols and non-Chinese characters from the read microblog data and performing word segmentation;
a first feature extraction module, configured to delete the stop words from the microblog text lexical item set to obtain a microblog text original feature lexical item set;
a first feature vectorization module, configured to map the microblog text original feature lexical item set against a pre-generated feature lexical item dictionary, judge whether each lexical item in the microblog text original feature lexical item set appears in the pre-generated feature lexical item dictionary, and calculate the tf-idf value of each lexical item of the set that does appear in the dictionary, the tf-idf value serving as the feature value of that lexical item in the microblog; and further configured to judge whether each lexical item of the pre-generated feature lexical item dictionary appears in the microblog text original feature lexical item set, and to mark as 0 the feature value of every dictionary lexical item that does not appear in the set;
a classification module, configured to automatically classify the microblog data of the user into pre-divided categories using a classification model obtained in advance;
a recommendation module, configured to recommend advertisements to the user whose microblog data was read, based on the result of the automatic classification.
In the embodiments of the present invention, the microblog data a user publishes is more timely than the information in user labels and better represents the user's interest preferences. The judgment obtained by analyzing the user's microblog data is therefore more accurate, so the recommended advertisements are more accurate and more effective.
Brief description of the drawings
Fig. 1 is a flowchart of a microblog-based advertisement recommendation method provided by the first embodiment of the present invention;
Fig. 2 is a structural diagram of a microblog-based advertisement recommendation system provided by the second embodiment of the present invention;
Fig. 3 is a structural diagram of another microblog-based advertisement recommendation system provided by the second embodiment of the present invention.
Detailed description
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.
The embodiments of the present invention mine and classify the microblog data a user publishes, judge the user's interest preferences, and then recommend corresponding advertisements to the user.
To illustrate the technical solutions of the present invention, specific embodiments are described below.
Embodiment one:
Fig. 1 shows the flow of a microblog-based advertisement recommendation method provided by the first embodiment of the present invention, detailed as follows:
Step S11: read the microblog data of a user.
In this step, the microblog data of users can be obtained in advance and stored in a database; when the microblog data of a particular user needs to be analyzed, that user's microblog data is read.
Step S12: initialize the read microblog data to obtain a microblog text lexical item set, where initializing the read microblog data includes removing special symbols and non-Chinese characters from the read microblog data and performing word segmentation.
In this step, each microblog is initialized, for example by removing special symbols such as punctuation marks, removing non-Chinese characters, and performing word segmentation. A microblog text lexical item set is obtained after initialization.
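The initialization in step S12 can be sketched as follows. This is a minimal illustration and not the patent's implementation: the source does not name a segmentation algorithm, so forward maximum matching against a small word list is swapped in here as an assumption, and the names `clean`, `segment`, `initialize`, and `vocabulary` are hypothetical. Treating the range U+4E00 to U+9FFF as "Chinese characters" is also an assumption.

```python
import re

# Assumption: "Chinese character" means the CJK Unified Ideographs block.
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]+")


def clean(text):
    """Drop special symbols and non-Chinese characters; return the Chinese runs."""
    return NON_CHINESE.sub(" ", text).split()


def segment(chunk, vocabulary, max_len=4):
    """Forward maximum matching: repeatedly take the longest known word at the front.

    A single character is emitted when no vocabulary word matches.
    """
    terms = []
    while chunk:
        for size in range(min(max_len, len(chunk)), 0, -1):
            word = chunk[:size]
            if size == 1 or word in vocabulary:
                terms.append(word)
                chunk = chunk[size:]
                break
    return terms


def initialize(text, vocabulary):
    """Turn one raw microblog into its lexical item list."""
    terms = []
    for chunk in clean(text):
        terms.extend(segment(chunk, vocabulary))
    return terms
```

A production system would more likely rely on a dedicated Chinese word segmenter; the sketch only shows where cleaning and segmentation sit in the pipeline.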
Step S13: delete the stop words from the microblog text lexical item set to obtain a microblog text original feature lexical item set.
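Stop word deletion in step S13 amounts to a filter over the lexical items. The sketch below assumes the stop words are available as a set; the source does not say which stop word list is used, so `stop_words` and the function name are hypothetical.

```python
def remove_stop_words(terms, stop_words):
    """Keep the lexical items that are not stop words, preserving their order."""
    return [term for term in terms if term not in stop_words]
```

For example, with the hypothetical stop word set {"的", "了", "太"}, the items "今天", "的", "篮球", "比赛" reduce to "今天", "篮球", "比赛".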
Step S14: map the microblog text original feature lexical item set against the pre-generated feature lexical item dictionary, judge whether each lexical item in the microblog text original feature lexical item set appears in the pre-generated feature lexical item dictionary, and calculate the term frequency-inverse document frequency (tf-idf) value of each lexical item of the set that does appear in the dictionary, the tf-idf value serving as the feature value of that lexical item in the microblog.
In this step, the microblog text original feature lexical item set of each microblog is mapped to the feature lexical item dictionary. If a lexical item of the microblog text original feature lexical item set is in the feature lexical item dictionary, the tf-idf value of that lexical item is calculated and used as its feature value in the microblog.
Step S15: judge whether each lexical item of the pre-generated feature lexical item dictionary appears in the microblog text original feature lexical item set, and mark as 0 the feature value of every dictionary lexical item that does not appear in the set.
In this step, a lexical item of the microblog text original feature lexical item set that is not in the feature lexical item dictionary is ignored, and a lexical item of the feature lexical item dictionary that does not appear in the microblog text original feature lexical item set is given the feature value 0. In the end, the text of each microblog is converted into a feature vector of dimension 5000.
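Steps S14 and S15 together can be sketched as one pass over the dictionary. The source does not spell out its tf-idf formula, so the common definition tf × log(N / df) is assumed here; `doc_freq` (the number of training microblogs containing each dictionary item) and `n_docs` (the number of training microblogs) are hypothetical inputs assumed to come from the training corpus, with every dictionary item having a document frequency of at least 1.

```python
import math
from collections import Counter


def vectorize(terms, dictionary, doc_freq, n_docs):
    """Map one microblog's lexical items onto the fixed dictionary order.

    Items outside the dictionary are ignored; dictionary items absent from
    the microblog get the feature value 0.
    """
    counts = Counter(terms)
    total = len(terms)
    vector = []
    for term in dictionary:
        if term in counts:
            tf = counts[term] / total                   # term frequency in this microblog
            idf = math.log(n_docs / doc_freq[term])     # assumed idf definition
            vector.append(tf * idf)
        else:
            vector.append(0.0)
    return vector
```

The vector length always equals the dictionary size, so a 5000-item dictionary yields the 5000-dimensional vector mentioned above.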
Step S16: automatically classify the microblog data of the user into the pre-divided categories using the classification model obtained in advance.
In this step, multiple categories can be divided in advance according to actual needs. For example, 12 categories are divided in advance: sports; health; education; travel; technology; automotive; games; beauty, hairdressing and body shaping; food; clothing, shoes and bags; entertainment; and other.
The sports category includes content such as sporting events, sports newspapers and periodicals, and sports stars;
the health category includes content such as health knowledge, medicine, and physical condition;
the education category includes content such as training organizations like New Oriental and New Channel, personal study status, study intentions, and studying abroad;
the travel category includes content such as scenic spots, amusement parks, overseas travel, independent travel, and hotels;
the technology category includes content such as mobile phones, computers, and digital products;
the automotive category includes content such as automobiles and automobile magazines;
the games category includes content such as mobile games, web games, and online games;
the beauty, hairdressing and body shaping category includes content such as skin care products, cosmetics, manicure, slimming, and toiletries;
the food category includes content such as food, foodies, and recipes;
the entertainment category includes content such as show business, concerts, stage plays, and exhibitions;
the other category includes content such as personal circumstances, personal feelings, views on society, and views on life.
Step S17: recommend advertisements to the user whose microblog data was read, based on the result of the automatic classification.
In this step, if the automatic classification assigns the user's microblog data to a certain category, advertisements corresponding to that category are recommended to the user. Advertisements here include news, music, films, microblogs, and the like.
In the embodiment of the present invention, the microblog data a user publishes is mined and classified to judge the user's interest preferences, and corresponding advertisements are then recommended to the user. Because the microblog data a user publishes is more timely than the information in user labels and better represents the user's interest preferences, the judgment obtained by analyzing the user's microblog data is more accurate, so the recommended advertisements are more accurate and more effective.
As an embodiment of the present invention, before the step in S16 of automatically classifying the user's microblog data into the pre-divided categories using the classification model obtained in advance, the method includes the following steps:
Step A: read training microblog data.
In this step, the microblog data of multiple users is read as the training microblog data, to improve the accuracy of subsequent mining as much as possible.
Step B: manually label the read training microblog data with the pre-divided categories.
In this step, several annotators each label every read microblog with one of the pre-divided categories, and the category of each microblog is decided by majority rule.
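The majority rule in step B can be sketched in a few lines. The source does not say how ties between annotators are resolved; in this sketch the label that was seen first among the tied ones wins, which is an assumption, as is the function name.

```python
from collections import Counter


def majority_label(labels):
    """Return the category chosen by the most annotators for one microblog.

    Assumption: among tied categories, the one that appears first wins.
    """
    return Counter(labels).most_common(1)[0][0]
```

Using an odd number of annotators reduces, but does not eliminate, the chance of a tie when there are more than two categories.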
Step C: initialize the read training microblog data to obtain a microblog text lexical item set, where initializing the read training microblog data includes removing special symbols and non-Chinese characters from the read training microblog data and performing word segmentation.
Step D: delete the stop words from the microblog text lexical item set to obtain a microblog text original feature lexical item set.
Step E: generate the feature lexical item dictionary.
In this step, generating the feature lexical item dictionary specifically includes: calculating the mutual information value of each lexical item in the microblog text original feature lexical item set; and selecting the N lexical items with the highest mutual information values as the lexical items of the feature lexical item dictionary, where N is an integer greater than 0. For example, the 5000 lexical items with the highest mutual information values are selected as the lexical items of the feature lexical item dictionary, and the generated dictionary can be ordered by mutual information value.
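Dictionary generation in step E can be sketched as scoring each lexical item and keeping the top N. The source does not give its mutual information formula, so this sketch assumes the pointwise form log(P(t, c) / (P(t) P(c))) estimated from document counts, and takes the maximum over categories as the item's score. The function names and the alphabetical tie-break are likewise assumptions.

```python
import math
from collections import Counter, defaultdict


def mutual_information_scores(docs, labels):
    """Score each lexical item by its highest pointwise mutual information with any category.

    docs is a list of lexical item lists; labels is the parallel list of categories.
    """
    n = len(docs)
    class_count = Counter(labels)
    term_count = Counter()
    joint = defaultdict(Counter)  # joint[term][category] = microblogs of that category containing term
    for terms, label in zip(docs, labels):
        for term in set(terms):
            term_count[term] += 1
            joint[term][label] += 1
    scores = {}
    for term, per_class in joint.items():
        scores[term] = max(
            math.log(n * n_tc / (term_count[term] * class_count[category]))
            for category, n_tc in per_class.items()
        )
    return scores


def build_dictionary(docs, labels, n_terms):
    """Return the n_terms highest-scoring lexical items, ordered by score."""
    scores = mutual_information_scores(docs, labels)
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:n_terms]
```

Pointwise mutual information tends to favor rare items, so a real system might add a minimum document frequency before ranking; that refinement is not described in the source.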
Step F: map the microblog text original feature lexical item set against the feature lexical item dictionary, judge whether each lexical item in the microblog text original feature lexical item set appears in the feature lexical item dictionary, and calculate the tf-idf value of each lexical item of the set that does appear in the dictionary, the tf-idf value serving as the feature value of that lexical item in the microblog.
Step G: judge whether each lexical item of the feature lexical item dictionary appears in the microblog text original feature lexical item set, and mark as 0 the feature value of every dictionary lexical item that does not appear in the set.
Step H: train on the feature vectors formed from all the calculated feature values using a preset algorithm, to obtain the classification model.
In this step, the feature vector matrix corresponding to all the microblog data is trained, and the trained result can be used directly when the microblog data of a particular user is later mined.
The preset algorithm includes any one of the following: support vector machine (SVM), naive Bayes classification, neural network, K-nearest-neighbor classification, or genetic algorithm.
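Of the algorithms listed, K-nearest-neighbor classification is the shortest to sketch without external libraries. The version below is an illustration under assumptions: cosine similarity as the distance measure, k = 3, and the class and method names are all choices made for this example, not details from the source.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class KNearestNeighbors:
    """Classify a feature vector by a vote among its k most similar training vectors."""

    def __init__(self, k=3):
        self.k = k
        self.samples = []

    def fit(self, vectors, labels):
        # "Training" for this algorithm is storing the labeled vectors.
        self.samples = list(zip(vectors, labels))
        return self

    def predict(self, vector):
        ranked = sorted(self.samples, key=lambda s: cosine(vector, s[0]), reverse=True)
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]
```

With 5000-dimensional vectors and a large training set, a brute-force scan like this becomes slow, which is one reason the other listed algorithms may be preferred in practice.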
In this embodiment, a feature lexical item dictionary is generated by analyzing the microblog data of a large number of users; this dictionary provides a reference standard for later mining the interest preferences of a particular user.
As an embodiment of the present invention, step S17, recommending advertisements to the user whose microblog data was read based on the result of the automatic classification, specifically includes: counting the percentage that each category of microblog accounts for among the user's microblogs; matching the counted categories against the labels in the user's profile, and doubling the percentage of each category that is matched successfully; and recommending to the user whose microblog data was read the advertisements of the M top-ranked categories, where M is an integer greater than 0.
In this embodiment, the user's historical microblogs are classified, the percentage of each category is counted, and the categories are matched against the labels in the user's profile. If the labels include a category, the percentage of that category is doubled. Finally, the M categories with the highest percentages are selected, for example three categories, as the user's advertisement recommendation categories. Preferably, the user's latest advertisement recommendation categories can be recalculated after a period of time.
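The ranking described for step S17 can be sketched as follows. The source does not explain how a profile label is matched to a category, so this sketch assumes the labels have already been mapped to category names; the function name, the `profile_labels` parameter, and the alphabetical tie-break are hypothetical.

```python
from collections import Counter


def recommend_categories(post_categories, profile_labels, m=3):
    """Rank categories by their share of the user's microblogs.

    The share of a category that also appears in the profile labels is doubled.
    """
    counts = Counter(post_categories)
    total = len(post_categories)
    scores = {}
    for category, count in counts.items():
        share = count / total
        if category in profile_labels:
            share *= 2  # label match doubles the percentage
        scores[category] = share
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return ranked[:m]
```

For instance, a user with 50% sports, 30% food, and 20% technology microblogs and a "tech" label ends up with technology at 40%, ahead of food.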
Embodiment two:
Fig. 2 shows the structure of a microblog-based advertisement recommendation system provided by the second embodiment of the present invention. For ease of description, only the parts related to the embodiment of the present invention are shown.
This microblog-based advertisement recommendation system can be used in various information processing terminals that connect to a server through a wired or wireless network, such as mobile phones, pocket personal computers (Pocket Personal Computer, PPC), palmtop computers, computers, notebook computers, and personal digital assistants (Personal Digital Assistant, PDA). It can be a software unit, a hardware unit, or a combined software and hardware unit running in these information processing terminals, or it can be integrated into these terminals as an independent plug-in or run in the application systems of these terminals, wherein:
A first data reading module 201, configured to read the microblog data of a user.
A first data initialization module 202, configured to initialize the read microblog data to obtain a microblog text lexical item set, where initializing the read microblog data includes removing special symbols and non-Chinese characters from the read microblog data and performing word segmentation.
A first feature extraction module 203, configured to delete the stop words from the microblog text lexical item set to obtain a microblog text original feature lexical item set.
A first feature vectorization module 204, configured to map the microblog text original feature lexical item set against the pre-generated feature lexical item dictionary, judge whether each lexical item in the microblog text original feature lexical item set appears in the pre-generated feature lexical item dictionary, and calculate the tf-idf value of each lexical item of the set that does appear in the dictionary, the tf-idf value serving as the feature value of that lexical item in the microblog. It is further configured to judge whether each lexical item of the pre-generated feature lexical item dictionary appears in the microblog text original feature lexical item set, and to mark as 0 the feature value of every dictionary lexical item that does not appear in the set.
After the calculation of the first feature vectorization module 204, the microblog data of each microblog is finally converted into a feature vector of dimension 5000.
A classification module 205, configured to automatically classify the microblog data of the user into the pre-divided categories using the classification model obtained in advance.
The pre-divided categories can be 12 categories, as described in step S16, and are not repeated here.
A recommendation module 206, configured to recommend advertisements to the user whose microblog data was read, based on the result of the automatic classification.
Advertisements here include content such as news, music, films, and microblogs.
In the embodiment of the present invention, the read microblog data is mined and assigned to categories, and advertisements related to the assigned categories are recommended to the user. Because microblog data reflects a user's interest preferences in a timely way, the judgment obtained by analyzing the user's microblog data is more accurate, so the recommended advertisements are more accurate and more effective.
Fig. 3 shows another structure of the microblog-based advertisement recommendation system. As another preferred embodiment of the present invention, the microblog-based advertisement recommendation system further includes:
Second data reading module 301, configured to read training microblog data.
The microblog data read is the microblog data of multiple users.
Manual classification module 302, configured to manually label the training microblog data read with the predefined categories.
Second data initialization module 303, configured to initialize the training microblog data read to obtain a microblog text lexical item set, where initializing the training microblog data read includes removing special symbols and non-Chinese characters from the training microblog data read and performing word segmentation.
Second feature extraction module 304, configured to delete the stop words from the microblog text lexical item set to obtain the original feature lexical item set of the microblog text.
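The initialization and stop-word steps handled by modules 303 and 304 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the text does not name a word segmenter, so a simple forward maximum matching segmenter is assumed, and `STOP_WORDS`, `VOCAB`, `MAX_WORD_LEN` and all function names are hypothetical.

```python
import re

# Hypothetical resources, for illustration only.
STOP_WORDS = {"的", "了", "我"}
VOCAB = {"今天", "电影", "好看", "音乐"}
MAX_WORD_LEN = 4


def clean(text):
    """Drop special symbols and non-Chinese characters.

    Assumption: "Chinese character" is taken to mean the CJK Unified
    Ideographs range U+4E00..U+9FFF.
    """
    return re.sub(r"[^\u4e00-\u9fff]", "", text)


def segment(text):
    """Forward maximum matching against VOCAB.

    Characters that start no known word become single-character items.
    """
    items, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in VOCAB:
                items.append(piece)
                i += size
                break
    return items


def original_feature_items(text):
    """Clean, segment, then delete stop words."""
    return [item for item in segment(clean(text)) if item not in STOP_WORDS]
```

In practice a dedicated Chinese word segmenter would replace `segment`; the rest of the pipeline only needs a list of lexical items per post.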
Feature lexical item dictionary generation module 305, configured to generate the feature lexical item dictionary.
The feature lexical item dictionary generation module 305 includes:
Mutual information value computing module, configured to calculate the mutual information value of each lexical item in the original feature lexical item set of the microblog text.
Feature lexical item dictionary lexical item selecting module, configured to select the N lexical items with the highest mutual information values as the lexical items of the feature lexical item dictionary, where N is an integer greater than 0.
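The dictionary generation performed by these two submodules can be illustrated with a short sketch. The passage does not state which mutual information formula is used, so the sketch assumes pointwise mutual information between a lexical item t and a category c, log(P(t, c) / (P(t) P(c))), estimated from document counts, and scores each item by its maximum over categories. The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def mutual_information_scores(docs, labels):
    """Score each lexical item by its maximum pointwise mutual information.

    docs   -- list of lexical item lists, one per labelled microblog post
    labels -- parallel list of category names
    """
    n_docs = len(docs)
    class_count = Counter(labels)
    item_count = Counter()
    joint_count = defaultdict(Counter)  # item -> category -> number of posts
    for items, label in zip(docs, labels):
        for item in set(items):
            item_count[item] += 1
            joint_count[item][label] += 1
    scores = {}
    for item, per_class in joint_count.items():
        # P(t,c) / (P(t) P(c)) simplifies to n_tc * N / (n_t * n_c).
        scores[item] = max(
            math.log(n_tc * n_docs / (item_count[item] * class_count[c]))
            for c, n_tc in per_class.items()
        )
    return scores


def build_dictionary(docs, labels, top_n):
    """Return the top_n items by score; ties are broken by item text."""
    scores = mutual_information_scores(docs, labels)
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]
```

An item that is spread evenly over all categories scores 0 under this estimate and falls to the bottom of the ranking, which is the behaviour the selection step relies on.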
Second feature vectorization module 306, configured to map the original feature lexical item set of the microblog text against the feature lexical item dictionary: to judge whether each lexical item in the original feature lexical item set occurs in the feature lexical item dictionary, and to calculate the tf-idf value of each lexical item of the original feature lexical item set that occurs in the dictionary, as the feature value of that lexical item in the microblog. The module is further configured to judge whether each lexical item of the feature lexical item dictionary occurs in the original feature lexical item set of the microblog text, and to mark as 0 the feature value of each dictionary lexical item that does not occur in the original feature lexical item set.
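The mapping performed by the vectorization module can be sketched as below. The tf-idf formula is not given in this passage, so a common form is assumed: tf is the item count divided by the post length, and idf is log(N / df) over the training corpus. `document_frequencies` and `vectorize` are hypothetical names.

```python
import math
from collections import Counter


def document_frequencies(corpus):
    """Count, for each lexical item, the number of posts that contain it."""
    df = Counter()
    for items in corpus:
        df.update(set(items))
    return df


def vectorize(items, dictionary, df, n_docs):
    """Map one post onto the dictionary.

    Returns one feature value per dictionary item, in dictionary order:
    the tf-idf value if the item occurs in the post, otherwise 0.
    Items of the post that are not in the dictionary are ignored.
    """
    counts = Counter(items)
    total = len(items)
    vector = []
    for entry in dictionary:
        if counts[entry] == 0:
            vector.append(0.0)
            continue
        tf = counts[entry] / total
        # Assumption: guard against df == 0 for items unseen in the corpus.
        idf = math.log(n_docs / max(df[entry], 1))
        vector.append(tf * idf)
    return vector
```

The vector length always equals the dictionary size, so a dictionary of 5000 lexical items yields the 5000-dimensional feature vector mentioned for module 204.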
Training module 307, configured to train, using a preset algorithm, on the feature vectors formed from all the calculated feature values, so as to obtain the classification model.
The preset algorithm includes any one of the following algorithms:
support vector machine (SVM), naive Bayes classification algorithm, neural network, K-nearest-neighbor classification algorithm, genetic algorithm.
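As an illustration of one of the listed algorithms, the following is a minimal K-nearest-neighbor classifier over feature vectors. It is a sketch under stated assumptions: cosine similarity and majority voting are choices made here for the example, and the class and function names are hypothetical rather than taken from the patent.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class KNearestNeighbors:
    """Majority vote among the k most similar labelled training vectors."""

    def __init__(self, k=3):
        self.k = k
        self.samples = []

    def fit(self, vectors, labels):
        # "Training" a K-nearest-neighbor model only stores the labelled vectors.
        self.samples = list(zip(vectors, labels))
        return self

    def predict(self, vector):
        ranked = sorted(
            self.samples, key=lambda s: cosine(vector, s[0]), reverse=True
        )
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]
```

The fitted object plays the role of the classification model: `fit` is called once on the manually labelled training vectors, and `predict` is then applied to the feature vector of each post of the user being analyzed.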
In this embodiment, a feature lexical item dictionary is generated by analyzing the microblog data of a large number of users; this dictionary provides a reference standard for later mining the interest preferences of a particular user.
As an embodiment of the present invention, the recommendation module 206 includes:
Data statistics module, configured to count the percentage of the user's microblog posts that falls into each category.
Data matching module, configured to match the counted percentage of each category against the tags in the user's microblog profile, and to double the percentage of each category that is matched successfully.
Advertisement recommendation module, configured to recommend to the user whose microblog data is read the advertisements of the M top-ranked categories, where M is an integer greater than 0.
In this embodiment, only the advertisements of the M top-ranked categories are recommended to the user, which makes advertisement delivery more precise without increasing the user's browsing burden.
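The three submodules of the recommendation module can be sketched together as follows. The text does not say how a profile tag is matched to a category, so a hypothetical `tag_to_category` lookup table is assumed; the function name and the tie-breaking rule are likewise illustrative.

```python
from collections import Counter


def recommend_categories(post_categories, user_tags, tag_to_category, top_m):
    """Return the top_m categories whose advertisements should be recommended.

    post_categories -- predicted category of each of the user's posts
    user_tags       -- tags from the user's microblog profile
    tag_to_category -- hypothetical mapping from a tag to a category
    """
    # Data statistics: percentage of posts per category.
    counts = Counter(post_categories)
    total = sum(counts.values())
    share = {c: 100.0 * n / total for c, n in counts.items()}

    # Data matching: double the percentage of every category matched by a tag.
    matched = {tag_to_category[t] for t in user_tags if t in tag_to_category}
    for category in matched:
        if category in share:
            share[category] *= 2

    # Advertisement recommendation: keep the M top-ranked categories.
    ranked = sorted(share, key=lambda c: (-share[c], c))
    return ranked[:top_m]
```

For example, a user with 50% sports posts and 30% film posts who carries a film-related tag ends up with film weighted at 60, so film advertisements rank ahead of sports advertisements.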
In the embodiments of the present invention, the microblog data published by a user is mined and classified, and is combined with the user's microblog tag information to judge the user's interest preferences, so that corresponding advertisements are recommended to the user. Because the microblog data published by a user is more up to date than the information contained in the user's tags and better represents the user's interest preferences, the judgment obtained by analyzing both the user's microblog data and tag information is more accurate than one obtained by analyzing tag information alone, so the recommended advertisements are more accurate and more effective.
The foregoing is merely a description of preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (8)

1. A microblog-based advertisement recommendation method, characterized in that the method comprises the following steps:
reading the microblog data of a user;
initializing the microblog data read to obtain a microblog text lexical item set, wherein initializing the microblog data read comprises removing special symbols and non-Chinese characters from the microblog data read and performing word segmentation;
deleting the stop words from the microblog text lexical item set to obtain an original feature lexical item set of the microblog text;
mapping the original feature lexical item set of the microblog text against a pre-generated feature lexical item dictionary, judging whether each lexical item in the original feature lexical item set occurs in the pre-generated feature lexical item dictionary, and calculating the term frequency-inverse document frequency (tf-idf) value of each lexical item of the original feature lexical item set that occurs in the pre-generated feature lexical item dictionary, as the feature value of that lexical item in the microblog;
judging whether each lexical item of the pre-generated feature lexical item dictionary occurs in the original feature lexical item set of the microblog text, and marking as 0 the feature value of each lexical item of the pre-generated feature lexical item dictionary that does not occur in the original feature lexical item set;
automatically classifying the microblog data of the user into predefined categories using a previously obtained classification model;
recommending advertisements to the user whose microblog data is read, based on the result of the automatic classification;
wherein the step of recommending advertisements to the user whose microblog data is read, based on the result of the automatic classification, specifically comprises:
counting the percentage of the user's microblog posts that falls into each category;
matching the counted percentage of each category against the tags in the user's microblog profile, and doubling the percentage of each category that is matched successfully;
recommending to the user whose microblog data is read the advertisements of the M top-ranked categories, wherein M is an integer greater than 0.
2. The method of claim 1, characterized in that, before the step of automatically classifying the microblog data of the user into predefined categories using a previously obtained classification model, the method comprises the following steps:
reading training microblog data;
manually labeling the training microblog data read with the predefined categories;
initializing the training microblog data read to obtain a microblog text lexical item set, wherein initializing the training microblog data read comprises removing special symbols and non-Chinese characters from the training microblog data read and performing word segmentation;
deleting the stop words from the microblog text lexical item set to obtain an original feature lexical item set of the microblog text;
generating a feature lexical item dictionary;
mapping the original feature lexical item set of the microblog text against the feature lexical item dictionary, judging whether each lexical item in the original feature lexical item set occurs in the feature lexical item dictionary, and calculating the tf-idf value of each lexical item of the original feature lexical item set that occurs in the feature lexical item dictionary, as the feature value of that lexical item in the microblog;
judging whether each lexical item of the feature lexical item dictionary occurs in the original feature lexical item set of the microblog text, and marking as 0 the feature value of each lexical item of the feature lexical item dictionary that does not occur in the original feature lexical item set;
training, using a preset algorithm, on the feature vectors formed from all the calculated feature values, so as to obtain the classification model.
3. The method of claim 2, characterized in that the step of generating a feature lexical item dictionary specifically comprises:
calculating the mutual information value of each lexical item in the original feature lexical item set of the microblog text;
selecting the N lexical items with the highest mutual information values as the lexical items of the feature lexical item dictionary, wherein N is an integer greater than 0.
4. The method of claim 2, characterized in that the preset algorithm comprises any one of the following algorithms:
support vector machine (SVM), naive Bayes classification algorithm, neural network, K-nearest-neighbor classification algorithm, genetic algorithm.
5. A microblog-based advertisement recommendation system, characterized in that the system comprises:
a first data reading module, configured to read the microblog data of a user;
a first data initialization module, configured to initialize the microblog data read to obtain a microblog text lexical item set, wherein initializing the microblog data read comprises removing special symbols and non-Chinese characters from the microblog data read and performing word segmentation;
a first feature extraction module, configured to delete the stop words from the microblog text lexical item set to obtain an original feature lexical item set of the microblog text;
a first feature vectorization module, configured to map the original feature lexical item set of the microblog text against a pre-generated feature lexical item dictionary, to judge whether each lexical item in the original feature lexical item set occurs in the pre-generated feature lexical item dictionary, and to calculate the tf-idf value of each lexical item of the original feature lexical item set that occurs in the pre-generated feature lexical item dictionary, as the feature value of that lexical item in the microblog; and further configured to judge whether each lexical item of the pre-generated feature lexical item dictionary occurs in the original feature lexical item set of the microblog text, and to mark as 0 the feature value of each lexical item of the pre-generated feature lexical item dictionary that does not occur in the original feature lexical item set;
a classification module, configured to automatically classify the microblog data of the user into predefined categories using a previously obtained classification model;
a recommendation module, configured to recommend advertisements to the user whose microblog data is read, based on the result of the automatic classification;
wherein the recommendation module comprises:
a data statistics module, configured to count the percentage of the user's microblog posts that falls into each category;
a data matching module, configured to match the counted percentage of each category against the tags in the user's microblog profile, and to double the percentage of each category that is matched successfully;
an advertisement recommendation module, configured to recommend to the user whose microblog data is read the advertisements of the M top-ranked categories, wherein M is an integer greater than 0.
6. The system of claim 5, characterized in that the system further comprises:
a second data reading module, configured to read training microblog data;
a manual classification module, configured to manually label the training microblog data read with the predefined categories;
a second data initialization module, configured to initialize the training microblog data read to obtain a microblog text lexical item set, wherein initializing the training microblog data read comprises removing special symbols and non-Chinese characters from the training microblog data read and performing word segmentation;
a second feature extraction module, configured to delete the stop words from the microblog text lexical item set to obtain an original feature lexical item set of the microblog text;
a feature lexical item dictionary generation module, configured to generate a feature lexical item dictionary;
a second feature vectorization module, configured to map the original feature lexical item set of the microblog text against the feature lexical item dictionary, to judge whether each lexical item in the original feature lexical item set occurs in the feature lexical item dictionary, and to calculate the tf-idf value of each lexical item of the original feature lexical item set that occurs in the feature lexical item dictionary, as the feature value of that lexical item in the microblog; and further configured to judge whether each lexical item of the feature lexical item dictionary occurs in the original feature lexical item set of the microblog text, and to mark as 0 the feature value of each lexical item of the feature lexical item dictionary that does not occur in the original feature lexical item set;
a training module, configured to train, using a preset algorithm, on the feature vectors formed from all the calculated feature values, so as to obtain the classification model.
7. The system of claim 6, characterized in that the feature lexical item dictionary generation module comprises:
a mutual information value computing module, configured to calculate the mutual information value of each lexical item in the original feature lexical item set of the microblog text;
a feature lexical item dictionary lexical item selecting module, configured to select the N lexical items with the highest mutual information values as the lexical items of the feature lexical item dictionary, wherein N is an integer greater than 0.
8. The system of claim 6, characterized in that the preset algorithm comprises any one of the following algorithms:
support vector machine (SVM), naive Bayes classification algorithm, neural network, K-nearest-neighbor classification algorithm, genetic algorithm.
CN201310608335.7A 2013-11-26 2013-11-26 Method and system for advertisement recommendation based microblog Active CN103617230B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310608335.7A CN103617230B (en) 2013-11-26 2013-11-26 Method and system for advertisement recommendation based microblog

Publications (2)

Publication Number Publication Date
CN103617230A CN103617230A (en) 2014-03-05
CN103617230B true CN103617230B (en) 2017-02-15

Family

ID=50167933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310608335.7A Active CN103617230B (en) 2013-11-26 2013-11-26 Method and system for advertisement recommendation based microblog

Country Status (1)

Country Link
CN (1) CN103617230B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101634983A (en) * 2008-07-21 2010-01-27 华为技术有限公司 Method and device for text classification
US8027977B2 (en) * 2007-06-20 2011-09-27 Microsoft Corporation Recommending content using discriminatively trained document similarity
CN103324708A (en) * 2013-06-18 2013-09-25 哈尔滨工程大学 Method of transfer learning from long text to short text
CN103389981A (en) * 2012-05-08 2013-11-13 腾讯科技(深圳)有限公司 Network label automatic identification method and system thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"面向中文微博的社会网络分析及应用";麦艺华;《中国优秀硕士学位论文全文数据库 信息科技辑》;20130215(第2期);第11-44页 *

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20140305

Assignee: Dongguan Shengnuolin Sports Products Co.,Ltd.

Assignor: SHENZHEN INSTITUTES OF ADVANCED TECHNOLOGY CHINESE ACADEMY OF SCIENCES

Contract record no.: X2023980037877

Denomination of invention: A Weibo based advertising recommendation method and system

Granted publication date: 20170215

License type: Common License

Record date: 20230712

EE01 Entry into force of recordation of patent licensing contract
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20140305

Assignee: Shenzhen Huayun Xingchuang Cultural Technology Co.,Ltd.

Assignor: SHENZHEN INSTITUTES OF ADVANCED TECHNOLOGY CHINESE ACADEMY OF SCIENCES

Contract record no.: X2023980043804

Denomination of invention: A Weibo based advertising recommendation method and system

Granted publication date: 20170215

License type: Common License

Record date: 20231018

Application publication date: 20140305

Assignee: Shenzhen Xingfei Software Technology Co.,Ltd.

Assignor: SHENZHEN INSTITUTES OF ADVANCED TECHNOLOGY CHINESE ACADEMY OF SCIENCES

Contract record no.: X2023980043566

Denomination of invention: A Weibo based advertising recommendation method and system

Granted publication date: 20170215

License type: Common License

Record date: 20231016