CN103617230B

CN103617230B - Method and system for advertisement recommendation based on microblog

Info

Publication number: CN103617230B
Application number: CN201310608335.7A
Authority: CN
Inventors: 章昉; 刘明君; 赵中英
Original assignee: Shenzhen Institute of Advanced Technology of CAS
Current assignee: Shenzhen Institute of Advanced Technology of CAS
Priority date: 2013-11-26
Filing date: 2013-11-26
Publication date: 2017-02-15
Anticipated expiration: 2033-11-26
Also published as: CN103617230A

Abstract

The invention belongs to the field of data mining and provides a microblog-based advertisement recommendation method and system. The method comprises: reading microblog data; initializing the microblog data to obtain a microblog text lexical item set; deleting stop words from the microblog text lexical item set to obtain a microblog text original feature lexical item set; mapping the microblog text original feature lexical item set onto a feature lexical item dictionary, judging whether each lexical item in the original feature lexical item set appears in the dictionary, and calculating the tf-idf value of each lexical item that does appear as its feature value; judging whether each lexical item of the dictionary appears in the original feature lexical item set, and setting the feature value of each lexical item that does not appear to zero; automatically classifying the feature vector formed from the calculated feature values into one of the categories divided in advance; and recommending advertisements to the user according to the classification result. The advertisements recommended by the method and system are accurate and effective.

Description

Method and system for advertisement recommendation based on microblog

Technical field

The invention belongs to the field of data mining and, more particularly, relates to a microblog-based advertisement recommendation method and system.

Background technology

With the popularity in China of social networking sites such as Sina Weibo and Tencent Weibo, microblogs and other social media have become not only a platform on which netizens publish, share, and spread information, but also a store of large-scale behavioral data about those netizens. In May 2012, Lu Yi, deputy general manager of Sina's microblog division, stated that Sina Weibo had more than 300 million registered users, who posted more than 100 million microblog posts per day on average. The microblog user base is large, and so is the data volume. If a microblog operating system can analyze and mine this existing mass of data, judge the interests of microblog users more accurately from the analysis results, and deliver advertisements according to those interests, then the advertisements pushed to microblog users will benefit all three parties: the users, the merchants, and the microblog operator.

Existing microblog advertisement recommendation methods mainly judge a user's interests from the tags in the user's personal profile or from the user's search records, and then push advertisements that may interest the user. Because many users' profiles contain no tags, or the tags filled in when the profile was created are inaccurate, recommending advertisements on the basis of user tags does not achieve good results. Judging a user's interests from search records also has limitations: search records reflect only the user's current needs and do not support an accurate judgment of the user's interests.

Summary of the invention

Embodiments of the present invention provide a microblog-based advertisement recommendation method, intended to solve the problem that existing methods mine user information with low accuracy and therefore recommend advertisements poorly.

An embodiment of the present invention is realized as a microblog-based advertisement recommendation method comprising the following steps:

Reading the microblog data of a user;

Initializing the microblog data that was read to obtain a microblog text lexical item set, where the initializing includes removing special symbols and non-Chinese characters from the microblog data and performing word segmentation;

Deleting the stop words from the microblog text lexical item set to obtain a microblog text original feature lexical item set;

Mapping the microblog text original feature lexical item set onto a pre-generated feature lexical item dictionary, judging whether each lexical item in the microblog text original feature lexical item set appears in the pre-generated feature lexical item dictionary, and calculating the term frequency-inverse document frequency (tf-idf) value of each lexical item that does appear, as the feature value of that lexical item in the microblog;

Judging whether each lexical item of the pre-generated feature lexical item dictionary appears in the microblog text original feature lexical item set, and marking as 0 the feature value of each dictionary lexical item that does not appear in the microblog text original feature lexical item set;

Automatically classifying the user's microblog data into one of the categories divided in advance, using a previously obtained classification model;

Recommending advertisements to the user whose microblog data was read, on the basis of the automatic classification result.

Another object of the embodiments of the present invention is to provide a microblog-based advertisement recommendation system, the system comprising:

A first data reading module, for reading the microblog data of a user;

A first data initialization module, for initializing the microblog data that was read to obtain a microblog text lexical item set, where the initializing includes removing special symbols and non-Chinese characters from the microblog data and performing word segmentation;

A first feature extraction module, for deleting the stop words from the microblog text lexical item set to obtain a microblog text original feature lexical item set;

A first feature vectorization module, for mapping the microblog text original feature lexical item set onto a pre-generated feature lexical item dictionary, judging whether each lexical item in the microblog text original feature lexical item set appears in the pre-generated feature lexical item dictionary, and calculating the tf-idf value of each lexical item that does appear, as the feature value of that lexical item in the microblog; and for judging whether each lexical item of the pre-generated feature lexical item dictionary appears in the microblog text original feature lexical item set, and marking as 0 the feature value of each dictionary lexical item that does not appear in the microblog text original feature lexical item set;

A classification module, for automatically classifying the user's microblog data into one of the categories divided in advance, using a previously obtained classification model;

A recommendation module, for recommending advertisements to the user whose microblog data was read, on the basis of the automatic classification result.

In the embodiments of the present invention, the microblog data a user posts is more up to date than the information contained in user tags and better represents the user's interest preferences. The judgment obtained by analyzing the user's microblog data is therefore more accurate, so the recommended advertisements are more accurate and more effective.

Brief description of the drawings

Fig. 1 is a flow chart of a microblog-based advertisement recommendation method provided by the first embodiment of the present invention;

Fig. 2 is a structure diagram of a microblog-based advertisement recommendation system provided by the second embodiment of the present invention;

Fig. 3 is a structure diagram of another microblog-based advertisement recommendation system provided by the second embodiment of the present invention.

Detailed description of the embodiments

To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.

The embodiments of the present invention mine and classify the microblog data posted by a user, judge the user's interest preferences, and then recommend corresponding advertisements to the user.

Embodiments of the present invention provide a microblog-based advertisement recommendation method and system.

The method includes: reading the microblog data of a user;

The system includes: a first data reading module, for reading the microblog data of a user;

The technical solutions of the present invention are illustrated below by way of specific embodiments.

Embodiment one:

Fig. 1 shows a microblog-based advertisement recommendation method provided by the first embodiment of the present invention, detailed as follows:

Step S11: reading the microblog data of a user.

In this step, the microblog data of users can be obtained in advance and stored in a database; when the microblog data of a particular user needs to be analyzed, that user's microblog data is read.

Step S12: initializing the microblog data that was read to obtain a microblog text lexical item set, where the initializing includes removing special symbols and non-Chinese characters from the microblog data and performing word segmentation.

In this step, each microblog post is initialized, for example by removing special symbols such as punctuation marks, removing non-Chinese characters, and performing word segmentation. Initialization yields a microblog text lexical item set.

Step S13: deleting the stop words from the microblog text lexical item set to obtain a microblog text original feature lexical item set.
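Steps S12 and S13 can be illustrated with a minimal Python sketch. The patent does not name a word segmentation algorithm or a stop-word list, so everything below is an assumption made for illustration: the forward maximum matching segmenter, the toy vocabulary `VOCAB`, the toy stop-word set `STOP_WORDS`, and the function names `clean`, `segment`, and `preprocess` are all hypothetical.

```python
import re

# Hypothetical toy resources; a real system would load much larger lists.
VOCAB = {"今天", "天气", "足球"}
STOP_WORDS = {"很", "去", "的", "了"}
MAX_WORD_LEN = 4  # longest vocabulary entry the segmenter will try


def clean(text):
    """Keep only Chinese characters (CJK Unified Ideographs U+4E00..U+9FFF).

    This drops punctuation, other special symbols, and non-Chinese characters.
    """
    return "".join(re.findall(r"[\u4e00-\u9fff]+", text))


def segment(text):
    """Forward maximum matching against VOCAB (an assumed algorithm).

    Characters that start no vocabulary word become one-character lexical items.
    """
    items = []
    i = 0
    while i < len(text):
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in VOCAB:
                items.append(candidate)
                i += length
                break
    return items


def preprocess(text):
    """Step S12 (clean and segment) followed by step S13 (drop stop words)."""
    return [item for item in segment(clean(text)) if item not in STOP_WORDS]
```

Under these assumptions, `preprocess("今天天气很好!! go #去踢足球@abc")` yields the original feature lexical item set for one post, with the symbols, Latin letters, and the stop words 很 and 去 removed.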

Step S14: mapping the microblog text original feature lexical item set onto a pre-generated feature lexical item dictionary, judging whether each lexical item in the microblog text original feature lexical item set appears in the pre-generated feature lexical item dictionary, and calculating the term frequency-inverse document frequency (tf-idf) value of each lexical item that does appear, as the feature value of that lexical item in the microblog.

In this step, the microblog text original feature lexical item set of each microblog post is mapped onto the feature lexical item dictionary. If a lexical item of the original feature lexical item set is in the feature lexical item dictionary, the tf-idf value of that lexical item is calculated and used as its feature value in that post.

Step S15: judging whether each lexical item of the pre-generated feature lexical item dictionary appears in the microblog text original feature lexical item set, and marking as 0 the feature value of each dictionary lexical item that does not appear in the microblog text original feature lexical item set.

In this step, a lexical item of the original feature lexical item set that is not in the feature lexical item dictionary is ignored; if a lexical item of the feature lexical item dictionary does not appear in the original feature lexical item set, its feature value is 0. Finally, the text of each microblog post is converted into a feature vector of dimension 5000.
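Steps S14 and S15 together turn one post into a fixed-length vector indexed by the dictionary. The sketch below shows one way this could look in Python. The patent does not state which tf-idf formula is used, so the smoothed idf, the length-normalized term frequency, and the function names `build_idf` and `vectorize` are assumptions made for illustration.

```python
import math
from collections import Counter


def build_idf(training_posts, dictionary):
    """Inverse document frequency of each dictionary item over the training posts.

    Assumed smoothed form: idf = ln((1 + n) / (1 + df)) + 1.
    """
    n_posts = len(training_posts)
    idf = {}
    for item in dictionary:
        df = sum(1 for post in training_posts if item in post)
        idf[item] = math.log((1 + n_posts) / (1 + df)) + 1.0
    return idf


def vectorize(post_items, dictionary, idf):
    """Map one post's original feature lexical items onto the dictionary.

    Dictionary items present in the post get tf * idf (step S14); dictionary
    items absent from the post get 0 (step S15); post items that are not in
    the dictionary are ignored.
    """
    counts = Counter(post_items)
    total = len(post_items)
    vector = []
    for item in dictionary:
        if total and counts[item]:
            vector.append(counts[item] / total * idf[item])
        else:
            vector.append(0.0)
    return vector
```

With a dictionary of 5000 lexical items, `vectorize` returns the 5000-dimensional feature vector described above; the vector length always equals the dictionary length, whatever the post contains.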

Step S16: automatically classifying the user's microblog data into one of the categories divided in advance, using a previously obtained classification model.

In this step, multiple categories can be divided in advance according to actual needs. For example, 12 categories are divided in advance: sports, health, education, travel, technology, automobiles, games, beauty and body shaping, food, clothing/shoes/bags, entertainment, and other.

Wherein the sports category includes content such as sports events, sports newspapers and periodicals, and sports stars;

Wherein the health category includes content such as health knowledge, medicine, and physical condition;

Wherein the education category includes content such as training institutions (for example New Oriental and New Channel), personal study status, study plans, and studying abroad;

Wherein the travel category includes content such as scenic spots, amusement parks, overseas travel, independent travel, and hotels;

Wherein the technology category includes content such as mobile phones, computers, and digital products;

Wherein the automobile category includes content such as automobiles and automobile magazines;

Wherein the games category includes content such as mobile games, web games, and online games;

Wherein the beauty and body shaping category includes content such as skin care products, cosmetics, manicure, slimming, and toiletries;

Wherein the food category includes content such as food, foodies, and recipes;

Wherein the entertainment category includes content such as the entertainment industry, concerts, stage plays, and exhibitions;

Wherein the other category includes content such as personal mood, personal feelings, views on society, and views on life.

Step S17: recommending advertisements to the user whose microblog data was read, on the basis of the automatic classification result.

In this step, if the automatic classification result places the user's microblog data in a certain category, advertisements corresponding to that category are recommended to the user. Advertisements here include news, music, films, microblog posts, and the like.

In this embodiment of the present invention, the microblog data posted by a user is mined and classified, the user's interest preferences are judged, and corresponding advertisements are then recommended to the user. The microblog data a user posts is more up to date than the information contained in user tags and better represents the user's interest preferences. The judgment obtained by analyzing the user's microblog data is therefore more accurate, so the recommended advertisements are more accurate and more effective.

As one embodiment of the present invention, before step S16 (automatically classifying the user's microblog data into one of the categories divided in advance, using a previously obtained classification model), the method comprises the following steps:

Step A: reading training microblog data.

In this step, the microblog data of multiple users is read as training microblog data, so as to improve the accuracy of subsequent mining as far as possible.

Step B: manually labeling the training microblog data that was read with the categories divided in advance.

In this step, several annotators label each microblog post that was read with one of the categories divided in advance; the category of each post is decided by majority rule.
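The majority rule in step B can be sketched in a few lines of Python. The patent does not say how ties between annotators are resolved; in this sketch a tie goes to the label that was encountered first, which is an assumption, as is the function name `majority_label`.

```python
from collections import Counter


def majority_label(annotations):
    """Return the category chosen by most annotators for one post.

    Ties are an unspecified case in the source; here Counter.most_common keeps
    first-encountered order among equal counts, so the earliest label wins.
    """
    return Counter(annotations).most_common(1)[0][0]
```

For example, a post labeled sports, sports, and travel by three annotators would be recorded as sports.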

Step C: initializing the training microblog data that was read to obtain a microblog text lexical item set, where the initializing includes removing special symbols and non-Chinese characters from the training microblog data and performing word segmentation.

Step D: deleting the stop words from the microblog text lexical item set to obtain a microblog text original feature lexical item set.

Step E: generating a feature lexical item dictionary.

In this step, generating the feature lexical item dictionary specifically includes: calculating the mutual information value of each lexical item in the microblog text original feature lexical item set; and choosing the N lexical items with the highest mutual information values as the lexical items of the feature lexical item dictionary, where N is an integer greater than 0. For example, the 5000 lexical items with the highest mutual information values are selected as the lexical items of the feature lexical item dictionary, and the generated dictionary can be ordered by mutual information value.
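Step E can be illustrated with the Python sketch below. The patent does not give the mutual information formula, so the sketch assumes the pointwise form commonly used for text feature selection, log(P(t, c) / (P(t) P(c))), scored per lexical item as the maximum over categories. The function names `mutual_information_scores` and `build_dictionary` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def mutual_information_scores(posts, labels):
    """Score each lexical item by its maximum pointwise mutual information
    with any category, estimated from post-level counts (assumed formula)."""
    n = len(posts)
    category_count = Counter(labels)   # posts per category
    item_count = Counter()             # posts containing the item
    joint_count = defaultdict(Counter) # posts containing the item, per category
    for items, label in zip(posts, labels):
        for item in set(items):
            item_count[item] += 1
            joint_count[item][label] += 1
    scores = {}
    for item, per_category in joint_count.items():
        scores[item] = max(
            math.log(n * joint / (item_count[item] * category_count[label]))
            for label, joint in per_category.items()
        )
    return scores


def build_dictionary(posts, labels, top_n):
    """Keep the top_n lexical items, ordered by descending score."""
    scores = mutual_information_scores(posts, labels)
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]
```

A lexical item spread evenly across categories scores 0 and falls to the bottom of the ranking, while an item concentrated in one category scores higher. Pointwise mutual information is known to favor rare items, so a practical system might add a minimum frequency threshold; that refinement is not part of the source.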

Step F: mapping the microblog text original feature lexical item set onto the feature lexical item dictionary, judging whether each lexical item in the microblog text original feature lexical item set appears in the feature lexical item dictionary, and calculating the tf-idf value of each lexical item that does appear, as the feature value of that lexical item in the microblog.

Step G: judging whether each lexical item of the feature lexical item dictionary appears in the microblog text original feature lexical item set, and marking as 0 the feature value of each dictionary lexical item that does not appear in the microblog text original feature lexical item set.

Step H: training, with a preset algorithm, on the feature vectors formed from all the calculated feature values, to obtain a classification model.

In this step, the feature vector matrix corresponding to all the training microblog data is used for training; when the microblog data of a particular user is mined later, the result of the training can be used directly.

Wherein the preset algorithm is any one of the following: support vector machine (SVM), naive Bayes classification, neural network, K-nearest-neighbor classification, and genetic algorithm.
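Of the listed algorithms, K-nearest-neighbor classification is the simplest to sketch, because the "model" is just the labeled training vectors. The Python below is a minimal illustration only. The cosine similarity measure, the default k of 3, and the function names `cosine` and `knn_classify` are assumptions; the patent does not specify a distance measure or a value of K.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_classify(vector, train_vectors, train_labels, k=3):
    """Assign the category held by most of the k most similar training vectors."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: cosine(vector, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Here `train_vectors` would hold the feature vectors produced in steps F and G and `train_labels` the manual labels from step B. A support vector machine or naive Bayes model would replace `knn_classify` without changing the surrounding steps.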

In this embodiment, a feature lexical item dictionary is generated by analyzing the microblog data of a large number of users. The dictionary provides a reference standard for later mining of a particular user's interest preferences.

As one embodiment of the present invention, step S17 (recommending advertisements to the user whose microblog data was read, on the basis of the automatic classification result) specifically includes: counting the percentage of the user's microblog posts that falls in each category; matching the counted percentage of each category against the tags in the user's microblog profile, and doubling the percentage of each category that matches successfully; and recommending to the user whose microblog data was read the advertisements of the M highest-ranked categories, where M is an integer greater than 0.

In this embodiment, the user's historical microblog posts are classified, the percentage of posts in each category is counted, and the percentages are matched against the tags in the user's profile. If the tags include a category, the percentage of that category is doubled. Finally, the M categories with the highest percentages are selected; for example, three categories are selected as the user's advertisement recommendation categories. Preferably, the calculation can be repeated after a period of time to obtain the user's latest advertisement recommendation categories.
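The counting, doubling, and top-M selection described here can be sketched as follows. The sketch assumes that the user's profile tags have already been mapped to category names, since the patent does not say how a tag is matched to a category. The function name `recommend_categories` and the alphabetical tie-break are likewise assumptions.

```python
from collections import Counter


def recommend_categories(post_categories, profile_tags, top_m=3):
    """Rank categories by share of the user's posts, doubling tagged categories.

    post_categories: predicted category of each historical post.
    profile_tags: set of category names found in the user's profile tags
                  (assumed to be already mapped to categories).
    """
    counts = Counter(post_categories)
    total = sum(counts.values())
    scores = {}
    for category, count in counts.items():
        share = 100.0 * count / total   # percentage of the user's posts
        if category in profile_tags:
            share *= 2                  # doubled when a profile tag matches
        scores[category] = share
    # Highest score first; equal scores fall back to alphabetical order.
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return ranked[:top_m]
```

For a user with 5 sports posts, 3 food posts, and 2 travel posts whose profile carries a food tag, the food share rises from 30 to 60 and overtakes sports at 50, so food is ranked first.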

Embodiment two:

Fig. 2 shows the structure of a microblog-based advertisement recommendation system provided by the second embodiment of the present invention. For ease of explanation, only the parts related to the embodiment of the present invention are shown.

The microblog-based advertisement recommendation system can be used in various information processing terminals connected to a server over a wired or wireless network, such as a mobile phone, a Pocket Personal Computer (PPC), a palmtop computer, a computer, a notebook computer, or a Personal Digital Assistant (PDA). It may be a software unit, a hardware unit, or a combined software and hardware unit running in such an information processing terminal, or it may be integrated into the terminal as an independent add-on or run in the terminal's application system, wherein:

A first data reading module 201, for reading the microblog data of a user.

A first data initialization module 202, for initializing the microblog data that was read to obtain a microblog text lexical item set, where the initializing includes removing special symbols and non-Chinese characters from the microblog data and performing word segmentation.

A first feature extraction module 203, for deleting the stop words from the microblog text lexical item set to obtain a microblog text original feature lexical item set.

A first feature vectorization module 204, for mapping the microblog text original feature lexical item set onto a pre-generated feature lexical item dictionary, judging whether each lexical item in the microblog text original feature lexical item set appears in the pre-generated feature lexical item dictionary, and calculating the tf-idf value of each lexical item that does appear, as the feature value of that lexical item in the microblog; and for judging whether each lexical item of the pre-generated feature lexical item dictionary appears in the microblog text original feature lexical item set, and marking as 0 the feature value of each dictionary lexical item that does not appear in the microblog text original feature lexical item set.

Wherein, after the calculation of the first feature vectorization module 204, the microblog data of each microblog post is finally converted into a feature vector of dimension 5000.

A classification module 205, for automatically classifying the user's microblog data into one of the categories divided in advance, using a previously obtained classification model.

Wherein the categories divided in advance can be 12 categories, as described in step S16, which is not repeated here.

A recommendation module 206, for recommending advertisements to the user whose microblog data was read, on the basis of the automatic classification result.

Wherein advertisements here include content such as news, music, films, and microblog posts.

In this embodiment of the present invention, the microblog data that was read is mined and assigned to categories, and advertisements related to the assigned categories are recommended to the user. Because microblog data reflects a user's interest preferences in a timely way, the judgment obtained by analyzing the user's microblog data is more accurate, so the recommended advertisements are more accurate and more effective.

Fig. 3 shows another structure of the microblog-based advertisement recommendation system. As another preferred embodiment of the present invention, the microblog-based advertisement recommendation system further comprises:

A second data reading module 301, for reading training microblog data.

Wherein the microblog data that is read is the microblog data of multiple users.

A manual classification module 302, for manually labeling the training microblog data that was read with the categories divided in advance.

A second data initialization module 303, for initializing the training microblog data that was read to obtain a microblog text lexical item set, where the initializing includes removing special symbols and non-Chinese characters from the training microblog data and performing word segmentation.

A second feature extraction module 304, for deleting the stop words from the microblog text lexical item set to obtain a microblog text original feature lexical item set.

A feature lexical item dictionary generation module 305, for generating a feature lexical item dictionary.

Wherein the feature lexical item dictionary generation module 305 includes:

A mutual information value calculation module, for calculating the mutual information value of each lexical item in the microblog text original feature lexical item set.

A feature lexical item dictionary lexical item selection module, for choosing the N lexical items with the highest mutual information values as the lexical items of the feature lexical item dictionary, where N is an integer greater than 0.

A second feature vectorization module 306, for mapping the microblog text original feature lexical item set onto the feature lexical item dictionary, judging whether each lexical item in the microblog text original feature lexical item set appears in the feature lexical item dictionary, and calculating the tf-idf value of each lexical item that does appear, as the feature value of that lexical item in the microblog; and for judging whether each lexical item of the feature lexical item dictionary appears in the microblog text original feature lexical item set, and marking as 0 the feature value of each dictionary lexical item that does not appear in the microblog text original feature lexical item set.

A training module 307, for training, with a preset algorithm, on the feature vectors formed from all the calculated feature values, to obtain a classification model.

Wherein the preset algorithm is any one of the following:

Support vector machine (SVM), naive Bayes classification, neural network, K-nearest-neighbor classification, and genetic algorithm.

As one embodiment of the present invention, the recommendation module 206 includes:

A data statistics module, for counting the percentage of the user's microblog posts that falls in each category.

A data matching module, for matching the counted percentage of each category against the tags in the user's microblog profile, and doubling the percentage of each category that matches successfully.

An advertisement recommendation module, for recommending to the user whose microblog data was read the advertisements of the M highest-ranked categories, where M is an integer greater than 0.

In this embodiment, only the advertisements of the M highest-ranked categories are recommended to the user, which makes advertisement delivery more accurate without increasing the user's browsing burden.

In the embodiments of the present invention, the microblog data posted by a user is mined and classified and is combined with the user's microblog tag information to judge the user's interest preferences, and corresponding advertisements are then recommended to the user. The microblog data a user posts is more up to date than the information contained in user tags and better represents the user's interest preferences. The judgment obtained by analyzing both the user's microblog data and tag information is therefore more accurate than one obtained from tag information alone, so the recommended advertisements are more accurate and more effective.

The foregoing is only a description of preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. a kind of advertisement based on microblogging recommends method it is characterised in that methods described comprises the steps：

Read the microblog data of user；

The microblog data that initialization is read, to obtain microblog text lexical item set, the microblog data that described initialization is read includes Special symbol in the microblog data that removal is read, non-Chinese character, participle；

Described microblogging text primitive character lexical item set is mapped with the feature lexical item dictionary previously generating, is judged described micro- Lexical item in this original feature lexical item set of blog article whether occur in described in the feature lexical item dictionary that previously generates, and calculate The word frequency of the lexical item in described microblogging text primitive character lexical item set in the now described feature lexical item dictionary previously generating- Reverse document-frequency tf-idf value, using as the described microblogging literary composition in the feature lexical item dictionary previously generating described in described occurring in Lexical item in this original feature lexical item set is in the eigenvalue of microblogging；

Whether the lexical item of the feature lexical item dictionary previously generating described in judgement occurs in described microblogging text primitive character lexical item collection In conjunction, and the feature lexical item dictionary previously generating described in not appearing in described microblogging text primitive character lexical item set The eigenvalue of lexical item be labeled as 0；

recommending advertisements to the user whose microblog data was read, on the basis of the result of the automatic classification;

wherein the step of recommending advertisements to the user whose microblog data was read, on the basis of the result of the automatic classification, specifically includes:

counting the percentage of the user's microblogs that falls into each category;

matching the counted per-category percentages against the user's tags in the microblog data, and doubling the percentage of each category that is successfully matched;

recommending, to the user whose microblog data was read, advertisements of the M categories ranked in the top M, where M is an integer greater than 0.
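The recommendation sub-steps at the end of claim 1 (per-category percentage, doubling on a tag match, top-M selection) can be illustrated with a minimal Python sketch. The function name `recommend_categories`, the representation of tags as a set of category names, and the alphabetical tie-break are assumptions made for illustration; the claim does not specify how tags are matched to categories or how ties are resolved.

```python
from collections import Counter


def recommend_categories(post_categories, user_tags, m):
    """Return the top-m advertisement categories for one user.

    post_categories: predicted category of each of the user's microblogs.
    user_tags: set of category names derived from the user's tags
               (the tag-to-category matching rule is an assumption).
    m: number of categories to return; must be a positive integer.
    """
    if m <= 0:
        raise ValueError("m must be a positive integer")
    counts = Counter(post_categories)
    total = sum(counts.values())
    # Share of each category among the user's microblogs, in percent.
    scores = {cat: 100.0 * n / total for cat, n in counts.items()}
    # Double the share of every category that matches a user tag.
    for cat in scores:
        if cat in user_tags:
            scores[cat] *= 2
    # Rank by score, breaking ties alphabetically (hypothetical choice).
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return ranked[:m]


posts = ["sports"] * 5 + ["tech"] * 3 + ["food"] * 2
print(recommend_categories(posts, {"tech"}, 2))
```

In this example the "tech" share of 30% is doubled to 60% because it matches a tag, so it outranks "sports" at 50% even though the user posted more sports microblogs.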

2. the method for claim 1 is it is characterised in that use the disaggregated model being previously obtained that user is micro- described Rich data automatic classification comprises the steps before in the classification dividing in advance：

reading training microblog data;

manually labeling the read training microblog data with the pre-divided categories;

initializing the read training microblog data to obtain a microblog text term set, wherein initializing the read training microblog data includes removing special symbols and non-Chinese characters from the read training microblog data and performing word segmentation;

generating a feature term dictionary;

mapping the microblog text original feature term set against the feature term dictionary, determining whether each term in the microblog text original feature term set appears in the feature term dictionary, and calculating the tf-idf value of each term of the microblog text original feature term set that appears in the feature term dictionary, the tf-idf value serving as the feature value of that term in the microblog;

determining whether each term of the feature term dictionary appears in the microblog text original feature term set, and marking as 0 the feature value of each term of the feature term dictionary that does not appear in the microblog text original feature term set;

training, with a preset algorithm, on the feature vectors formed from all the calculated feature values, to obtain the classification model.
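The feature vectorization steps of claims 1 and 2 (tf-idf for dictionary terms that occur in the microblog, 0 for dictionary terms that do not) can be sketched as follows. The claims do not give the tf-idf formula, so the common variant tf = count / length and idf = log(N / df) is assumed here; the function names `document_frequencies` and `tfidf_vector` are hypothetical.

```python
import math


def document_frequencies(corpus, dictionary):
    """Count, for each dictionary term, how many training microblogs contain it."""
    return {t: sum(1 for doc in corpus if t in doc) for t in dictionary}


def tfidf_vector(doc_terms, dictionary, doc_freq, n_docs):
    """Map one microblog's term list onto a fixed feature term dictionary.

    Terms of the microblog that are not in the dictionary are ignored.
    Assumed variant: tf = count / len(doc_terms), idf = log(n_docs / df).
    """
    vector = []
    for term in dictionary:
        count = doc_terms.count(term)
        df = doc_freq.get(term, 0)
        if count == 0 or df == 0:
            # Dictionary term absent from this microblog (or never seen
            # in the training corpus): feature value is 0.
            vector.append(0.0)
            continue
        tf = count / len(doc_terms)
        idf = math.log(n_docs / df)
        vector.append(tf * idf)
    return vector


corpus = [["足球", "比赛"], ["手机", "发布"], ["足球", "手机"]]
dictionary = ["足球", "手机", "美食"]
df = document_frequencies(corpus, dictionary)
print(tfidf_vector(["足球", "比赛", "足球"], dictionary, df, len(corpus)))
```

Every microblog maps to a vector of the same length as the dictionary, which is what allows the vectors to be passed to whichever preset training algorithm is chosen.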

3. The method of claim 2, characterized in that the step of generating a feature term dictionary specifically includes:

calculating the mutual information value of each term in the microblog text original feature term set;

selecting the N terms whose mutual information values rank in the top N as the terms of the feature term dictionary, where N is an integer greater than 0.
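A minimal sketch of the dictionary generation in claim 3, reading the per-term score as mutual information between a term and the manually labeled categories (an assumption about the translated wording). The claim does not say how per-category scores are combined, so the sketch takes the maximum over categories of log(P(t, c) / (P(t) P(c))); the function name `build_dictionary` and the tie-break are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_dictionary(labeled_docs, n):
    """Select the top-n terms by mutual information with the categories.

    labeled_docs: list of (terms, category) pairs from the training set.
    """
    n_docs = len(labeled_docs)
    cat_count = Counter(cat for _, cat in labeled_docs)
    term_count = Counter()        # number of documents containing the term
    joint = defaultdict(Counter)  # joint[term][cat]: documents in cat containing term
    for terms, cat in labeled_docs:
        for term in set(terms):
            term_count[term] += 1
            joint[term][cat] += 1

    def score(term):
        # Assumed aggregation: max over categories of pointwise MI,
        # log(P(t, c) / (P(t) * P(c))), estimated from document counts.
        return max(
            math.log(joint[term][cat] * n_docs / (term_count[term] * cat_count[cat]))
            for cat in joint[term]
        )

    # Highest score first; ties broken by term order (hypothetical choice).
    ranked = sorted(term_count, key=lambda t: (-score(t), t))
    return ranked[:n]


docs = [
    (["足球", "比赛"], "sports"),
    (["足球", "进球"], "sports"),
    (["手机", "比赛"], "tech"),
    (["手机", "发布"], "tech"),
]
print(build_dictionary(docs, 4))
```

In the example, "比赛" appears equally often in both categories, scores log(1) = 0, and is the one term left out of a four-term dictionary.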

4. The method of claim 2, characterized in that the preset algorithm includes any one of the following algorithms:

5. A microblog-based advertisement recommendation system, characterized in that the system includes:

a first data reading module, configured to read the microblog data of a user;

a first data initialization module, configured to initialize the read microblog data to obtain a microblog text term set, wherein initializing the read microblog data includes removing special symbols and non-Chinese characters from the read microblog data and performing word segmentation;

a first feature extraction module, configured to delete the stop words from the microblog text term set to obtain a microblog text original feature term set;

a first feature vectorization module, configured to map the microblog text original feature term set against a pre-generated feature term dictionary, determine whether each term in the microblog text original feature term set appears in the pre-generated feature term dictionary, and calculate the tf-idf value of each term of the microblog text original feature term set that appears in the pre-generated feature term dictionary, the tf-idf value serving as the feature value of that term in the microblog; and further configured to determine whether each term of the pre-generated feature term dictionary appears in the microblog text original feature term set, and to mark as 0 the feature value of each term of the pre-generated feature term dictionary that does not appear in the microblog text original feature term set;

a classification module, configured to automatically classify the user's microblog data into pre-divided categories using a previously obtained classification model;

a recommendation module, configured to recommend advertisements to the user whose microblog data was read, on the basis of the result of the automatic classification;

wherein the recommendation module includes:

a data statistics module, configured to count the percentage of the user's microblogs that falls into each category;

a data matching module, configured to match the counted per-category percentages against the user's tags in the microblog data, and to double the percentage of each category that is successfully matched;

an advertisement recommendation module, configured to recommend, to the user whose microblog data was read, advertisements of the M categories ranked in the top M, where M is an integer greater than 0.
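The work of the first data initialization module and the first feature extraction module in claim 5 can be sketched in Python as below. The claim does not name a word segmenter, so `toy_segment` (greedy forward maximum matching against a tiny made-up vocabulary) is a hypothetical stand-in for a real Chinese segmenter; the Unicode range used to define "Chinese character" and the names `initialize` and `remove_stop_words` are likewise assumptions.

```python
import re

# Assumption: "Chinese character" means the basic CJK Unified Ideographs
# block; everything else (symbols, URLs, Latin text, digits) is removed.
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]+")


def toy_segment(run, vocabulary=("足球", "比赛", "今天", "的")):
    """Greedy forward maximum matching against a tiny hypothetical vocabulary."""
    out, i = [], 0
    max_len = max(len(w) for w in vocabulary)
    while i < len(run):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(run) - i), 0, -1):
            piece = run[i:i + size]
            if size == 1 or piece in vocabulary:
                out.append(piece)
                i += size
                break
    return out


def initialize(text, segment=toy_segment):
    """Remove special symbols and non-Chinese characters, then segment."""
    terms = []
    # Each run of Chinese characters is segmented separately, so a word
    # never spans a removed symbol.
    for run in NON_CHINESE.split(text):
        if run:
            terms.extend(segment(run))
    return terms


def remove_stop_words(terms, stop_words):
    """Drop stop words to obtain the original feature term set."""
    return [t for t in terms if t not in stop_words]


terms = initialize("今天的足球比赛!!! http://t.cn/abc @user 很精彩")
print(remove_stop_words(terms, {"的", "很"}))
```

Passing the segmenter as a parameter keeps the symbol-stripping and stop-word steps independent of whichever segmentation tool is actually used.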

6. The system of claim 5, characterized in that the system further includes:

a second data reading module, configured to read training microblog data;

a manual classification module, configured to manually label the read training microblog data with the pre-divided categories;

a second data initialization module, configured to initialize the read training microblog data to obtain a microblog text term set, wherein initializing the read training microblog data includes removing special symbols and non-Chinese characters from the read training microblog data and performing word segmentation;

a second feature extraction module, configured to delete the stop words from the microblog text term set to obtain a microblog text original feature term set;

a feature term dictionary generation module, configured to generate a feature term dictionary;

a second feature vectorization module, configured to map the microblog text original feature term set against the feature term dictionary, determine whether each term in the microblog text original feature term set appears in the feature term dictionary, and calculate the tf-idf value of each term of the microblog text original feature term set that appears in the feature term dictionary, the tf-idf value serving as the feature value of that term in the microblog; and further configured to determine whether each term of the feature term dictionary appears in the microblog text original feature term set, and to mark as 0 the feature value of each term of the feature term dictionary that does not appear in the microblog text original feature term set;

a training module, configured to train, with a preset algorithm, on the feature vectors formed from all the calculated feature values, to obtain the classification model.

7. The system of claim 6, characterized in that the feature term dictionary generation module includes:

a mutual information calculation module, configured to calculate the mutual information value of each term in the microblog text original feature term set;

a feature term dictionary term selection module, configured to select the N terms whose mutual information values rank in the top N as the terms of the feature term dictionary, where N is an integer greater than 0.

8. The system of claim 6, characterized in that the preset algorithm includes any one of the following algorithms: