CN108346067A - Social networks advertisement sending method based on natural language processing - Google Patents
- Publication number
- CN108346067A
- Authority
- CN
- China
- Prior art keywords
- user
- portrait
- vector
- commodity
- social networks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0251—Targeted advertisements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
The invention discloses a social network advertisement sending method based on natural language processing, comprising: obtaining a user's social network data; segmenting the social network data into words and generating word vectors; training multiple prediction models to predict different user attributes and generate a user portrait; converting user portraits into user-portrait vectors and converting commodities into commodity vectors along different dimensions; and finding the commodities with high click-through rates among users whose portrait vectors are highly similar, finding the commodities whose vectors are highly similar to those high-click-through commodities, and recommending them to the user with the similar portrait. The method is simple and efficient, generates user portraits automatically from social network information, and improves the accuracy and click-through rate of advertisement delivery.
Description
Technical field
The present invention relates to an advertisement sending method for social networks, and more particularly to a social network advertisement sending method based on natural language processing.
Background art
There are currently more than 3 billion internet users worldwide, and social networks in particular have a long history and enormous user bases. For example, Facebook has reached 1.35 billion monthly active users, close to the total population of China. Qzone, the interactive website of the Chinese internet giant Tencent, has up to 645 million active accounts, and WeChat Moments and similar services also have huge numbers of users.
Such an enormous user base makes these platforms excellent venues for merchandising, whether for the platforms themselves or for merchants advertising to users, because on social networks such as Qzone, WeChat Moments, and Weibo, people like to post text, pictures, and locations, share their hobbies and what they see and hear, and repost interesting articles. Previous platform advertisement pushing has been too coarse, recommending simply on the basis of gender, age, and query information. This algorithm analyzes social network users with natural language processing, predicts personal basic information such as age, income, occupation, whether they own a home or car, and hobbies, and then pushes advertisements precisely, so that each person's advertisements are personalized and the hit rate and transaction probability are substantially increased. Merchants can also use this technique to send and distribute advertisements on social networks: after a personal user portrait is predicted, different commodities are promoted to different users. People with the same user portrait also tend to buy highly similar commodities; for example, unmarried women aged 20 to 30 who work in finance may be comparatively likely to buy high-end lipstick, so identical or similar commodities can be recommended to people with similar user portraits.
Machine learning models represented by the SVM (support vector machine) are a relatively mature technique for training classifiers. The original SVM algorithm was proposed by Vapnik and Chervonenkis in 1963 and was developed further in the 1990s by Vapnik's research group at AT&T Bell Laboratories into a new classification technique with great potential. SVM is a pattern recognition method based on statistical learning theory and is mainly applied in the field of pattern recognition. User information mined with SVM models is very helpful for advertisement pushing. SVM models are widely used in classification, data mining, and related fields, and have achieved great success in natural language processing.
Summary of the invention
In view of the above technical problems, the object of the present invention is to provide a social network advertisement sending method based on natural language processing. The method is simple and efficient, generates user portraits automatically from social network information, and improves the accuracy and click-through rate of advertisement delivery.
The technical solution of the present invention is as follows:
A social network advertisement sending method based on natural language processing, comprising the following steps:
S01: obtaining the social network data of a user;
S02: segmenting the social network data into words and generating word vectors;
S03: training multiple prediction models to predict different user attributes and generating a user portrait;
S04: converting user portraits into user-portrait vectors, and converting commodities into commodity vectors along different dimensions;
S05: finding the commodities with high click-through rates among users whose user-portrait vectors are highly similar, finding the commodities whose vectors are highly similar to those high-click-through commodities, and recommending them to the user with the highly similar user portrait.
Preferably, the step S01 is specifically included:
The social network data for crawling or directly reading user every the set time by reptile, to social network data
It is pre-processed, retains positioning, original and forwarding sentence information.
Preferably, the step S02 is specifically included:
S21:Social network data is further processed, regular expression removal pure digi-tal sentence, pure phonetic are used
Sentence, pure signal statement, meaningless emoticon, obtain social networks text data;
S22:Essential information dictionary for word segmentation is created, participle library and deactivated dictionary are established;
S22:Cutting word processing is carried out to social networks text data, the stop words in text after removal participle;
S24:Text training term vector after being segmented with a large number of users generates term vector;
Preferably, predicting different user attributes in step S03 comprises:
training an SVM classification model and a logistic regression model with the word vectors, using the trained SVM classification model to predict two-class attributes and the trained logistic regression model to predict multi-class attributes.
Preferably, the step S04 is specifically included:
S41:The attribute that different users draws a portrait is substituted for number, generates user's portrait vector;
S42:Commodity are generated into commodity vector according to major class, name of product, brand, brand product four dimensions.
Preferably, the high user-portrait vector similarity and high commodity vector similarity in step S05 are determined by calculating the distance between vectors; if the distance is less than a set threshold, the similarity is judged to be high. The calculation formula is:
$$d_{12} = \sqrt{\sum_{k=1}^{n} \left(x_{1k} - x_{2k}\right)^{2}}$$
where $x_{1k}$ and $x_{2k}$ are the $k$-th components of the two vectors, and $n$ is the dimension of the vectors.
Compared with the prior art, the advantages of the present invention are:
1. Social network information is analyzed with natural language processing, so users can be understood in more detail and more detailed user-portrait information can be generated.
2. The mined user portraits are also of great value for other business purposes, such as consumer risk assessment, software promotion, and sales.
3. Recommending commodities according to user portraits is more accurate and more personalized, improving the click-through rate of advertisements and the purchase rate of commodities.
Description of the drawings
The invention is further described below with reference to the accompanying drawings and embodiments:
Fig. 1 is a flow chart of the method of the present invention;
Fig. 2 is a flow chart of social network crawling in the present invention;
Fig. 3 is a flow chart of mining a personal user portrait from social networks in the present invention;
Fig. 4 is a flow chart of advertisement recommendation in the present invention.
Detailed description of the embodiments
The above scheme is further described below with reference to specific embodiments. It should be understood that these embodiments are intended to illustrate the present invention, not to limit its scope. The implementation conditions used in the embodiments may be further adjusted according to the conditions of a specific manufacturer; implementation conditions that are not specified are generally those of routine experiments.
Embodiment:
In the social network advertisement sending method based on natural language processing of the present invention, social networks refer to social platforms such as WeChat and Weibo, where users express themselves, record their lives, and repost articles. The text content includes the user's location, original posts, titles of reposted articles, names of official accounts, and similar text. User similarity calculation refers to computing the distance between user-portrait vectors.
As shown in Fig. 1, the social network advertisement sending method based on natural language processing of the present invention comprises the following steps:
One, Crawling and preprocessing social network data
As shown in Fig. 2, after social network data has been crawled, a timer is set. When more than 30 days have elapsed, the most recent month of social network activity is crawled and saved to the database. The timer is then reset to 0 and incremented by 1 each day; when the timer reaches 30, crawling is performed again.
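The 30-day refresh cycle described above amounts to a day counter that triggers a crawl and resets. The following is a minimal sketch, assuming a hypothetical `crawl` callback that fetches the latest month of posts and saves them to the database; the class and method names are illustrative and not taken from the patent.

```python
from typing import Callable


class CrawlTimer:
    """Day counter that triggers a re-crawl every `period_days` days (30 in the embodiment)."""

    def __init__(self, crawl: Callable[[], None], period_days: int = 30) -> None:
        self.crawl = crawl              # hypothetical callback: crawl latest month, save to DB
        self.period_days = period_days
        self.counter = 0                # timer value, reset to 0 after each crawl

    def tick(self) -> bool:
        """Call once per day; returns True when a crawl was triggered."""
        self.counter += 1
        if self.counter >= self.period_days:
            self.crawl()
            self.counter = 0            # start counting the next period from 0
            return True
        return False


if __name__ == "__main__":
    runs = []
    timer = CrawlTimer(lambda: runs.append("crawl"))
    for _ in range(90):                 # simulate 90 days
        timer.tick()
    print(len(runs))                    # 3 crawls in 90 days
```

In a deployment, `tick` would more likely be driven by a daily scheduler (for example cron) than by a loop; the counter logic stays the same.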
After the raw social network data is obtained, it is preprocessed by deleting meaningless likes, comments, pictures, videos, and the like; this preprocessing can be programmed in Python.
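A minimal Python sketch of this filtering step follows, assuming each crawled record is a dict with hypothetical `type` and `text` fields, and that only location, original-post, and repost records are kept (as in step S01). The field names and type labels are assumptions, not a format defined by the patent.

```python
from typing import Dict, Iterable, List

# Record types kept for text mining (step S01); everything else
# (likes, comments, pictures, videos, ...) is dropped.
KEEP_TYPES = {"location", "original", "repost"}


def preprocess(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only location / original / repost records that carry non-empty text."""
    kept = []
    for rec in records:
        text = rec.get("text", "").strip()
        if rec.get("type") in KEEP_TYPES and text:
            kept.append({"type": rec["type"], "text": text})
    return kept
```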
Two, Mining personal information from social networks
As shown in Fig. 3, the social network data is further processed according to the characteristics of social network information: regular expressions are used to remove pure-digit sentences, pure-pinyin sentences, pure-symbol sentences, and meaningless emoticons, yielding social network text data consisting purely of Chinese characters.
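The regular-expression cleaning above can be sketched as follows. This is an illustration under stated assumptions: "pinyin" is approximated as ASCII letters only, emoticons are assumed to be either emoji or bracketed codes such as `[微笑]`, and "Chinese characters" is approximated by the basic CJK block `\u4e00-\u9fff`. None of these patterns come from the patent itself.

```python
import re
from typing import List, Optional

PURE_DIGITS = re.compile(r"^[\d\s]+$")
PURE_PINYIN = re.compile(r"^[A-Za-z\s]+$")          # assumption: pinyin ~ ASCII letters only
PURE_SYMBOLS = re.compile(r"^[^\w]+$")              # no letters, digits or CJK characters
BRACKET_EMOTICON = re.compile(r"\[[^\[\]]{1,8}\]")  # assumed emoticon code format, e.g. "[微笑]"
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]+")      # anything outside the basic CJK block


def clean_sentence(sentence: str) -> Optional[str]:
    """Return the Chinese-only text of a sentence, or None if it should be discarded."""
    s = sentence.strip()
    if not s or PURE_DIGITS.match(s) or PURE_PINYIN.match(s) or PURE_SYMBOLS.match(s):
        return None
    s = BRACKET_EMOTICON.sub("", s)
    s = NON_CHINESE.sub("", s)  # also strips emoji, Latin letters and punctuation
    return s or None


def clean_corpus(sentences: List[str]) -> List[str]:
    """Apply clean_sentence to every sentence and drop the discarded ones."""
    return [c for c in (clean_sentence(x) for x in sentences) if c]
```

For example, `clean_corpus(["666", "我爱编程 coding", "..."])` keeps only `"我爱编程"`.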
Chinese word segmentation, stop-word removal, and similar operations are then performed with the Python jieba word-segmentation package. In particular, this step requires a custom segmentation lexicon and stop-word dictionary to improve accuracy.
The segmentation lexicon and stop-word dictionary can be created from prior knowledge.
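A minimal sketch of segmentation with a custom lexicon and stop-word removal follows. It uses jieba's `load_userdict` and `lcut`; the stop-word file format (one word per line, UTF-8) and the helper names are assumptions for illustration. jieba is imported lazily so the stop-word helpers work without it.

```python
from typing import Callable, Iterable, List, Optional, Set


def load_stopwords(path: str) -> Set[str]:
    """Read a stop-word file with one word per line (assumed UTF-8)."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def remove_stopwords(tokens: Iterable[str], stopwords: Set[str]) -> List[str]:
    """Drop stop words and whitespace-only tokens produced by the segmenter."""
    return [t for t in tokens if t.strip() and t not in stopwords]


def make_segmenter(stopwords: Set[str], user_dict_path: Optional[str] = None) -> Callable[[str], List[str]]:
    """Build a segment(text) function backed by jieba (third-party: pip install jieba)."""
    import jieba

    if user_dict_path:
        jieba.load_userdict(user_dict_path)  # custom segmentation lexicon, loaded once

    def segment(text: str) -> List[str]:
        return remove_stopwords(jieba.lcut(text), stopwords)

    return segment
```

Loading the user dictionary once, when the segmenter is built, avoids re-reading it for every post.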
Word vectors are trained on the segmented text of a large number of users with the CBOW mode of the Python fasttext package (fasttext.cbow), generating a word-vector model.
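The sketch below prepares the training corpus and trains CBOW vectors. The patent names `fasttext.cbow`, which appears to correspond to an older third-party `fasttext` wrapper; as a swapped-in assumption, this sketch calls the official fastText Python bindings, `fasttext.train_unsupervised(..., model="cbow")`. The file layout (one space-joined segmented post per line) is the plain-text format fastText reads; the helper names are hypothetical.

```python
from typing import Iterable, List


def write_training_file(segmented_posts: Iterable[List[str]], path: str) -> int:
    """Write one space-joined segmented post per line; returns the number of lines written."""
    n = 0
    with open(path, "w", encoding="utf-8") as f:
        for tokens in segmented_posts:
            if tokens:                      # skip posts that became empty after cleaning
                f.write(" ".join(tokens) + "\n")
                n += 1
    return n


def train_cbow(path: str, dim: int = 100):
    """Train CBOW word vectors with the official fastText bindings (pip install fasttext)."""
    import fasttext

    return fasttext.train_unsupervised(path, model="cbow", dim=dim)
```

With the official bindings, `model.get_word_vector(word)` returns a word's vector and `model.get_sentence_vector(text)` an averaged vector for a line of text; using the latter as a per-user feature vector for the classifiers is one possible design, not something the patent specifies.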
Prediction models are then trained, with different user attributes predicted by different methods. Two-class attributes such as gender and age are predicted with trained SVM models, and multi-class attributes are predicted with logistic regression models.
The word vectors of the training set are fed into the SVM model for training, one attribute at a time. The parameters are tuned appropriately, and after several iterations the output stabilizes, yielding each attribute of the user portrait. The trained models are then used to predict the target user's portrait.
In this embodiment, the SVM model parameters are set as follows: test_size is 0.8, a linear kernel function is chosen, and the number of iterations is 500.
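The parameter names above (`test_size`, linear kernel, iteration count) match scikit-learn's `train_test_split` and `SVC`, so the sketch below assumes scikit-learn was used; the patent does not say so explicitly. It trains one linear SVM for one two-class attribute. The `stratify=y` and `random_state` arguments are additions for reproducibility. Note that in scikit-learn `test_size=0.8` holds out 80% of the samples for testing and trains on the remaining 20%.

```python
from typing import Sequence, Tuple

from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def train_binary_attribute(
    X: Sequence[Sequence[float]], y: Sequence[int], seed: int = 0
) -> Tuple[SVC, float]:
    """Train a linear SVM for one two-class attribute (e.g. gender); return it and held-out accuracy."""
    # test_size=0.8 mirrors the embodiment: 80% of samples are held out for testing.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.8, random_state=seed, stratify=y
    )
    clf = SVC(kernel="linear", max_iter=500)  # linear kernel, 500 iterations
    clf.fit(X_train, y_train)
    return clf, clf.score(X_test, y_test)
```

Each row of `X` is assumed to be a fixed-length feature vector per user (for instance, averaged word vectors); one model is trained per attribute, as the text describes.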
For the logistic regression model, a small number of seed words must first be chosen empirically for each class to be predicted. For example, when predicting the occupation "programmer", the seed words can be Java, programming, programmer, debug, and so on. Words close to the seeds are then found by computing word-vector distances and merged into a complete class dictionary. The number of times words from each class dictionary occur in the user's segmented social network text is counted, and these counts are used in the logistic regression. The parameter is set as follows: the seed vocabulary size under each attribute is 100.
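The dictionary-expansion and counting steps can be sketched in plain Python. Assumptions: word vectors are available as a `{word: vector}` dict, closeness is measured by cosine similarity to the nearest seed (the patent says only "word vector distance"), and the "100" setting is read as the final dictionary size per class. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def expand_dictionary(seeds: List[str], vectors: Dict[str, Vector], size: int = 100) -> List[str]:
    """Grow a class dictionary from seed words with the vocabulary words closest to any seed."""
    seed_vecs = [vectors[s] for s in seeds if s in vectors]
    if not seed_vecs:
        return list(seeds)
    scored = [
        (max(cosine(v, sv) for sv in seed_vecs), w)
        for w, v in vectors.items()
        if w not in seeds
    ]
    scored.sort(key=lambda t: (-t[0], t[1]))  # most similar first, ties broken by word
    extra = [w for _, w in scored[: max(0, size - len(seeds))]]
    return list(seeds) + extra


def count_features(tokens: List[str], dictionaries: Dict[str, List[str]]) -> Dict[str, int]:
    """Count how many of a user's tokens fall into each class dictionary."""
    result = {}
    for label, words in dictionaries.items():
        vocab = set(words)
        result[label] = sum(1 for t in tokens if t in vocab)
    return result
```

The per-class counts form one feature vector per user, which would then be passed to a multi-class logistic regression (for example scikit-learn's `LogisticRegression`).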
The models in this experiment are implemented in Python and run on a (SuSE) Linux operating system.
Three, Converting user portraits and commodities into vectors
The attributes of each user portrait are replaced with numbers to generate a user-portrait vector;
each commodity is converted into a commodity vector along four dimensions: category, product name, brand, and branded product.
Four, Advertisement push strategy
As shown in Fig. 4, the similarity between the portrait vector of the target user and those of other users is calculated first. For user-portrait vector similarity, this embodiment calculates the distance between user vectors with the Euclidean distance. The Euclidean distance between two n-dimensional vectors $A = (x_{11}, x_{12}, \ldots, x_{1n})$ and $B = (x_{21}, x_{22}, \ldots, x_{2n})$ is calculated as follows:
$$d_{12} = \sqrt{\sum_{k=1}^{n} \left(x_{1k} - x_{2k}\right)^{2}}$$
The closer the distance $d_{12}$ is to 0, the more similar the two users are.
A threshold is set; if the distance is less than or equal to the threshold, the two users' information is judged to be similar.
A certain number of similar users is collected, for example 2000.
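The distance formula and the similar-user selection can be sketched directly. The user threshold is not given in the text, so it is a parameter here; the limit of 2000 follows the example above, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """d_12 = sqrt(sum_k (x_1k - x_2k)^2); 0 means identical vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similar_users(
    target: Sequence[float],
    others: Dict[str, Sequence[float]],
    threshold: float,
    limit: int = 2000,
) -> List[str]:
    """Return up to `limit` user ids whose portrait distance to `target` is <= threshold, nearest first."""
    scored: List[Tuple[float, str]] = [(euclidean(target, v), uid) for uid, v in others.items()]
    return [uid for d, uid in sorted(scored) if d <= threshold][:limit]
```

Sorting by distance before truncating means that, when more than 2000 users pass the threshold, the closest ones are kept.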
Then calculate the high a certain number of commodity of clicking rate of similar users, each commodity have the commodity of oneself to
Amount.
The similarity of commodity vector is calculated, similarity here is calculated also by above-mentioned formula.
Threshold value is set, and the threshold value of the present embodiment is set as 2, if similarity is less than or equal to threshold value, judges two commodity phases
Seemingly.
Obtain similarity less than 2 highest 5 advertisements of the recent month clicking rate of 2000 users commodity and these
The similar commodity of commodity, recommend target user together.
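The commodity step can be sketched as follows, assuming click events are available as hypothetical `(user_id, commodity_id)` pairs for the last month and that commodity vectors come from step S04. Click counts stand in for click-through rate here, since impression data is not described; `top_n=5` and `threshold=2.0` follow the embodiment.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend(
    clicks: Iterable[Tuple[str, str]],
    similar_user_ids: Iterable[str],
    item_vectors: Dict[str, Sequence[float]],
    top_n: int = 5,
    threshold: float = 2.0,
) -> List[str]:
    """Top-clicked items among similar users, plus items whose vectors lie within `threshold` of them."""
    users = set(similar_user_ids)
    # Count last-month clicks made by similar users only.
    counts = Counter(item for uid, item in clicks if uid in users)
    top_items = [item for item, _ in counts.most_common(top_n)]
    recs = list(top_items)
    for item in top_items:
        base = item_vectors.get(item)
        if base is None:  # no commodity vector available for this item
            continue
        for other, vec in item_vectors.items():
            if other not in recs and euclidean(base, vec) < threshold:
                recs.append(other)
    return recs
```

The result lists the top-clicked commodities first, followed by their near neighbours in commodity-vector space.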
The foregoing embodiments merely illustrate the technical concept and features of the invention. Their purpose is to enable persons skilled in the art to understand the content of the present invention and implement it accordingly, not to limit the scope of the present invention. All equivalent transformations or modifications made according to the spirit and essence of the present invention shall fall within the protection scope of the present invention.
Claims (6)
1. A social network advertisement sending method based on natural language processing, characterized by comprising the following steps:
S01: obtaining the social network data of a user;
S02: segmenting the social network data into words and generating word vectors;
S03: training multiple prediction models to predict different user attributes and generating a user portrait;
S04: converting user portraits into user-portrait vectors, and converting commodities into commodity vectors along different dimensions;
S05: finding the commodities with high click-through rates among users whose user-portrait vectors are highly similar, finding the commodities whose vectors are highly similar to those high-click-through commodities, and recommending them to the user with the highly similar user portrait.
2. The social network advertisement sending method based on natural language processing according to claim 1, characterized in that step S01 specifically comprises:
crawling the user's social network data with a web crawler, or reading it directly, at set time intervals, and preprocessing the social network data to retain location information and original and reposted text.
3. The social network advertisement sending method based on natural language processing according to claim 1, characterized in that step S02 specifically comprises:
S21: further processing the social network data by using regular expressions to remove pure-digit sentences, pure-pinyin sentences, pure-symbol sentences, and meaningless emoticons, obtaining social network text data;
S22: creating a word-segmentation dictionary of basic information, i.e., establishing a segmentation lexicon and a stop-word dictionary;
S23: segmenting the social network text data and removing stop words from the segmented text;
S24: training word vectors on the segmented text of a large number of users to generate the word vectors.
4. The social network advertisement sending method based on natural language processing according to claim 1, characterized in that predicting different user attributes in step S03 comprises:
training an SVM classification model and a logistic regression model with the word vectors, using the trained SVM classification model to predict two-class attributes and the trained logistic regression model to predict multi-class attributes.
5. The social network advertisement sending method based on natural language processing according to claim 1, characterized in that step S04 specifically comprises:
S41: replacing the attributes of each user portrait with numbers to generate a user-portrait vector;
S42: generating a commodity vector for each commodity along four dimensions: category, product name, brand, and branded product.
6. The social network advertisement sending method based on natural language processing according to claim 1, characterized in that the high user-portrait vector similarity and high commodity vector similarity in step S05 are determined by calculating the distance between vectors; if the distance is less than a set threshold, the similarity is judged to be high. The calculation formula is:
$$d_{12} = \sqrt{\sum_{k=1}^{n} \left(x_{1k} - x_{2k}\right)^{2}}$$
where $x_{1k}$ and $x_{2k}$ are the $k$-th components of the two vectors, and $n$ is the dimension of the vectors.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810063384.XA CN108346067A (en) | 2018-01-23 | 2018-01-23 | Social networks advertisement sending method based on natural language processing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810063384.XA CN108346067A (en) | 2018-01-23 | 2018-01-23 | Social networks advertisement sending method based on natural language processing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108346067A true CN108346067A (en) | 2018-07-31 |
Family
ID=62960867
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810063384.XA Pending CN108346067A (en) | 2018-01-23 | 2018-01-23 | Social networks advertisement sending method based on natural language processing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108346067A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109523116A (en) * | 2018-10-11 | 2019-03-26 | 平安科技(深圳)有限公司 | Business risk analysis method, device, computer equipment and storage medium |
CN110428295A (en) * | 2018-08-01 | 2019-11-08 | 北京京东尚科信息技术有限公司 | Method of Commodity Recommendation and system |
CN110781389A (en) * | 2019-10-18 | 2020-02-11 | 支付宝(杭州)信息技术有限公司 | Method and system for generating recommendations for a user |
CN111428112A (en) * | 2020-03-26 | 2020-07-17 | 上海浩方信息技术有限公司 | Method for crawler retrieval and big data intelligent recommendation optimization processing based on open source framework |
CN114422835A (en) * | 2021-12-29 | 2022-04-29 | 上海数即数据科技有限公司 | Advertisement directional promotion platform based on big data analysis |
CN114491051A (en) * | 2022-04-02 | 2022-05-13 | 四川省大数据中心 | Project approval system for building site |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105095256A (en) * | 2014-05-07 | 2015-11-25 | 阿里巴巴集团控股有限公司 | Information push method and apparatus based on similarity degree between users |
CN105678590A (en) * | 2016-02-07 | 2016-06-15 | 重庆邮电大学 | topN recommendation method for social network based on cloud model |
CN106204156A (en) * | 2016-07-20 | 2016-12-07 | 天涯社区网络科技股份有限公司 | A kind of advertisement placement method for network forum and device |
CN106530001A (en) * | 2016-11-03 | 2017-03-22 | 广州市万表科技股份有限公司 | Information recommending method and apparatus |
-
2018
- 2018-01-23 CN CN201810063384.XA patent/CN108346067A/en active Pending
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110428295A (en) * | 2018-08-01 | 2019-11-08 | 北京京东尚科信息技术有限公司 | Method of Commodity Recommendation and system |
CN109523116A (en) * | 2018-10-11 | 2019-03-26 | 平安科技(深圳)有限公司 | Business risk analysis method, device, computer equipment and storage medium |
CN110781389A (en) * | 2019-10-18 | 2020-02-11 | 支付宝(杭州)信息技术有限公司 | Method and system for generating recommendations for a user |
CN111428112A (en) * | 2020-03-26 | 2020-07-17 | 上海浩方信息技术有限公司 | Method for crawler retrieval and big data intelligent recommendation optimization processing based on open source framework |
CN114422835A (en) * | 2021-12-29 | 2022-04-29 | 上海数即数据科技有限公司 | Advertisement directional promotion platform based on big data analysis |
CN114491051A (en) * | 2022-04-02 | 2022-05-13 | 四川省大数据中心 | Project approval system for building site |
CN114491051B (en) * | 2022-04-02 | 2022-07-29 | 四川省大数据中心 | Project approval system for building site |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108346067A (en) | Social networks advertisement sending method based on natural language processing | |
EP3779841B1 (en) | Method, apparatus and system for sending information, and computer-readable storage medium | |
US8873813B2 (en) | Application of Z-webs and Z-factors to analytics, search engine, learning, recognition, natural language, and other utilities | |
US8903198B2 (en) | Image ranking based on attribute correlation | |
US20180158078A1 (en) | Computer device and method for predicting market demand of commodities | |
CN107357793B (en) | Information recommendation method and device | |
CN113168586A (en) | Text classification and management | |
US20220405607A1 (en) | Method for obtaining user portrait and related apparatus | |
US20140201126A1 (en) | Methods and Systems for Applications for Z-numbers | |
CN106447066A (en) | Big data feature extraction method and device | |
CN105574067A (en) | Item recommendation device and item recommendation method | |
CN106445988A (en) | Intelligent big data processing method and system | |
WO2018040069A1 (en) | Information recommendation system and method | |
CN106250513A (en) | A kind of event personalization sorting technique based on event modeling and system | |
CN106682686A (en) | User gender prediction method based on mobile phone Internet-surfing behavior | |
Yuan et al. | Sentiment analysis using social multimedia | |
CN110737811A (en) | Application classification method and device and related equipment | |
CN110705304A (en) | Attribute word extraction method | |
CN112966010A (en) | User track information mining method | |
CN112749330A (en) | Information pushing method and device, computer equipment and storage medium | |
CN113656699B (en) | User feature vector determining method, related equipment and medium | |
CN116823410B (en) | Data processing method, object processing method, recommending method and computing device | |
CN107070702B (en) | User account correlation method and device based on cooperative game support vector machine | |
US20230351473A1 (en) | Apparatus and method for providing user's interior style analysis model on basis of sns text | |
Choi et al. | Fake review identification and utility evaluation model using machine learning |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180731 |