CN108346067A - Social network advertisement push method based on natural language processing

Social network advertisement push method based on natural language processing

Info

Publication number
CN108346067A
Authority
CN
China
Prior art keywords
user
portrait
vector
commodity
social networks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810063384.XA
Other languages
Chinese (zh)
Inventor
杨威
刘艳
黄刘生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Institute for Advanced Study USTC
Original Assignee
Suzhou Institute for Advanced Study USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-01-23
Filing date
2018-01-23
Publication date
2018-07-31
Application filed by Suzhou Institute for Advanced Study USTC
Priority to CN201810063384.XA
Publication of CN108346067A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The invention discloses a social network advertisement push method based on natural language processing, comprising: obtaining a user's social network data; segmenting the social network data into words and generating word vectors; training multiple prediction models to predict different user attributes and thereby generate a user profile (user portrait); converting the user profile into a user profile vector and converting products into product vectors along several dimensions; and finding users whose profile vectors are highly similar, identifying the products with high click-through rates among those users, finding further products whose vectors are highly similar to those products, and recommending all of these products to users with similar profiles. The method is simple and efficient, generates user profiles automatically from social network information, and improves the accuracy and click-through rate of advertisement delivery.

Description

Social network advertisement push method based on natural language processing
Technical Field
The present invention relates to an advertisement push method for social networks, and in particular to a social network advertisement push method based on natural language processing.
Background Art
The total number of Internet users worldwide has now exceeded 3 billion, and social networks are characterized by a long history of development and very large user bases. For example, Facebook has reached 1.35 billion monthly active users, close to the total population of China. Qzone, the interactive site of the Chinese Internet giant Tencent, has as many as 645 million active accounts, and WeChat Moments and similar services also have huge numbers of users.
A platform with so many users is a very good channel for selling products, both for the platform itself and for merchants who sell to its users, because on social networks such as Qzone, WeChat Moments, and Weibo people like to post text, pictures, and locations, share their hobbies and experiences, and repost interesting articles. Advertisement push on earlier platforms was too coarse: recommendations were based simply on gender, age, and query information. The present algorithm analyzes social network users with natural language processing, predicts personal basic information such as age, income, occupation, whether the user owns a house and a car, and hobbies, and then pushes advertisements precisely, so that each person's advertisements are personalized and the hit rate and transaction probability improve substantially. The technique can also be used by merchants to advertise and distribute on social networks: once personal user profiles have been predicted, different products are promoted to different users. People with the same profile also tend to buy similar products. For example, unmarried women aged 20 to 30 who work in finance may be more likely to buy high-end lipstick, so identical or similar products can be recommended to people with similar profiles.
Machine learning models represented by the SVM (support vector machine) are a relatively mature technique for training classifiers. The SVM is a promising classification technique that originated in Vapnik's work in 1963 and was later developed by his research group at AT&T Bell Laboratories. It is a pattern recognition method based on statistical learning theory and is mainly applied in the field of pattern recognition. User information mined with SVM models is very helpful for advertisement push. SVM models are widely used in classification, data mining, and related fields, and have achieved considerable success in natural language processing.
Summary of the Invention
In view of the above technical problems, the object of the present invention is to provide a social network advertisement push method based on natural language processing. The method is simple and efficient, generates user profiles automatically from social network information, and improves the accuracy and click-through rate of advertisement delivery.
The technical solution of the present invention is as follows:
A social network advertisement push method based on natural language processing, comprising the following steps:
S01: Obtain the user's social network data;
S02: Segment the social network data into words and generate word vectors;
S03: Train multiple prediction models to predict different user attributes and generate a user profile;
S04: Convert the user profile into a user profile vector, and convert products into product vectors along several dimensions;
S05: Find the products with high click-through rates among users whose profile vectors are highly similar, find the products whose vectors are highly similar to those high click-through products, and recommend them to users with highly similar profiles.
Preferably, step S01 specifically comprises:
Crawling the user's social network data with a web crawler at fixed intervals, or reading it directly, and preprocessing the social network data to retain location information, original posts, and reposted sentences.
Preferably, step S02 specifically comprises:
S21: Further process the social network data, using regular expressions to remove sentences consisting only of digits, only of pinyin, or only of symbols, as well as meaningless emoticons, to obtain social network text data;
S22: Create a basic-information segmentation dictionary, establishing a segmentation dictionary and a stop-word dictionary;
S23: Segment the social network text data into words and remove the stop words from the segmented text;
S24: Train word vectors on the segmented text of a large number of users to generate word vectors.
Preferably, predicting different user attributes in step S03 comprises:
Training an SVM classification model and a logistic model on the word vectors, predicting binary attributes with the trained SVM classification model, and predicting multi-class attributes with the trained logistic model.
Preferably, step S04 specifically comprises:
S41: Replace the attributes of the different user profiles with numbers to generate user profile vectors;
S42: Generate a product vector for each product along four dimensions: category, product name, brand, and brand product.
Preferably, in step S05 the similarity of user profile vectors and the similarity of product vectors are obtained by calculating the distance between vectors; if the distance is less than a set threshold, the similarity is judged to be high. The calculation formula is:
$$d_{12} = \sqrt{\sum_{k=1}^{n} (x_{1k} - x_{2k})^2}$$
In the formula, $x_{1k}$ and $x_{2k}$ are the $k$-th components of the two vectors, $k$ indexes the dimensions of the vectors, and $n$ is the number of dimensions.
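As a worked illustration, assuming the distance is the Euclidean distance named in the embodiment and using made-up three-dimensional vectors (the numbers are not from the patent):

```latex
\[
A = (1, 2, 3), \quad B = (2, 4, 5), \qquad
d_{12} = \sqrt{(1-2)^2 + (2-4)^2 + (3-5)^2} = \sqrt{1 + 4 + 4} = 3 .
\]
```

With a threshold of 2, the value the embodiment uses for products, these two vectors would not be judged similar, because 3 is greater than 2.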
Compared with the prior art, the advantages of the present invention are:
1. Analyzing social network information with natural language processing gives a more detailed understanding of the user and produces more detailed user profile information.
2. The mined user profiles are of great value for analyzing other businesses, such as consumer risk, software promotion, and sales.
3. Recommending products according to user profiles is more accurate and more personalized, which improves the advertisement click-through rate and the product purchase rate.
Brief Description of the Drawings
The invention is further described below with reference to the accompanying drawings and embodiments:
Fig. 1 is a flow chart of the method of the present invention;
Fig. 2 is a flow chart of social network crawling in the present invention;
Fig. 3 is a flow chart of personal user profile mining on social networks in the present invention;
Fig. 4 is a flow chart of advertisement recommendation in the present invention.
Detailed Description
The above scheme is further described below in conjunction with specific embodiments. It should be understood that these embodiments illustrate the present invention and do not limit its scope. The implementation conditions used in the embodiments may be adjusted further according to the conditions of the specific manufacturer, and implementation conditions that are not specified are generally those of routine experiments.
Embodiment:
The social networks addressed by the social network advertisement push method of the present invention are social platforms such as WeChat and Weibo, on which users post their own statements, record their lives, and repost articles. The text content includes the user's location, original posts, titles of reposted articles, official account names, and similar text. User similarity is computed as the distance between user profile vectors.
As shown in Fig. 1, the social network advertisement push method based on natural language processing of the present invention comprises the following steps:
1. Crawling and preprocessing the social network data
As shown in Fig. 2, after the social network data have been crawled, a timer is set. When the elapsed time reaches 30 days, the latest month of social network activity is crawled and saved to the database. The timer is then reset to 0 and incremented by 1 for each day that passes; when it reaches 30, the data are crawled again.
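The timer logic described above could be sketched as follows. This is only an illustration under stated assumptions: the names `CrawlTimer` and `crawl_days` are hypothetical, and the actual crawling and database code, which the patent does not give, is left out.

```python
from dataclasses import dataclass
from typing import List

CRAWL_INTERVAL_DAYS = 30  # interval given in the embodiment


@dataclass
class CrawlTimer:
    """Day counter that reports when the next crawl is due."""

    days: int = 0

    def tick(self) -> bool:
        """Advance one day; reset and return True once 30 days have passed."""
        self.days += 1
        if self.days >= CRAWL_INTERVAL_DAYS:
            self.days = 0
            return True
        return False


def crawl_days(total_days: int) -> List[int]:
    """Return the 1-based day numbers on which a crawl would be triggered."""
    timer = CrawlTimer()
    return [day for day in range(1, total_days + 1) if timer.tick()]


print(crawl_days(90))  # [30, 60, 90]
```

In a real deployment the `tick` call would presumably be driven by a daily scheduled job, and a `True` result would start the crawler for the most recent month of posts.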
After the raw social network data are obtained, they are preprocessed to delete meaningless like information, comment information, pictures, videos, and so on. This preprocessing can be programmed in Python.
2. Mining personal information from the social network
As shown in Fig. 3, the social network text is further processed according to the characteristics of social network information: regular expressions remove sentences consisting only of digits, only of pinyin, or only of symbols, as well as meaningless emoticons, yielding social network text data made up purely of Chinese characters.
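A minimal sketch of this cleaning step is shown below. The patent does not list its regular expressions, so the three patterns are assumptions: Chinese text is approximated by the common CJK ideograph range, pinyin is treated simply as Latin letters with no Chinese characters, and emoticons are assumed to appear as bracketed codes such as `[微笑]`.

```python
import re

# Hypothetical patterns: the patent does not list its regular expressions.
HAN = re.compile(r"[\u4e00-\u9fff]")                # a common CJK ideograph
NON_HAN_RUN = re.compile(r"[^\u4e00-\u9fff]+")      # digits, Latin letters, symbols, spaces
BRACKET_EMOTICON = re.compile(r"\[[^\[\]]{1,8}\]")  # emoticon codes such as [微笑]


def clean_post(post):
    """Return the Chinese-character text of one post, or '' if none remains."""
    text = BRACKET_EMOTICON.sub("", post)
    if not HAN.search(text):
        # Pure-digit, pure-pinyin, or pure-symbol sentence: discard it.
        return ""
    # Replace every non-Chinese run with one space so phrases stay separated.
    return NON_HAN_RUN.sub(" ", text).strip()


def clean_posts(posts):
    """Clean a list of posts and drop the ones that end up empty."""
    return [c for c in map(clean_post, posts) if c]


posts = ["今天天气很好[微笑]!!!", "12345678", "hao de", "!!!???", "[微笑][微笑]"]
print(clean_posts(posts))  # ['今天天气很好']
```

Keeping a single space between Chinese runs is a design choice of this sketch; it preserves phrase boundaries for the later segmentation step.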
Chinese word segmentation and stop-word removal are then performed on the text with the jieba word segmentation package for Python. In particular, this step uses a custom segmentation dictionary and a custom stop-word dictionary to improve accuracy.
The segmentation dictionary and the stop-word dictionary can be created from prior knowledge.
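The effect of a custom dictionary and a stop-word list can be illustrated with a small standard-library sketch. Note the substitution: instead of jieba, which the embodiment uses, the code below applies plain forward maximum matching so that it runs without third-party packages. The dictionary and stop-word entries are hypothetical. With jieba itself, the corresponding calls would typically be `jieba.load_userdict` for the custom dictionary and `jieba.lcut` for segmentation.

```python
def segment(text, dictionary, max_len=4):
    """Forward maximum matching: take the longest dictionary word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def remove_stop_words(tokens, stop_words):
    """Drop every token that appears in the stop-word dictionary."""
    return [t for t in tokens if t not in stop_words]


# Hypothetical custom dictionary and stop-word list built from prior knowledge.
CUSTOM_DICT = {"今天", "程序员", "加班", "代码"}
STOP_WORDS = {"的", "了", "又", "在"}

tokens = segment("今天程序员又在加班了", CUSTOM_DICT)
print(tokens)                                 # ['今天', '程序员', '又', '在', '加班', '了']
print(remove_stop_words(tokens, STOP_WORDS))  # ['今天', '程序员', '加班']
```

Without the entry 程序员 in the custom dictionary, the matcher would fall back to single characters for that word, which is the kind of error a domain dictionary is meant to prevent.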
Word vectors are trained on the segmented text of a large number of users with fasttext.cbow from the Python fasttext package, producing a word vector model.
Prediction models are then trained, with different user attributes predicted in different ways. Binary attributes such as gender and age are predicted with a trained SVM model, and multi-class attributes are predicted with a logistic model.
The word vectors of the training set are fed into the SVM model for training. One attribute is predicted at a time; with suitably tuned parameters the output stabilizes after a number of iterations, and each attribute of the user profile can be obtained. The trained model is then used to predict the profile of the target user.
The parameters of the SVM model in this embodiment are set as follows: test_size (sample proportion) is 0.8, the kernel function is linear, and the number of iterations is 500.
For the logistic model, a small number of seed words must be chosen empirically for each class to be predicted. For example, when predicting occupation, the seed words for programmers can be Java, programming, programmer, debug, and so on. Words close to the seed words are then found by computing distances between word vectors, and they are merged into a complete class dictionary. The number of times the words of each class dictionary occur in the user's segmented social network text is counted, and these counts are used for logistic regression. The parameter is set as follows: the seed vocabulary size under each attribute is 100.
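The dictionary-building and counting steps in this paragraph could look roughly like the sketch below. It is illustrative only: the two-dimensional word vectors, the class labels, and the distance cut-off `max_dist` are made up, and Euclidean distance is assumed as the word-vector distance because the patent does not name one. The resulting counts would be the input features of the logistic regression, which is not shown.

```python
import math
from collections import Counter


def expand_seeds(seeds, vectors, max_dist):
    """Return the seeds plus every vocabulary word within max_dist of a seed."""
    expanded = set(seeds)
    for word, vec in vectors.items():
        if any(s in vectors and math.dist(vec, vectors[s]) <= max_dist for s in seeds):
            expanded.add(word)
    return expanded


def class_counts(tokens, class_dicts):
    """Count occurrences of each class dictionary's words in a user's tokens."""
    freq = Counter(tokens)
    return {label: sum(freq[w] for w in words) for label, words in class_dicts.items()}


# Hypothetical 2-d word vectors; a trained model would supply real ones.
VECTORS = {
    "Java": (1.0, 0.0), "编程": (0.9, 0.1), "代码": (0.8, 0.2),
    "口红": (0.0, 1.0), "面膜": (0.1, 0.9),
}
CLASS_DICTS = {
    "programmer": expand_seeds({"Java", "编程"}, VECTORS, max_dist=0.5),
    "beauty": expand_seeds({"口红"}, VECTORS, max_dist=0.5),
}

user_tokens = ["今天", "写", "代码", "Java", "代码", "口红"]
print(class_counts(user_tokens, CLASS_DICTS))  # {'programmer': 3, 'beauty': 1}
```

Here 代码 is pulled into the programmer dictionary because its vector lies close to the seed words, which is the expansion effect the paragraph describes.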
The models in this experiment are implemented in Python and run on the Linux operating system.
3. Converting user profiles and products into vectors
The attributes of the different user profiles are replaced with numbers to generate user profile vectors;
A product vector is generated for each product along four dimensions: category, product name, brand, and brand product.
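One possible way to turn profiles and products into numeric vectors is sketched below. The patent only states that attributes are replaced with numbers, so the attribute names, the code tables, and the `Indexer` helper are all hypothetical.

```python
# Hypothetical attribute code tables; the patent does not specify the numbering.
PROFILE_CODES = {
    "gender": {"male": 0, "female": 1},
    "age_group": {"under_20": 0, "20_30": 1, "over_30": 2},
    "occupation": {"programmer": 0, "finance": 1, "teacher": 2},
}
PROFILE_FIELDS = ["gender", "age_group", "occupation"]
PRODUCT_FIELDS = ["category", "product_name", "brand", "brand_product"]


def encode_profile(profile):
    """Replace each profile attribute with its numeric code."""
    return [PROFILE_CODES[f][profile[f]] for f in PROFILE_FIELDS]


class Indexer:
    """Assigns consecutive integer ids to values in order of first appearance."""

    def __init__(self):
        self.ids = {}

    def __call__(self, value):
        return self.ids.setdefault(value, len(self.ids))


def encode_product(product, indexers):
    """Encode a product along the four dimensions named in the patent."""
    return [indexers[f](product[f]) for f in PRODUCT_FIELDS]


indexers = {f: Indexer() for f in PRODUCT_FIELDS}
lipstick = {"category": "cosmetics", "product_name": "lipstick",
            "brand": "BrandA", "brand_product": "BrandA lipstick"}
mask = {"category": "cosmetics", "product_name": "face mask",
        "brand": "BrandB", "brand_product": "BrandB face mask"}

print(encode_profile({"gender": "female", "age_group": "20_30", "occupation": "finance"}))  # [1, 1, 1]
print(encode_product(lipstick, indexers))  # [0, 0, 0, 0]
print(encode_product(mask, indexers))      # [0, 1, 1, 1]
```

One caveat of plain integer codes is that the distance between two codes is arbitrary for unordered attributes such as occupation, so the choice of numbering influences the later distance calculation.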
4. Advertisement push strategy
As shown in Fig. 4, the similarity between the profile vector of the target user and those of other users is computed first. For the similarity of user profile vectors, this embodiment uses the Euclidean distance between the vectors of different users. The Euclidean distance between two n-dimensional vectors $A(x_{11}, x_{12}, \ldots, x_{1n})$ and $B(x_{21}, x_{22}, \ldots, x_{2n})$ is calculated as follows:
$$d_{12} = \sqrt{\sum_{k=1}^{n} (x_{1k} - x_{2k})^2}$$
In the formula, the closer the distance $d_{12}$ is to 0, the more similar the two users are.
A threshold is set; if the distance is less than or equal to the threshold, the two users are judged to be similar.
A certain number of similar users is collected, for example 2000.
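The selection of similar users by Euclidean distance and threshold could be sketched as follows. The function names, the toy vectors, and the threshold of 1.5 are assumptions for illustration; only the distance measure and the example limit of 2000 users come from the embodiment.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similar_users(target, users, threshold, limit=2000):
    """Return ids of up to `limit` users within `threshold` of the target, nearest first."""
    scored = [(euclidean(target, vec), uid) for uid, vec in users.items()]
    close = sorted((d, uid) for d, uid in scored if d <= threshold)
    return [uid for _, uid in close[:limit]]


# Hypothetical profile vectors of other users.
USERS = {"u1": [1, 1, 1], "u2": [1, 1, 2], "u3": [0, 2, 0], "u4": [1, 2, 2]}

print(similar_users([1, 1, 1], USERS, threshold=1.5))  # ['u1', 'u2', 'u4']
```

User `u3` is excluded because its distance to the target, the square root of 3, exceeds the threshold of 1.5.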
Then a certain number of products with high click-through rates among the similar users is determined; each product has its own product vector.
The similarity of product vectors is computed with the same formula as above.
A threshold is set, 2 in this embodiment; if the distance is less than or equal to the threshold, the two products are judged to be similar.
Among the 2000 users whose distance is less than 2, the 5 products whose advertisements had the highest click-through rate in the most recent month are obtained, together with the products similar to them, and all of these are recommended to the target user.
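The final selection step might be sketched as below. This is an illustration under assumptions: raw click counts stand in for click-through rate, product ids and vectors are made up, and every top product is assumed to have an entry in `product_vectors`. The defaults of 5 top products and a distance threshold of 2 follow the embodiment.

```python
import math
from collections import Counter


def recommend(similar_user_clicks, product_vectors, top_n=5, threshold=2.0):
    """Return the top clicked products plus products whose vectors are close to them.

    similar_user_clicks: product ids clicked by similar users in the last month.
    """
    clicks = Counter(similar_user_clicks)
    top = [pid for pid, _ in clicks.most_common(top_n)]
    result = list(top)
    for pid, vec in product_vectors.items():
        if pid in result:
            continue
        # Add a product if it lies within the threshold of any top product.
        if any(math.dist(vec, product_vectors[t]) <= threshold for t in top):
            result.append(pid)
    return result


# Hypothetical click log and product vectors.
CLICKS = ["p1", "p1", "p1", "p2", "p2", "p3"]
PRODUCT_VECTORS = {
    "p1": (0, 0, 0, 0), "p2": (0, 1, 1, 1), "p3": (5, 5, 5, 5),
    "p4": (0, 0, 1, 1), "p5": (9, 9, 9, 9),
}

print(recommend(CLICKS, PRODUCT_VECTORS, top_n=2))  # ['p1', 'p2', 'p4']
```

In this toy run `p4` was never clicked by the similar users, but it is recommended because its vector lies within distance 2 of the top product `p1`.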
The above embodiments merely illustrate the technical concept and features of the present invention. Their purpose is to enable those skilled in the art to understand the content of the invention and implement it accordingly; they are not intended to limit the scope of protection of the invention. All equivalent transformations or modifications made according to the spirit of the present invention shall fall within its scope of protection.

Claims (6)

1. A social network advertisement push method based on natural language processing, characterized by comprising the following steps:
S01: Obtain the user's social network data;
S02: Segment the social network data into words and generate word vectors;
S03: Train multiple prediction models to predict different user attributes and generate a user profile;
S04: Convert the user profile into a user profile vector, and convert products into product vectors along several dimensions;
S05: Find the products with high click-through rates among users whose profile vectors are highly similar, find the products whose vectors are highly similar to those high click-through products, and recommend them to users with highly similar profiles.
2. The social network advertisement push method based on natural language processing according to claim 1, characterized in that step S01 specifically comprises:
Crawling the user's social network data with a web crawler at fixed intervals, or reading it directly, and preprocessing the social network data to retain location information, original posts, and reposted sentences.
3. The social network advertisement push method based on natural language processing according to claim 1, characterized in that step S02 specifically comprises:
S21: Further process the social network data, using regular expressions to remove sentences consisting only of digits, only of pinyin, or only of symbols, as well as meaningless emoticons, to obtain social network text data;
S22: Create a basic-information segmentation dictionary, establishing a segmentation dictionary and a stop-word dictionary;
S23: Segment the social network text data into words and remove the stop words from the segmented text;
S24: Train word vectors on the segmented text of a large number of users to generate word vectors.
4. The social network advertisement push method based on natural language processing according to claim 1, characterized in that predicting different user attributes in step S03 comprises:
Training an SVM classification model and a logistic model on the word vectors, predicting binary attributes with the trained SVM classification model, and predicting multi-class attributes with the trained logistic model.
5. The social network advertisement push method based on natural language processing according to claim 1, characterized in that step S04 specifically comprises:
S41: Replace the attributes of the different user profiles with numbers to generate user profile vectors;
S42: Generate a product vector for each product along four dimensions: category, product name, brand, and brand product.
6. The social network advertisement push method based on natural language processing according to claim 1, characterized in that in step S05 the similarity of user profile vectors and the similarity of product vectors are obtained by calculating the distance between vectors; if the distance is less than a set threshold, the similarity is judged to be high, and the calculation formula is:
$$d_{12} = \sqrt{\sum_{k=1}^{n} (x_{1k} - x_{2k})^2}$$
In the formula, $x_{1k}$ and $x_{2k}$ are the $k$-th components of the two vectors, $k$ indexes the dimensions of the vectors, and $n$ is the number of dimensions.
CN201810063384.XA 2018-01-23 2018-01-23 Social networks advertisement sending method based on natural language processing Pending CN108346067A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810063384.XA CN108346067A (en) 2018-01-23 2018-01-23 Social networks advertisement sending method based on natural language processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810063384.XA CN108346067A (en) 2018-01-23 2018-01-23 Social networks advertisement sending method based on natural language processing

Publications (1)

Publication Number Publication Date
CN108346067A true CN108346067A (en) 2018-07-31

Family

ID=62960867

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810063384.XA Pending CN108346067A (en) 2018-01-23 2018-01-23 Social networks advertisement sending method based on natural language processing

Country Status (1)

Country Link
CN (1) CN108346067A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109523116A (en) * 2018-10-11 2019-03-26 平安科技(深圳)有限公司 Business risk analysis method, device, computer equipment and storage medium
CN110428295A (en) * 2018-08-01 2019-11-08 北京京东尚科信息技术有限公司 Method of Commodity Recommendation and system
CN110781389A (en) * 2019-10-18 2020-02-11 支付宝(杭州)信息技术有限公司 Method and system for generating recommendations for a user
CN111428112A (en) * 2020-03-26 2020-07-17 上海浩方信息技术有限公司 Method for crawler retrieval and big data intelligent recommendation optimization processing based on open source framework
CN114422835A (en) * 2021-12-29 2022-04-29 上海数即数据科技有限公司 Advertisement directional promotion platform based on big data analysis
CN114491051A (en) * 2022-04-02 2022-05-13 四川省大数据中心 Project approval system for building site

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095256A (en) * 2014-05-07 2015-11-25 阿里巴巴集团控股有限公司 Information push method and apparatus based on similarity degree between users
CN105678590A (en) * 2016-02-07 2016-06-15 重庆邮电大学 topN recommendation method for social network based on cloud model
CN106204156A (en) * 2016-07-20 2016-12-07 天涯社区网络科技股份有限公司 A kind of advertisement placement method for network forum and device
CN106530001A (en) * 2016-11-03 2017-03-22 广州市万表科技股份有限公司 Information recommending method and apparatus

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110428295A (en) * 2018-08-01 2019-11-08 北京京东尚科信息技术有限公司 Method of Commodity Recommendation and system
CN109523116A (en) * 2018-10-11 2019-03-26 平安科技(深圳)有限公司 Business risk analysis method, device, computer equipment and storage medium
CN110781389A (en) * 2019-10-18 2020-02-11 支付宝(杭州)信息技术有限公司 Method and system for generating recommendations for a user
CN111428112A (en) * 2020-03-26 2020-07-17 上海浩方信息技术有限公司 Method for crawler retrieval and big data intelligent recommendation optimization processing based on open source framework
CN114422835A (en) * 2021-12-29 2022-04-29 上海数即数据科技有限公司 Advertisement directional promotion platform based on big data analysis
CN114491051A (en) * 2022-04-02 2022-05-13 四川省大数据中心 Project approval system for building site
CN114491051B (en) * 2022-04-02 2022-07-29 四川省大数据中心 Project approval system for building site

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180731
