CN108897762A - A kind of broadcasting system of accurate user's screening - Google Patents

A kind of broadcasting system of accurate user's screening

Info

Publication number
CN108897762A
Authority
CN
China
Prior art keywords
user
data
information
accurate
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810501536.XA
Other languages
Chinese (zh)
Inventor
李佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Catch Up Network Technology Co Ltd
Original Assignee
Xiamen Catch Up Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-05-23
Filing date
2018-05-23
Publication date
2018-11-27
Application filed by Xiamen Catch Up Network Technology Co Ltd

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention proposes a broadcasting system for precise user screening and relates to the technical field of Internet applications. The system comprises a user data acquisition module, a user data storage module, a user data analysis module, and a user management module. The user data acquisition module collects users' page click behavior data by means of a JS script. The user data storage module stores the collected page data in a cluster. The user data analysis module performs data cleaning, data merging, and data feature structuring. The user management module includes management based on basic user features and logistic regression training management. The invention establishes a precise propagation system that screens users and selects precise users so that content is propagated precisely. Through precise propagation it finds an enterprise's precise users, removes the dependence on platforms, provides the enterprise with the user data it is entitled to, and helps the enterprise build its own database.

Description

A kind of broadcasting system of accurate user's screening
Technical field
The present invention relates to the technical field of Internet applications, and in particular to a broadcasting system for precise user screening.
Background art
With the rapid growth of the mobile Internet, active push mechanisms are gradually replacing keyword-based precision marketing on search engines. Under a mechanism in which information flows are actively pushed, obtaining high-value users is vital.
Existing similar techniques, such as mobile phone user screening, can screen user data by elements such as region and gender, but they have no sales data to rely on and cannot judge in advance how precisely these users match an enterprise's needs.
Summary of the invention
The present invention provides a broadcasting system for precise user screening, comprising a user data acquisition module, a user data storage module, a user data analysis module, and a user management module. The user data acquisition module collects users' page click behavior data by means of a JS script. The user data storage module stores the collected page data in a cluster. The user data analysis module performs data cleaning, data merging, and data feature structuring. The user management module includes management based on basic user features and logistic regression training management.
Preferably, the data cleaning handles erroneous user information in the data: first, user Ids are cleaned according to a regular expression; second, the small number of log entries in the historical data that store erroneous information are cleaned.
Preferably, the data merging consolidates the information of the same user into the same file: first, the user's basic information, retrieval information, and behavior information are merged into the same file according to the user Id; then, the propagation information of the user and the user's feedback information after propagation are merged into the same file according to the activity_id of the propagation.
Preferably, the management based on basic user features evaluates the users to be targeted by a propagation according to some of their basic information; when a user fails any one of the conditions, the user does not meet the requirements of this propagation and is discarded.
Preferably, the logistic regression training management trains on existing data to obtain an LR model, which yields a score for each user's interest in the current push; users are then sorted by score, and the top N users are taken as the targets of the propagation task.
The beneficial effects of the broadcasting system for precise user screening provided by the invention are as follows: it establishes a precise propagation system that screens users and selects precise users so that content is propagated precisely; through precise propagation it finds an enterprise's precise users, removes the dependence on platforms, provides the enterprise with the user data it is entitled to, and helps the enterprise build its own database.
Brief description of the drawings
Fig. 1 is a block diagram of the broadcasting system of the invention.
Detailed description of the embodiments
To further illustrate the embodiments, the invention is provided with a drawing. The drawing is part of the disclosure of the invention and mainly serves to illustrate the embodiments; together with the related description in the specification, it explains the operating principles of the embodiments. With reference to this material, those of ordinary skill in the art will understand other possible embodiments and the advantages of the invention. Components in the figure are not necessarily drawn to scale, and similar reference symbols are conventionally used to denote similar components.
The invention is further described below with reference to the drawing and specific embodiments.
As shown in Fig. 1, the broadcasting system for precise user screening provided in this embodiment comprises a user data acquisition module, a user data storage module, a user data analysis module, and a user management module.
The user data acquisition module collects users' page click behavior data. When a user clicks a button or a link, the click event is collected and transferred to the user data storage module for storage. The user data acquisition module collects data by means of a JS script, which works by monitoring events on every element of the page. Specifically: (1) a JS script is added to each page to be monitored, so that the page downloaded by the client browser contains the following script information: <script type="text/javascript" src="http://click.demo.com/s_tracker.js"></script>; (2) the MSMQ component is installed on the Windows server.
The user data storage module stores the collected page data in a MySQL cluster. Its main function is to persist the raw data collected from pages and to split it into databases by domain name. The page data storage principle is that each database name corresponds to a domain name. For the click_product database, data is inserted according to the parity of permanent_id: even values are inserted into click_product0 and odd values into click_product1.
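The parity-based routing described above can be sketched as follows. This is a minimal illustration that assumes permanent_id is an integer; the function name shard_for is hypothetical and does not come from the patent.

```python
def shard_for(permanent_id: int, base: str = "click_product") -> str:
    """Return the target database for a click record.

    Even permanent_id values map to <base>0 and odd values to <base>1,
    following the parity rule stated in the description.
    """
    return f"{base}{permanent_id % 2}"


if __name__ == "__main__":
    for pid in (10, 11):
        print(pid, "->", shard_for(pid))
```

With only two shards the rule is a modulo-2 split; the same shape would extend to more shards by changing the modulus, although the patent itself describes only the two-way case.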
The user data analysis module performs data cleaning, data merging, and data feature structuring, wherein:
Data cleaning mainly handles erroneous user information in the data, such as user Ids that do not conform to the defined structure and records that store erroneous information. The historical log information contains a large amount of user behavior information, but in some of it the user Id itself has been deliberately modified by illegitimate means. Such records do not meet the requirements of a propagation task: even if the mined user features match the target users very well, the propagation to the user cannot be completed normally because the user Id is wrong. The data therefore needs to be cleaned. First, user Ids are cleaned according to a regular expression. Second, the historical data contains a small number of log entries that store erroneous information, for example a stored user operation time of January 1, 1970. Such entries may be caused by a bug in the system itself, by deliberate modification by the user, or by problems during log dumping, among other reasons, so this data also needs to be cleaned.
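The two cleaning steps can be sketched as a single filter. The patent does not give the actual regular expression or record layout, so the pattern below (8 to 32 ASCII letters or digits) and the field names user_id and op_time are assumptions made purely for illustration.

```python
import re
from datetime import datetime, timezone

# Assumed user Id layout (not from the patent): 8 to 32 letters or digits.
USER_ID_PATTERN = re.compile(r"[A-Za-z0-9]{8,32}")
# The placeholder date that the description names as an erroneous value.
EPOCH_DAY = datetime(1970, 1, 1, tzinfo=timezone.utc).date()


def is_clean(record: dict) -> bool:
    """Return True if the record passes both cleaning steps."""
    # Step 1: the user Id must match the expected structure.
    user_id = str(record.get("user_id", ""))
    if not USER_ID_PATTERN.fullmatch(user_id):
        return False
    # Step 2: reject entries whose operation time is missing or is 1970-01-01.
    op_time = record.get("op_time")
    if op_time is None or op_time.date() == EPOCH_DAY:
        return False
    return True


def clean(records):
    """Keep only the records that pass is_clean, preserving order."""
    return [r for r in records if is_clean(r)]
```

A record with a malformed Id or an epoch timestamp is dropped rather than repaired, which matches the description's point that such records cannot be used for a propagation task.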
After data cleaning, the data still contains redundancy: the same user may appear in different log files, which makes it inconvenient to mine data features. Data merging is therefore required. Data merging consolidates the information of the same user into the same file. First, the user's basic information, retrieval information, and behavior information are merged into the same file according to the user Id. Then, the propagation information of the user and the user's feedback information after propagation are merged into the same file according to the activity_id of the propagation.
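Both merge steps group records from several sources under a shared key, so one helper can illustrate them. This is a sketch under assumed record shapes (plain dictionaries held in memory); the helper name merge_by_key and the source names are hypothetical, and the patent's file-based storage is not modeled.

```python
from collections import defaultdict


def merge_by_key(key, **sources):
    """Group records from several named sources under a shared key.

    Returns {key_value: {source_name: [records...]}} so that everything
    known about one user (or one activity) sits in a single entry.
    """
    merged = defaultdict(lambda: {name: [] for name in sources})
    for name, records in sources.items():
        for rec in records:
            merged[rec[key]][name].append(rec)
    return dict(merged)


if __name__ == "__main__":
    per_user = merge_by_key(
        "user_id",
        basic=[{"user_id": "u1", "os": "Android"}],
        retrieval=[{"user_id": "u1", "query": "tealeaves"}],
        behavior=[{"user_id": "u1", "action": "click"}],
    )
    print(per_user["u1"])
```

The second step would call the same helper with "activity_id" as the key and the propagation and feedback records as the sources.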
After the data has been merged, it needs to be structured. Structuring user information requires fusing the data. Take the query words a user has clicked, with "tealeaves" as an example. The user's click information records that the user clicked "tealeaves", and the same user may have clicked many "tealeaves"-related items over one or more days. This cannot be stored as "user Id, tea_1, tea_2 ... tea_n": first, matching user information in the data would require lengthy retrieval and querying; second, it wastes a great deal of storage space; third, it is difficult to mine the user's preference for "tealeaves" from such a series of entries. The "tealeaves" information is therefore consolidated into the format "tealeaves | first_day | end_day | day_num | query_num", where the first item is the query word, the second is the time of the first click, the third is the time of the last occurrence, the fourth is the number of distinct days on which the word occurred, and the fifth is the total number of clicks. This directly gives the user's sensitivity to the query word: if the word was clicked many times and the interval between the first and last click is long, the query word can be regarded as highly sensitive for that user.
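The consolidation into the five-field format can be sketched as follows. The function name summarize_query and the choice of ISO dates for first_day and end_day are assumptions; the patent specifies only the field order.

```python
from datetime import date


def summarize_query(query, click_days):
    """Collapse one user's clicks on a query word into
    'query | first_day | end_day | day_num | query_num'.

    click_days holds one date per click, so repeated dates mean
    several clicks on the same day.
    """
    if not click_days:
        raise ValueError("click_days must not be empty")
    first_day = min(click_days)
    end_day = max(click_days)
    day_num = len(set(click_days))  # distinct days with at least one click
    query_num = len(click_days)     # total number of clicks
    return (
        f"{query} | {first_day.isoformat()} | {end_day.isoformat()}"
        f" | {day_num} | {query_num}"
    )


if __name__ == "__main__":
    clicks = [date(2018, 5, 1), date(2018, 5, 1), date(2018, 5, 3)]
    print(summarize_query("tealeaves", clicks))
```

A single fixed-width summary per user and query word replaces an unbounded list of click entries, which addresses the retrieval, storage, and mining concerns raised in the description.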
The user management module includes management based on basic user features and logistic regression training management, wherein:
Management based on basic user features evaluates the users to be targeted by a propagation according to some of their basic information. When a user fails any one of the conditions, the user does not meet the requirements of this propagation and is discarded. Because users of the Android and iOS operating systems need to be propagated to separately, the users of each system need to be counted after screening. A task may require either a total number of Android and iOS users or separate numbers for Android and iOS, and it may also specify how many new users are required. However, the key set in MapReduce is the user Id, and the user Ids of Android and iOS are each contiguous. When Reduce performs hash output, users of the same operating system are therefore likely to be assigned to the same Reduce for output. This easily skews the split by operating system: when the required number of users is small, the selected users may all belong to the same operating system, which departs from the actual ratio of real users. The results therefore need to be shuffled, which prevents the selected users from being entirely Android or entirely iOS and keeps the user ratio from becoming unbalanced.
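The rule-based screening, the per-system count, and the shuffle can be sketched together. The record fields (os, region), the helper names, and the use of an in-memory shuffle in place of a MapReduce job are all assumptions for illustration.

```python
import random
from collections import Counter


def screen_users(users, conditions, needed, seed=None):
    """Discard users that fail any condition, shuffle the rest, and
    return at most `needed` of them.

    Shuffling before truncation avoids taking a run of users that all
    come from one operating system when the input is ordered by Id.
    """
    passed = [u for u in users if all(cond(u) for cond in conditions)]
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    rng.shuffle(passed)
    return passed[:needed]


def count_by_os(users):
    """Count users per operating system."""
    return dict(Counter(u["os"] for u in users))


if __name__ == "__main__":
    pool = [
        {"id": i, "os": "Android" if i < 5 else "iOS", "region": "Xiamen"}
        for i in range(10)
    ]
    picked = screen_users(pool, [lambda u: u["region"] == "Xiamen"], needed=4)
    print(count_by_os(picked))
```

A plain shuffle makes a one-sided selection unlikely but does not guarantee a particular ratio; a task that fixes separate Android and iOS quotas would need to sample each group on its own.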
Once a certain amount of data has accumulated, the existing data can be used to train an LR (logistic regression) model, which yields a score for each user's interest in the current push. In theory, users with a score above 0.5 can be regarded as interested in the propagation task. Finally, users are sorted by score, and the top N users are taken as the targets of the propagation task.
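The train, score, and rank flow can be sketched in plain Python. The patent does not specify the features, the training procedure, or any hyperparameters, so the batch gradient descent below, its learning rate and epoch count, and the function names are assumptions chosen to keep the example self-contained.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_lr(samples, labels, lr=0.5, epochs=500):
    """Fit logistic regression weights and bias by batch gradient
    descent on the log loss. Returns (weights, bias)."""
    n = len(samples)
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(model, x):
    """Interest score in (0, 1) for one feature vector."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def top_n(model, users, n):
    """users maps user Id to feature vector; return the n Ids with the
    highest scores, best first."""
    ranked = sorted(users, key=lambda uid: score(model, users[uid]), reverse=True)
    return ranked[:n]
```

The 0.5 threshold mentioned in the description corresponds to the point where the linear term inside the sigmoid is zero; ranking by score and taking the top N does not depend on that threshold.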
User score sorting can also be performed with MapReduce. The specific method is as follows:
Output directly through Reduce. Because Reduce output is sorted by key, the job can be configured to use only one Reduce for output, which guarantees that the output is ordered. However, this forfeits the distributed-computing advantage of MapReduce: when the data volume is large, the computation can be very slow and may take a long time.
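The single-Reduce ordering can be illustrated with a small in-memory simulation. This is not Hadoop code: the map and reduce functions are plain Python generators, and using the negated score as the key (so that an ascending key sort yields descending scores) is an assumption about how the key might be chosen.

```python
def map_phase(user_scores):
    """Emit (key, value) pairs with the negated score as the key."""
    for user_id, s in user_scores.items():
        yield (-s, user_id)


def single_reducer(pairs):
    """Simulate one reducer. The framework hands a reducer its keys in
    sorted order, which sorted() stands in for here, so with a single
    reducer the whole output is globally ordered."""
    for neg_score, user_id in sorted(pairs):
        yield user_id, -neg_score


def rank(user_scores, n):
    """Return the n highest-scoring (user_id, score) pairs, best first."""
    return list(single_reducer(map_phase(user_scores)))[:n]


if __name__ == "__main__":
    print(rank({"a": 0.2, "b": 0.9, "c": 0.6}, 2))
```

Because every pair passes through the one reducer, the sort is effectively sequential, which is the slowdown the description warns about for large data volumes.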
Although the invention has been specifically shown and described in connection with preferred embodiments, those skilled in the art should understand that various changes in form and detail may be made without departing from the spirit and scope of the invention as defined by the appended claims, and that such changes fall within the scope of protection of the invention.

Claims (5)

1. A broadcasting system for precise user screening, characterized in that it comprises a user data acquisition module, a user data storage module, a user data analysis module, and a user management module; the user data acquisition module collects users' page click behavior data by means of a JS script; the user data storage module stores the collected page data in a cluster; the user data analysis module performs data cleaning, data merging, and data feature structuring; and the user management module includes management based on basic user features and logistic regression training management.
2. The broadcasting system for precise user screening according to claim 1, characterized in that the data cleaning handles erroneous user information in the data: first, user Ids are cleaned according to a regular expression; second, the small number of log entries in the historical data that store erroneous information are cleaned.
3. The broadcasting system for precise user screening according to claim 1, characterized in that the data merging consolidates the information of the same user into the same file: first, the user's basic information, retrieval information, and behavior information are merged into the same file according to the user Id; then, the propagation information of the user and the user's feedback information after propagation are merged into the same file according to the activity_id of the propagation.
4. The broadcasting system for precise user screening according to claim 1, characterized in that the management based on basic user features evaluates the users to be targeted by a propagation according to some of their basic information; when a user fails any one of the conditions, the user does not meet the requirements of this propagation and is discarded.
5. The broadcasting system for precise user screening according to claim 1, characterized in that the logistic regression training management trains on existing data to obtain an LR model, which yields a score for each user's interest in the current push; users are then sorted by score, and the top N users are taken as the targets of the propagation task.
CN201810501536.XA 2018-05-23 2018-05-23 A kind of broadcasting system of accurate user's screening Pending CN108897762A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810501536.XA CN108897762A (en) 2018-05-23 2018-05-23 A kind of broadcasting system of accurate user's screening

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810501536.XA CN108897762A (en) 2018-05-23 2018-05-23 A kind of broadcasting system of accurate user's screening

Publications (1)

Publication Number Publication Date
CN108897762A true CN108897762A (en) 2018-11-27

Family

ID=64343285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810501536.XA Pending CN108897762A (en) 2018-05-23 2018-05-23 A kind of broadcasting system of accurate user's screening

Country Status (1)

Country Link
CN (1) CN108897762A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination