CN108897762A - A kind of broadcasting system of accurate user's screening - Google Patents
- Publication number
- CN108897762A (application CN201810501536.XA)
- Authority
- CN
- China
- Prior art keywords
- user
- data
- information
- accurate
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention proposes a broadcasting system for accurate user screening, relating to the technical field of Internet applications. The system comprises a user data acquisition module, a user data storage module, a user data analysis module, and a user management module. The user data acquisition module collects users' page-click behavior data by means of a JS script. The user data storage module stores the captured page data in a cluster. The user data analysis module performs data cleaning, data merging, and data feature structuring. The user management module comprises management based on basic user characteristics and logistic regression training management. The invention establishes a precise propagation system that screens users, selects accurately targeted users, and achieves precise propagation. Through precise propagation it finds an enterprise's target users, removes the dependence on platforms, supplies the enterprise with the user data it is entitled to, and helps the enterprise build its own database.
Description
Technical field
The present invention relates to the technical field of Internet applications, and in particular to a broadcasting system for accurate user screening.
Background technique
With the rapid growth of the mobile Internet, active push mechanisms are gradually replacing search-engine keyword precision marketing. Under an information-feed active push mechanism, how to obtain high-value users is vital.
Existing similar techniques, such as mobile phone user screening, can screen user data by factors such as region and gender, but they have no sales data to rely on and cannot judge in advance how precisely these users match an enterprise's needs.
Summary of the invention
The present invention provides a broadcasting system for accurate user screening, comprising a user data acquisition module, a user data storage module, a user data analysis module, and a user management module. The user data acquisition module collects users' page-click behavior data by means of a JS script. The user data storage module stores the captured page data in a cluster. The user data analysis module performs data cleaning, data merging, and data feature structuring. The user management module comprises management based on basic user characteristics and logistic regression training management.
Preferably, the data cleaning handles erroneous user information in the data: first, user Ids are cleaned according to regular expressions; second, the small number of history log entries in the historical data that store erroneous information are cleaned.
Preferably, the data merging consolidates the information of the same user into the same file: first, the user's basic information, retrieval information, and behavioral information are consolidated into the same file according to the user Id; then, according to the activity_id of the propagation, the user's propagation information and the user's feedback information after propagation are consolidated into the same file.
Preferably, the management based on basic user characteristics judges the candidate users of a propagation according to some of their basic information; when a user fails any one of the conditions, that user does not meet the requirements of this propagation and is discarded.
Preferably, the logistic regression training management trains an LR model on existing data, which yields a score for each user's interest in the current push; users are then sorted by score, and the top N users are taken so that the task can be propagated to them.
The beneficial effects of the broadcasting system for accurate user screening provided by the present invention are as follows: it establishes a precise propagation system that screens users, selects accurately targeted users, and achieves precise propagation; through precise propagation it finds an enterprise's target users, removes the dependence on platforms, supplies the enterprise with the user data it is entitled to, and helps the enterprise build its own database.
Brief description of the drawings
Fig. 1 is broadcasting system block diagram of the invention.
Specific embodiment
To further illustrate the embodiments, the present invention is provided with drawings. These drawings are part of the disclosure of the invention, mainly serve to illustrate the embodiments, and can be read together with the relevant description in the specification to explain the operating principles of the embodiments. With reference to these contents, those of ordinary skill in the art will be able to understand other possible embodiments and the advantages of the present invention. Components in the figures are not necessarily drawn to scale, and similar reference symbols are generally used to denote similar components.
The present invention is now further described with reference to the drawings and specific embodiments.
As shown in Fig. 1, the broadcasting system for accurate user screening provided in this embodiment includes a user data acquisition module, a user data storage module, a user data analysis module, and a user management module.
The user data acquisition module collects users' page-click behavior data. When a user clicks a button or link, the click event is captured and transferred to the user data storage module for storage. Acquisition is performed by a JS script that listens for events on each element of the page. Specifically: (1) the JS script is added to the page to be monitored, so that the page downloaded by the client browser contains the following script reference: <script type="text/javascript" src="http://click.demo.com/s_tracker.js"></script>; (2) the MSMQ component is installed on the Windows server.
The user data storage module stores the captured page data in a MySQL cluster. Its main function is to persist the raw data captured from pages, split into databases by domain name. The page data storage principle is that each database name corresponds to a domain name; for click_product, data is inserted according to the parity of permanent_id: even values are inserted into click_product0 and odd values into click_product1.
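The parity rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: permanent_id is treated as an integer, and the function names and record layout are hypothetical, not taken from the patent.

```python
def shard_target_for(permanent_id: int, base: str = "click_product") -> str:
    """Return the insert target for a click record.

    Even permanent_id values map to <base>0 and odd values to <base>1,
    following the parity rule described in the text.
    Assumption: permanent_id is an integer.
    """
    return f"{base}{permanent_id % 2}"


def route_clicks(records):
    """Group click records by their insert target (hypothetical record layout)."""
    targets = {}
    for record in records:
        target = shard_target_for(record["permanent_id"])
        targets.setdefault(target, []).append(record)
    return targets
```

For example, route_clicks([{"permanent_id": 4}, {"permanent_id": 7}]) places the first record under click_product0 and the second under click_product1.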
The user data analysis module is used for data cleaning, data merging, and data feature structuring, wherein:
Data cleaning mainly deals with erroneous user information in the data, such as user Ids that do not conform to the defined structure or entries that store erroneous information. The users' history logs contain a large amount of behavioral information, but in some of it the user Id itself has been deliberately modified by illegitimate means. Such information does not meet the requirements of a propagation task: even if the mined user features match the task extremely well, the user Id is wrong, so propagation to that user cannot be completed normally. The data therefore needs to be cleaned. First, user Ids are cleaned according to regular expressions. Second, the historical data contains a small number of log entries that store erroneous information, for example a stored user operation time of January 1, 1970. Such entries may be caused by a bug in the system itself, by deliberate modification by the user, or by problems during log dumping, among other reasons; these data therefore also need to be cleaned.
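A minimal sketch of these two cleaning steps follows. The patent does not give the user Id structure, so the regular expression below is purely a placeholder, and the record fields (user_id, timestamp as Unix seconds) are assumptions for illustration.

```python
import re
from datetime import datetime, timezone

# Hypothetical Id format: the source does not state the real structure.
USER_ID_PATTERN = re.compile(r"[A-Za-z0-9]{8,32}")
EPOCH_DATE = datetime(1970, 1, 1, tzinfo=timezone.utc).date()


def is_clean(record: dict) -> bool:
    """Return True if the log record passes both cleaning steps."""
    # Step 1: the user Id must match the expected structure.
    user_id = str(record.get("user_id") or "")
    if not USER_ID_PATTERN.fullmatch(user_id):
        return False
    # Step 2: drop records whose operation time is missing or falls on 1970-01-01.
    timestamp = record.get("timestamp")
    if timestamp is None:
        return False
    operation_date = datetime.fromtimestamp(timestamp, tz=timezone.utc).date()
    return operation_date != EPOCH_DATE


def clean_logs(records):
    """Keep only the records that pass cleaning."""
    return [record for record in records if is_clean(record)]
```

A real deployment would replace USER_ID_PATTERN with the system's actual Id rule and might add further checks for other kinds of stored errors.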
After data cleaning, the data still contains redundancy: the same user may appear in different log files, which makes mining data features inconvenient. Data merging is therefore needed. Data merging means consolidating the information of the same user into the same file. First, the user's basic information, retrieval information, and behavioral information are consolidated into the same file according to the user Id. Then, according to the activity_id of the propagation, the user's propagation information and the user's feedback information after propagation are consolidated into the same file.
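The merge step can be illustrated with a small in-memory grouping function. This is a sketch, not the patent's implementation: the source names ("basic", "retrieval", and so on) and the dictionary record layout are assumptions, and an in-memory dict stands in for "the same file".

```python
from collections import defaultdict


def merge_by_key(sources: dict, key: str) -> dict:
    """Consolidate records from several sources under a shared key.

    `sources` maps a source name to a list of record dicts; the result maps
    each key value to a dict holding that key's records from every source.
    """
    merged = defaultdict(lambda: {name: [] for name in sources})
    for name, records in sources.items():
        for record in records:
            merged[record[key]][name].append(record)
    return dict(merged)
```

The same function covers both passes described above: first call it with key "user_id" over the basic, retrieval, and behavior logs, then with key "activity_id" over the propagation and feedback logs.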
After merging, the data needs to be structured. Structuring user information requires fusing the data. Take the query word "tealeaves" that a user has clicked as an example. The user's click information contains "tealeaves" entries, and the same user may have clicked multiple "tealeaves"-related items in one day or over several days. These cannot be stored as "user Id, tea_1, tea_2 ... tea n", because such storage (1) requires lengthy retrieval whenever user information in the data needs to be matched, (2) wastes a great deal of storage space, and (3) makes it hard to mine the user's preference for "tealeaves" from the series of entries. Therefore, the "tealeaves" information is integrated into the format "tealeaves | first_day | end_day | day_num | query_num", where the first item is the query word, the second is the time of the first click, the third is the time of the last occurrence, the fourth is the number of days (that is, on how many different days it occurred), and the fifth is the total number of clicks. This directly yields the user's sensitivity to the query word: if the word was clicked many times and the span between the first and last click is large, the user is highly sensitive to that query word.
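As a rough illustration, the five-field summary can be computed from a list of click dates. The function name is hypothetical, and the ISO date rendering is an assumption; the patent only specifies the order of the fields.

```python
from datetime import date
from typing import Iterable


def summarize_query(query: str, click_days: Iterable[date]) -> str:
    """Collapse repeated clicks on one query word into
    'query | first_day | end_day | day_num | query_num'."""
    days = list(click_days)
    if not days:
        raise ValueError("at least one click is required")
    first_day = min(days)      # time of the first click
    end_day = max(days)        # time of the last occurrence
    day_num = len(set(days))   # number of distinct days with a click
    query_num = len(days)      # total number of clicks
    fields = [query, first_day.isoformat(), end_day.isoformat(),
              str(day_num), str(query_num)]
    return " | ".join(fields)
```

For three clicks on 2018-05-01, 2018-05-01, and 2018-05-20, the result is "tealeaves | 2018-05-01 | 2018-05-20 | 2 | 3".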
The user management module includes management based on basic user characteristics and logistic regression training management, wherein:
Management based on basic user characteristics judges the candidate users of a propagation according to some of their basic information. When a user fails any one of the conditions, that user does not meet the requirements of this propagation and is discarded. Because users of the Android and iOS operating systems need to be propagated to separately, the users of each system must be counted after screening. The number of users a task requires may be given as the combined total of Android and iOS users or as separate numbers for Android and iOS, and a task may also specify how many new users are required. However, the key set in MapReduce is the user Id, and the user Ids of Android users and of iOS users each have their own continuity, so when the Reduce phase hashes its output, users of the same operating system are likely to be assigned to the same reducer for output. As a result, when users are distinguished by operating system and the required number of users is small, the selected users may all belong to the same operating system, contrary to the actual ratio of real users. The results therefore need to be shuffled, which avoids selecting only Android or only iOS users and thus avoids a distorted user ratio.
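The screening, counting, and shuffling steps might look like the following in plain Python. All field names ("os", "age") and the example conditions are hypothetical; the patent does not list the concrete screening conditions.

```python
import random
from collections import Counter


def screen_users(users, conditions):
    """Keep only users that satisfy every condition; failing any one drops the user."""
    return [user for user in users if all(check(user) for check in conditions)]


def count_by_os(users):
    """Count screened users per operating system (assumes an 'os' field)."""
    return Counter(user["os"] for user in users)


def shuffled(users, seed=None):
    """Return a shuffled copy so a small sample is not dominated by one operating system."""
    rng = random.Random(seed)
    result = list(users)
    rng.shuffle(result)
    return result
```

A task that needs a fixed number of users per operating system could instead split the screened list by "os" and sample from each part; the shuffle above only addresses the ordering bias described in the text.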
Once a certain amount of data has accumulated, the existing data can be used to train an LR (logistic regression) model, which yields a score for each user's interest in the current push. In theory, users with a score above 0.5 can be regarded as interested in this propagation task. Finally, users are sorted by score and the top N users are taken, and the task can then be propagated to them.
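To make the scoring and top-N selection concrete, here is a small self-contained sketch that trains a logistic regression with plain stochastic gradient descent. The patent does not describe its features, training procedure, or hyperparameters, so the feature vectors, learning rate, and epoch count below are all illustrative assumptions.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_lr(features, labels, learning_rate=0.1, epochs=500):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - y  # gradient of the log loss with respect to the logit
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def score(model, x) -> float:
    """Interest score in (0, 1) for one user's feature vector."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def top_n_users(model, candidates: dict, n: int, threshold: float = 0.5):
    """Keep users scoring above the threshold, sort by score descending, take the first n."""
    scored = [(user_id, score(model, x)) for user_id, x in candidates.items()]
    interested = [pair for pair in scored if pair[1] > threshold]
    interested.sort(key=lambda pair: pair[1], reverse=True)
    return interested[:n]
```

In practice an established library implementation of logistic regression would normally be preferred; the hand-written version is used here only to keep the example dependency-free.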
The sorting of user scores can also be carried out with MapReduce. The specific method is to rely directly on the Reduce output: because Reduce output is sorted by key, the job can be configured with a single reducer, which guarantees that the output is ordered. However, this forfeits the distributed-computing advantage of MapReduce; when the data volume is large, computation becomes very slow and may require a long wait.
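The single-reducer idea can be simulated in plain Python. This sketch does not use any Hadoop API: sorted() stands in for the framework's sort-by-key before the reducer runs, and negating the score is an assumed trick to turn ascending key order into descending score order.

```python
def map_phase(user_scores: dict):
    """Emit (key, value) pairs keyed by the negated score."""
    for user_id, user_score in user_scores.items():
        yield (-user_score, user_id)


def single_reducer(pairs):
    """Consume all pairs in key order, as a lone reducer would, and emit a ranked list."""
    # sorted() simulates the framework delivering pairs to the reducer sorted by key.
    for negated_score, user_id in sorted(pairs):
        yield user_id, -negated_score
```

Because every pair passes through one reducer, the ordering is global, which is also why this approach loses the parallelism mentioned in the text.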
Although the present invention has been specifically shown and described in connection with the preferred embodiments, those skilled in the art should understand that various changes in form and detail may be made to the present invention without departing from the spirit and scope of the invention as defined by the appended claims, and that such changes fall within the protection scope of the present invention.
Claims (5)
1. A broadcasting system for accurate user screening, characterized by comprising a user data acquisition module, a user data storage module, a user data analysis module, and a user management module, wherein the user data acquisition module collects users' page-click behavior data by means of a JS script; the user data storage module stores the captured page data in a cluster; the user data analysis module performs data cleaning, data merging, and data feature structuring; and the user management module comprises management based on basic user characteristics and logistic regression training management.
2. The broadcasting system for accurate user screening according to claim 1, characterized in that the data cleaning handles erroneous user information in the data: first, user Ids are cleaned according to regular expressions; second, the small number of history log entries in the historical data that store erroneous information are cleaned.
3. The broadcasting system for accurate user screening according to claim 1, characterized in that the data merging consolidates the information of the same user into the same file: first, the user's basic information, retrieval information, and behavioral information are consolidated into the same file according to the user Id; then, according to the activity_id of the propagation, the user's propagation information and the user's feedback information after propagation are consolidated into the same file.
4. The broadcasting system for accurate user screening according to claim 1, characterized in that the management based on basic user characteristics judges the candidate users of a propagation according to some of their basic information; when a user fails any one of the conditions, that user does not meet the requirements of this propagation and is discarded.
5. The broadcasting system for accurate user screening according to claim 1, characterized in that the logistic regression training management trains an LR model on existing data, which yields a score for each user's interest in the current push; users are then sorted by score, and the top N users are taken so that the task can be propagated to them.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810501536.XA CN108897762A (en) | 2018-05-23 | 2018-05-23 | A kind of broadcasting system of accurate user's screening |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810501536.XA CN108897762A (en) | 2018-05-23 | 2018-05-23 | A kind of broadcasting system of accurate user's screening |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108897762A true CN108897762A (en) | 2018-11-27 |
Family
ID=64343285
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810501536.XA Pending CN108897762A (en) | 2018-05-23 | 2018-05-23 | A kind of broadcasting system of accurate user's screening |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108897762A (en) |
-
2018
- 2018-05-23 CN CN201810501536.XA patent/CN108897762A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||