CN108322355A - User traffic data processing method, processing unit, electronic equipment and storage medium - Google Patents

User traffic data processing method, processing unit, electronic equipment and storage medium

Info

Publication number
CN108322355A
Authority
CN
China
Prior art keywords
data
path
click
user
url
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710040291.0A
Other languages
Chinese (zh)
Inventor
谢群群
邵荣防
郝晖
李瑞亮
程浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201710040291.0A
Publication of CN108322355A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/02 Capturing of monitoring data
    • H04L43/026 Capturing of monitoring data using flow identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/06 Generation of reports
    • H04L43/062 Generation of reports related to network traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A user traffic data processing method is proposed, comprising: performing data cleansing on user traffic data to generate legitimate traffic data, the legitimate traffic data including click data and browsing data; generating a click path from the click data; generating a browse path from the browsing data; and generating a user path tree from the click path and the browse path. The invention improves on the prior art in data volume, timeliness, and accuracy.

Description

User traffic data processing method, processing unit, electronic equipment and storage medium
Technical Field
The present invention relates to the field of Internet technology, and in particular to a user traffic data processing method, a processing apparatus, an electronic device, and a storage medium.
Background
With the growth of massive data, existing traffic analysis models struggle to meet practical requirements in data volume, timeliness, scalability, and accuracy.
Most current in-site traffic analysis models are single-machine computing schemes that use simple URL (Uniform Resource Locator) rules to associate different traffic. Massive data is processed by single-machine computation, and the analysis model performs traffic association analysis using URL rules and some business rules.
The main problems with such traffic analysis models are: 1. They use single-machine programs, so the data volume they can process is limited by single-machine performance, and they cannot be used for massive data processing. 2. Processing time is long: faced with massive data, single-machine programs or simple parallel computing frameworks can hardly meet data timeliness requirements. 3. The analysis results are inaccurate: relying only on simple URL rules and business rules yields low accuracy in the face of complex network environments and user behavior.
Summary of the Invention
In view of this, the present invention proposes a traffic analysis model based on a distributed computing framework and path tree computation, which improves on the prior art in data volume, timeliness, and accuracy.
According to a first aspect of the present invention, a user traffic data processing method is provided, comprising: performing data cleansing on user traffic data to generate legitimate traffic data, the legitimate traffic data including click data and browsing data; generating a click path from the click data; generating a browse path from the browsing data; and generating a user path tree from the click path and the browse path.
In one embodiment, the data cleansing may include one or more of illegal user ID cleansing, illegal request frequency cleansing, and blacklisted IP address cleansing.
In one embodiment, generating the click path from the click data may include: deduplicating the click data; converting the click data into URLs, recording the URL of the page preceding the page on which the click occurred, the URL of the clicked page, and the URL of the page jumped to after the click; sorting the click data in chronological order and concatenating the URL of the clicked page with the URL of the preceding page and the URL of the page jumped to after the click; and directly concatenating two URLs that have no click data between them within a period of time.
In one embodiment, generating the browse path from the browsing data may include: extracting the URLs of the pages browsed by the user and the browsing times; sorting the browsed page URLs by browsing time; and concatenating the browsed page URLs to generate the user browse path.
In one embodiment, the method may further include merging the click path and the browse path to generate the user path tree.
In one embodiment, generating the user path tree may include: aggregating the click path and the browse path by user ID; sorting the aggregated click path and browse path in chronological order to generate user path data in which click data and browsing data at least partially alternate; for browsing data whose click data is missing, directly concatenating the browsed page URLs, and for click data whose browsing data is missing, removing that click data, so as to generate user path data in which click data and browsing data fully alternate; and converting the click data in the user path data into edges and the browsing data into nodes to generate the user path tree.
According to a second aspect of the present invention, a user traffic data processing apparatus is provided, comprising: a data cleansing module configured to perform data cleansing on user traffic data to generate legitimate traffic data, the legitimate traffic data including click data and browsing data; a click path generation module configured to generate a click path from the click data; a browse path generation module configured to generate a browse path from the browsing data; and a user path tree generation module configured to merge the click path and the browse path to generate a user path tree.
According to a third aspect of the present invention, an electronic device is provided, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to the first aspect of the present invention.
According to a fourth aspect of the present invention, a non-transitory computer-readable storage medium is provided, characterized in that the non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform the method according to the first aspect of the present invention.
An advantage of the present invention is its capacity to process massive data: data computation can be implemented with the Hadoop distributed computing framework, which readily handles data at the terabyte scale and above and supports rapid scaling. A further advantage is good data processing timeliness: using a distributed computing framework and an optimized computation method greatly improves processing time, and the invention also optimizes the association computation between clicks and browses in the traffic as well as the user path generation algorithm. A further advantage is the path-tree-based analysis model: by analyzing user browsing behavior and click behavior, path generation is computed over the user's behavior on website pages, so that the user's complete path on the website can be characterized accurately.
Brief Description of the Drawings
The above and other aspects, features, and advantages of example embodiments of the present invention will become more apparent from the following description taken in conjunction with the accompanying drawings.
Fig. 1 shows a flowchart of user data processing according to an embodiment of the present invention.
Fig. 2 shows a flowchart of a method for cleansing user traffic data according to an embodiment of the present invention.
Fig. 3 shows a flowchart of a method for analyzing user click behavior data according to an embodiment of the present invention.
Fig. 4 shows a flowchart of a method for analyzing user browsing behavior data according to an embodiment of the present invention.
Fig. 5 shows a flowchart of a method for the path tree computation model according to an embodiment of the present invention.
Fig. 6 shows a diagram of converting user path data into a user path tree.
Fig. 7 is a flowchart showing a user traffic data processing method according to an embodiment of the present invention.
Fig. 8 is a block diagram showing a user traffic processing apparatus according to an embodiment of the present invention.
Fig. 9 is a block diagram showing an electronic device according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are explained below, including various details of the embodiments to aid understanding; these should be regarded as merely exemplary. Accordingly, those skilled in the art will appreciate that various modifications and changes can be made to the embodiments described herein without departing from the scope and spirit of the present invention.
Fig. 1 shows a flowchart of user data processing according to an embodiment of the present invention. As shown in Fig. 1, massive data is obtained from massive data clusters 111, 112, 113, etc. (collectively, massive data cluster 110), for example usage data of JD (Jingdong) users on various products such as the JD website and app clients.
The data then enters distributed computing framework 120, which includes data cleansing 121, user click behavior data analysis 122, user browsing behavior data analysis 123, and path tree computation model 124, and which finally outputs a user path tree. Here, a path tree is a way of characterizing user behavior. For example, by analyzing user click and browsing behavior, a distribution tree of user traffic on the website is computed; a branch of the tree is, for example, an access path of the user, and a leaf node is, for example, the last page the user visited on that branch.
Next, various business support analyses 130 can be performed on the user path tree output by distributed computing framework 120, such as, but not limited to, page quality analysis 131, conversion rate analysis 132, and user behavior analysis.
The entire user data processing flow involves the main stages of data cleansing, user behavior analysis, and the final path tree computation model. The entire computation uses a distributed computing framework, which improves massive data processing capacity and computational timeliness. On the traffic analysis side, user behavior data analysis and the path tree computation model ensure the accuracy of the analysis results.
Fig. 2 shows a flowchart of method 200 for cleansing user traffic data according to an embodiment of the present invention. For example, data cleansing 200 may include: illegal user ID cleansing in step 201, illegal request frequency cleansing in step 202, blacklisted IP address cleansing in step 203, and obtaining legitimate traffic data in step 204. More specifically, massive traffic data can be cleansed using, for example, the following cleansing rules.
1. Remove the top 10% of user data, which is considered, in terms of quality, to be mostly crawler data;
2. Remove data that has no user ID;
3. Remove data whose source cannot be determined, where the source can be judged by, for example, whether it contains "jd"; data containing it is treated as having a legitimate source;
4. Remove data of users with an excessive number of records in a single day; for example, data ranked in the top 5% by number of operation records can be removed directly;
5. Remove blacklisted IP data; for example, an anti-cheating IP list can be used for cleansing.
After data cleansing, legitimate traffic data is obtained for the subsequent click data analysis and browsing data analysis.
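As a rough illustration, cleansing rules 2 to 5 can be sketched in Python as follows. This is only a minimal sketch under stated assumptions, not the implementation described in the patent: the `Record` fields, the `clean` function, the substring test for "jd" applied to the URL, and the per-user ranking used for rule 4 are all hypothetical. Rule 1 is left out because the text does not say how the top 10% is ranked, and the input is assumed to hold one day of records.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    user_id: str  # empty string when the user ID is missing (assumption)
    ip: str
    url: str
    ts: int       # event time in seconds


def clean(records, ip_blacklist, source_marker="jd", top_fraction=0.05):
    """Return the records treated as legitimate for one day of traffic."""
    kept = [
        r for r in records
        if r.user_id                    # rule 2: must carry a user ID
        and source_marker in r.url      # rule 3: source must be identifiable
        and r.ip not in ip_blacklist    # rule 5: IP must not be blacklisted
    ]
    # Rule 4: drop users whose record count ranks in the top fraction.
    counts = Counter(r.user_id for r in kept)
    ranked = sorted(counts, key=counts.get, reverse=True)
    heavy = set(ranked[:int(len(ranked) * top_fraction)])
    return [r for r in kept if r.user_id not in heavy]
```

The row-level filters run first so that the per-user ranking in rule 4 is computed only over records that already passed the other checks; the opposite order would be equally defensible.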
Fig. 3 shows a flowchart of method 300 for analyzing user click behavior data according to an embodiment of the present invention. In general, the input of method 300 may be legitimate user click traffic, and the output may be the user click path.
Specifically, method 300 may include step 301, click data normalization, for example deduplicating the user click data and sorting it by time.
Method 300 may further include step 302, converting click data into URLs and recording the URLs before and after the click. For example, for click data, the URL of the page preceding the page on which the click occurred, the URL of the current page that was clicked, and the URL of the page jumped to after the click can be recorded. Note that because click data may fail to be reported due to factors such as the network environment, not all click data can be formed into click data containing the previous page URL, the current page URL, and the post-jump page URL. In such cases, the click data may include only some of these three URLs.
Next, in step 303, a path can be generated according to the jump rules of the preceding and following URLs. Specifically, generating the click path may include sorting the user's click URLs by time and concatenating each click URL with the previous URL at the click position and the URL after the click jump, forming the user click path. The concatenation can connect the previous URL to the current URL and the current URL to the next URL. It can be implemented by means of, for example, a linked list.
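Steps 301 to 303 can be sketched as follows. This is a hedged illustration rather than the patent's implementation: the `Click` record, the function names, and the overlap-merging rule are assumptions, and a plain Python list stands in for the linked list mentioned in the text. Each click contributes up to three URLs (previous, current, post-jump), and consecutive clicks are stitched together where the tail of the path matches the head of the next click's URLs.

```python
from typing import List, NamedTuple, Optional


class Click(NamedTuple):
    ts: int                   # click time
    prev_url: Optional[str]   # page before the clicked page, if reported
    cur_url: Optional[str]    # page on which the click occurred
    next_url: Optional[str]   # page jumped to after the click


def append_with_overlap(path: List[str], urls: List[str]) -> None:
    """Append urls to path, reusing the longest suffix/prefix overlap."""
    for k in range(min(len(path), len(urls)), 0, -1):
        if path[-k:] == urls[:k]:
            path.extend(urls[k:])
            return
    path.extend(urls)


def click_path(clicks) -> List[str]:
    path: List[str] = []
    # Step 301: deduplicate, then order by time.
    for c in sorted(set(clicks), key=lambda c: c.ts):
        # Step 302: keep whichever of the three URLs were actually reported.
        urls = [u for u in (c.prev_url, c.cur_url, c.next_url) if u is not None]
        # Step 303: chain previous -> current -> next onto the running path.
        append_with_overlap(path, urls)
    return path
```

For instance, the clicks `("home", "list", "item")` at time 10 and `("list", "item", "cart")` at time 20 chain into `home, list, item, cart` rather than repeating `list, item`.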
Next, in step 304, the click paths are merged into a final click path. As noted above, click data may be incomplete due to factors such as the network environment. In this case, user data with lost clicks or incomplete click data needs to be handled. Specifically, the user traffic URLs can be sorted by time, and if there is no click data between two URLs, the two URLs are forcibly associated. The forced association may include: sorting by time the continuous, uninterrupted click data of the same user within a period of time (e.g., 30 minutes), and directly associating two URLs of the same user that have no click data between them, so as to generate the user click path.
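One possible reading of the forced association is a session split: events of the same user that follow each other within the time window are chained directly, and a longer gap starts a new chain. The sketch below illustrates that reading only; the `sessions` function, the `(timestamp, url)` tuples, and the interpretation of "a period of time (e.g., 30 minutes)" as a maximum gap between consecutive events are assumptions.

```python
def sessions(events, max_gap=30 * 60):
    """Split one user's (timestamp, url) events into directly chained runs.

    URLs inside a run are treated as forcibly associated in time order;
    a gap longer than max_gap seconds starts a new run.
    """
    runs = []
    last_ts = None
    for ts, url in sorted(events):
        if last_ts is None or ts - last_ts > max_gap:
            runs.append([])        # start a new run after a long gap
        runs[-1].append(url)       # chain onto the current run
        last_ts = ts
    return runs
```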
Fig. 4 shows a flowchart of method 400 for analyzing user browsing behavior data according to an embodiment of the present invention. In general, the input of method 400 may be legitimate user browsing traffic, and the output may be the user browse path.
Method 400 may include: in step 401, normalizing the user browsing data and extracting the browsed page URL and time; in step 402, sorting the user browsing data by browsing time; and in step 403, generating the user browse path, for example by associating the user's browsed URLs in time order to establish the user browse path.
As noted above, browsing data may be incomplete due to factors such as the network environment. In this case, the continuous, uninterrupted browsing data of the same user within a period of time (e.g., 30 minutes) can be sorted by time, and the current URL is concatenated, in timestamp order, with the URL data whose timestamps are earlier than the current one, thereby establishing the user browse path. The concatenation can be implemented by means of, for example, a linked list.
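Steps 401 to 403 can be illustrated with a short sketch that emits the browse path as a list of links between consecutive page views. The function name, the `(url, timestamp)` tuples, and the treatment of the 30-minute period as a maximum gap are assumptions made for illustration; the list of `(from_url, to_url)` pairs stands in for the linked list mentioned in the text.

```python
def browse_path(views, max_gap=30 * 60):
    """Return (from_url, to_url) links between consecutive page views.

    views: iterable of (url, timestamp) pairs for a single user.
    """
    ordered = sorted(views, key=lambda v: v[1])   # step 402: sort by time
    links = []
    for (u1, t1), (u2, t2) in zip(ordered, ordered[1:]):
        if t2 - t1 <= max_gap:                    # only chain within the window
            links.append((u1, u2))                # step 403: earlier -> later
    return links
```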
Fig. 5 shows a flowchart of method 500 for the path tree computation model according to an embodiment of the present invention. In general, the input of method 500 may be the user click path and the user browse path, and the output may be the user path tree.
Method 500 may include step 501, combining click and browsing data by user ID, for example aggregating an individual user's browsing and click data (e.g., the generated user click path and browse path) by UUID. Then, in step 502, the click and browsing data within an individual user ID are sorted; specifically, the click and browsing data can be sorted by time to generate user path data in which clicks and browses alternate.
Note that, as described above, because click data and browsing data may be incomplete, it may not be possible to generate user path data in which clicks and browses fully alternate; the processing of step 503 is therefore required.
In step 503, lost clicks and lost browses can be handled by forcibly associating the data before and after the click time point. Specifically, browsing data with a lost click (e.g., no click between two pieces of browsing data) is forcibly associated after being sorted by time. The association here is thus consistent with the forced association of browsing data. In addition, for click data with lost browsing (e.g., no browsing data between two pieces of click data), the click data is removed directly.
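The repair in step 503 can be sketched as a single pass over one user's time-sorted events. This is an illustrative sketch under assumptions, not the patented procedure: events are modeled as `(kind, value)` tuples, a forced association is represented by a synthetic click whose value is `None`, and when several clicks occur with no browse between them the first is kept and the later ones are dropped (the text does not say which one is removed).

```python
def repair(events):
    """Make a time-sorted event list strictly alternate view, click, view, ...

    events: list of (kind, value) with kind either "view" or "click".
    """
    out = []
    for kind, value in events:
        if kind == "view":
            if out and out[-1][0] == "view":
                # Lost click: force-associate the two views with a synthetic edge.
                out.append(("click", None))
            out.append((kind, value))
        elif out and out[-1][0] == "view":
            out.append((kind, value))
        # Otherwise: a click with no view before it, so it is dropped.
    if out and out[-1][0] == "click":
        out.pop()  # a trailing click leads to no recorded page
    return out
```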
Path data in which click data and browsing data fully alternate is thus obtained. Then, in step 504, the click data in the path data can be converted into edges and the browsing data into nodes to generate the user path tree. Fig. 6 shows a diagram of converting user path data into a user path tree.
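The conversion in step 504 can be illustrated by inserting each alternating path into a prefix tree whose nodes are browsed pages and whose edges are clicks. The `Node` class, the `add_path` helper, the synthetic root, and the visit counter are assumptions for illustration; the source does not specify how paths are merged into one tree.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    url: str
    visits: int = 0
    # Outgoing edges, keyed by (click label, target URL).
    children: dict = field(default_factory=dict)


def add_path(root, path):
    """Insert one alternating path: url, click, url, click, ..., url."""
    node = root
    click = None  # no click leads from the synthetic root into the first page
    for i, item in enumerate(path):
        if i % 2 == 1:
            click = item       # odd positions are clicks, i.e. edges
            continue
        node = node.children.setdefault((click, item), Node(item))
        node.visits += 1       # even positions are pages, i.e. nodes
    return root
```

With this layout each root-to-leaf branch is one access path and each leaf is the last page visited on that branch, which matches the description of the path tree given for Fig. 1.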
Fig. 7 is a flowchart showing user traffic data processing method 700 according to an embodiment of the present invention. Method 700 includes: in step 701, performing data cleansing on user traffic data to generate legitimate traffic data, the legitimate traffic data including click data and browsing data; in step 702, generating a click path from the click data; in step 703, generating a browse path from the browsing data; and in step 704, generating a user path tree from the click path and the browse path.
In one embodiment, the data cleansing in step 701 may include one or more of illegal user ID cleansing, illegal request frequency cleansing, and blacklisted IP address cleansing.
In one embodiment, generating the click path from the click data in step 702 may include: deduplicating the click data; converting the click data into URLs, recording the URL of the page preceding the page on which the click occurred, the URL of the clicked page, and the URL of the page jumped to after the click; sorting the click data in chronological order and concatenating the URL of the clicked page with the URL of the preceding page and the URL of the page jumped to after the click; and directly concatenating two URLs that have no click data between them within a period of time.
In one embodiment, generating the browse path from the browsing data in step 703 may include: extracting the URLs of the pages browsed by the user and the browsing times; sorting the browsed page URLs by browsing time; and concatenating the browsed page URLs to generate the user browse path.
In one embodiment, step 704 may further include merging the click path and the browse path to generate the user path tree.
In one embodiment, generating the user path tree from the click path and the browse path in step 704 may include: aggregating the click path and the browse path by user ID; sorting the aggregated click path and browse path in chronological order to generate user path data in which click data and browsing data at least partially alternate; for browsing data whose click data is missing, concatenating directly, and for click data whose browsing data is missing, removing that click data, so as to generate user path data in which click data and browsing data fully alternate; and converting the click data in the user path data into edges and the browsing data into nodes to generate the user path tree.
Fig. 8 is a block diagram showing user traffic processing apparatus 800 according to an embodiment of the present invention. Apparatus 800 includes: data cleansing module 801, configured to perform data cleansing on user traffic data to generate legitimate traffic data, the legitimate traffic data including click data and browsing data; click path generation module 802, configured to generate a click path from the click data; browse path generation module 803, configured to generate a browse path from the browsing data; and user path tree generation module 804, configured to generate a user path tree from the click path and the browse path.
In one embodiment, data cleansing module 801 may be further configured to perform one or more of illegal user ID cleansing, illegal request frequency cleansing, and blacklisted IP address cleansing.
In one embodiment, click path generation module 802 may be further configured to: deduplicate the click data; convert the click data into URLs, recording the URL of the page preceding the page on which the click occurred, the URL of the clicked page, and the URL of the page jumped to after the click; sort the click data in chronological order and concatenate the URL of the clicked page with the URL of the preceding page and the URL of the page jumped to after the click; and directly concatenate two URLs that have no click data between them within a period of time.
In one embodiment, browse path generation module 803 may be further configured to: extract the URLs of the pages browsed by the user and the browsing times; sort the browsed page URLs by browsing time; and concatenate the browsed page URLs to generate the user browse path.
In one embodiment, user path tree generation module 804 may be further configured to merge the click path and the browse path to generate the user path tree.
In one embodiment, user path tree generation module 804 may be further configured to: aggregate the click path and the browse path by user ID; sort the aggregated click path and browse path in chronological order to generate user path data in which click data and browsing data at least partially alternate; for browsing data whose click data is missing, concatenate directly, and for click data whose browsing data is missing, remove that click data, so as to generate user path data in which click data and browsing data fully alternate; and convert the click data in the user path data into edges and the browsing data into nodes to generate the user path tree.
Fig. 9 is a block diagram showing electronic device 900 according to an embodiment of the present invention. Electronic device 900 includes processor 906 (e.g., a microprocessor (CPU), a digital signal processor (DSP), etc.). Processor 906 may be a single processing unit or multiple processing units for performing the different actions of the flows described herein. Electronic device 900 may also include input unit 902 for receiving signals from other entities and output unit 904 for providing signals to other entities. Input unit 902 and output unit 904 may be arranged as a single entity or as separate entities.
In addition, electronic device 900 may include at least one readable storage medium 908 in the form of non-volatile or volatile memory, for example an electrically erasable programmable read-only memory (EEPROM), flash memory, and/or a hard disk drive. Readable storage medium 908 includes computer program 910, which includes code/computer-readable instructions that, when executed by processor 906 in electronic device 900, enable electronic device 900 to perform, for example, any of the flows described above in conjunction with Figs. 1 to 7, and combinations thereof.
Computer program 910 may be configured as computer program code having, for example, an architecture of computer program modules 910A to 910E (by way of example only; there may be more or fewer). Accordingly, the code in the computer program of device 900 includes: module 910A, for .... The code in the computer program further includes: module 910B, for .... The code in the computer program further includes: module 910C, for ..., and so on.
Although the code means in the embodiment disclosed above in conjunction with Fig. 9 are implemented as computer program modules that, when executed in processor 906, cause electronic device 900 to perform the actions described above in conjunction with Figs. 1 to 7, in alternative embodiments at least one of the code means may be implemented at least partly as a hardware circuit.
The present invention provides an accurate traffic analysis algorithm model for massive traffic data. It ensures the timeliness of data processing by optimizing the computation program and using a distributed computing framework, and ensures the accuracy of the data through user click behavior analysis, user browsing behavior analysis, and the path tree computation model. The invention accurately computes user behavior data on the website, describes and stores it precisely in the form of paths, and thereby ultimately supports upper-layer business.
The above scheme merely shows one specific implementation of the inventive concept, and the present invention is not limited to it. Part of the processing in the above implementation may be omitted or skipped without departing from the spirit and scope of the present invention.
The methods in the embodiments can be implemented in the form of program commands executable by a variety of computer devices and recorded in a computer-readable recording medium. In this case, the computer-readable recording medium may include program commands, data files, data structures, or combinations thereof. The program commands recorded in the recording medium may be specially designed or configured for the present invention, or may be known to and usable by those skilled in the field of computer software. Computer-readable recording media include magnetic media such as hard disks, floppy disks, or magnetic tape; optical media such as compact disc read-only memory (CD-ROM) or digital versatile disc (DVD); magneto-optical media such as floptical disks; and hardware devices such as ROM, RAM, and flash memory for storing and executing program commands. In addition, program commands include machine language code produced by a compiler and high-level language code executable by a computer using an interpreter. The foregoing hardware devices may be configured to operate as at least one software module to perform the operations of the present invention, and vice versa.
Although the operations of the methods herein are shown and described in a particular order, the order of the operations of each method may be changed, so that particular operations are performed in reverse order or at least partly concurrently with other operations. Furthermore, the present invention is not limited to the above example embodiments; it may include one or more additional components or operations, or omit one or more components or operations, without departing from the spirit and scope of the present disclosure.
The present invention has been shown above in conjunction with its preferred embodiments, but those skilled in the art will understand that various modifications, substitutions, and changes may be made to the present invention without departing from its spirit and scope. Therefore, the present invention should not be limited by the above embodiments, but should be defined by the appended claims and their equivalents.

Claims (14)

1. A user traffic data processing method, comprising:
performing data cleansing on user traffic data to generate legitimate traffic data, the legitimate traffic data including click data and browsing data;
generating a click path from the click data;
generating a browse path from the browsing data; and
generating a user path tree from the click path and the browse path.
2. The method according to claim 1, wherein the data cleansing includes one or more of illegal user ID cleansing, illegal request frequency cleansing, and blacklisted IP address cleansing.
3. The method according to claim 1, wherein generating the click path using the click data comprises:
deduplicating the click data;
processing the click data into records of the URL of the page on which the click occurred, the URL of the page preceding that page, and the URL of the page redirected to after the click;
sorting the click data in chronological order, and connecting the URL of the page on which the click occurred with the URL of the preceding page and the URL of the page redirected to after the click; and
directly connecting two URLs between which there is no click data within a period of time.
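The steps of claim 3 can be sketched as follows. The field names `time`, `prev_url`, `page_url`, and `target_url`, and the helper `merge_chain`, are hypothetical. The sketch also simplifies the last step: it connects consecutive URLs directly whenever no recorded click links them, and does not model the "period of time" window, which the claim does not quantify.

```python
def merge_chain(path, chain):
    """Append chain to path, dropping the part of chain that repeats the end of path."""
    for k in range(min(len(path), len(chain)), 0, -1):
        if path[-k:] == chain[:k]:
            return path + chain[k:]
    # No overlap: the two URLs have no click between them and are joined directly.
    return path + chain

def build_click_path(clicks):
    """Build an ordered list of URLs from raw click records."""
    # Step 1: deduplicate identical click records, keeping first-seen order.
    unique = list(dict.fromkeys(
        (c["time"], c.get("prev_url"), c["page_url"], c.get("target_url"))
        for c in clicks
    ))
    # Steps 2 and 3: sort by time, then link previous page -> clicked page -> target.
    path = []
    for _, prev_url, page_url, target_url in sorted(unique, key=lambda t: t[0]):
        chain = [u for u in (prev_url, page_url, target_url) if u]
        # Step 4 (simplified): gaps between chains are bridged by direct connection.
        path = merge_chain(path, chain)
    return path
```

Merging on the overlapping suffix is a design choice made for this sketch: it keeps a page that appears as the target of one click and the source of the next from being listed twice.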
4. The method according to claim 1, wherein generating the browse path using the browsing data comprises:
extracting the URLs of the pages browsed by the user and the browsing times;
sorting the browsed page URLs by browsing time; and
connecting the browsed page URLs to generate the user's browse path.
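The browse path of claim 4 amounts to a sort followed by a projection. A minimal sketch, assuming each browsing record is a dictionary with hypothetical `url` and `time` fields:

```python
def build_browse_path(views):
    """Return the browsed page URLs ordered by browsing time."""
    # Extract (time, url) pairs, sort them by time, and keep the URLs in order.
    ordered = sorted(views, key=lambda v: v["time"])
    return [v["url"] for v in ordered]
```

Python's `sorted` is stable, so records that share the same timestamp keep their original relative order.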
5. The method according to claim 1, wherein generating the user path tree comprises merging the click path and the browse path.
6. The method according to claim 1, wherein generating the user path tree comprises:
aggregating the click path and the browse path by user ID;
sorting the aggregated click path and browse path in chronological order to generate user path data in which click data and browsing data at least partially alternate;
for browsing data that lacks click data, directly connecting the browsed page URLs, and for click data that lacks browsing data, removing that click data, so as to generate user path data in which click data and browsing data strictly alternate; and
converting the click data in the user path data into edges and the browsing data into nodes, so as to generate the user path tree.
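One possible reading of claim 6 is sketched below. The event fields (`user_id`, `time`, `kind`, `url`, `label`), the nested-dictionary tree, and the `visits` counter are assumptions added for illustration. The sketch also assumes that a click "lacks browsing data" when it is not immediately followed by a browse event, which is only one way to interpret the claim.

```python
from collections import defaultdict

def alternate(events):
    """Turn one user's time-sorted events into (click_label, url) steps."""
    steps, pending_click = [], None
    for e in events:
        if e["kind"] == "click":
            # A click followed by another click has no browse event of its
            # own, so the earlier click is overwritten (removed).
            pending_click = e.get("label")
        else:
            # A browse with no preceding click is connected directly (label
            # None). A click before the first page has no source node, so it
            # is dropped as well.
            steps.append((pending_click if steps else None, e["url"]))
            pending_click = None
    return steps  # a trailing click with no following browse is dropped

def build_path_tree(events):
    """Aggregate events by user ID and merge every user's path into one tree."""
    by_user = defaultdict(list)
    for e in events:
        by_user[e["user_id"]].append(e)
    root = {"url": None, "visits": 0, "children": {}}
    for user_events in by_user.values():
        node = root
        ordered = sorted(user_events, key=lambda e: e["time"])
        for click, url in alternate(ordered):
            # Browsing data becomes a node; the click that led to it becomes
            # the edge, stored here as part of the child key.
            child = node["children"].setdefault(
                (click, url), {"url": url, "visits": 0, "children": {}})
            child["visits"] += 1
            node = child
    return root
```

Keying each child by the pair (click label, URL) lets two users who reach the same page through different clicks occupy different branches, while identical steps share a node and increment its counter.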
7. A user traffic data processing device, comprising:
a data cleansing module configured to perform data cleansing on user traffic data to generate legitimate traffic data, the legitimate traffic data comprising click data and browsing data;
a click path generation module configured to generate a click path using the click data;
a browse path generation module configured to generate a browse path using the browsing data; and
a user path tree generation module configured to generate a user path tree according to the click path and the browse path.
8. The device according to claim 7, wherein the data cleansing module is further configured to perform one or more of illegal user ID cleansing, illegal request frequency cleansing, and blacklisted IP address cleansing.
9. The device according to claim 7, wherein the click path generation module is further configured to:
deduplicate the click data;
process the click data into records of the URL of the page on which the click occurred, the URL of the page preceding that page, and the URL of the page redirected to after the click;
sort the click data in chronological order, and connect the URL of the page on which the click occurred with the URL of the preceding page and the URL of the page redirected to after the click; and
directly connect two URLs between which there is no click data within a period of time.
10. The device according to claim 7, wherein the browse path generation module is further configured to:
extract the URLs of the pages browsed by the user and the browsing times;
sort the browsed page URLs by browsing time; and
connect the browsed page URLs to generate the user's browse path.
11. The device according to claim 7, wherein the user path tree generation module is further configured to merge the click path and the browse path.
12. The device according to claim 7, wherein the user path tree generation module is further configured to:
aggregate the click path and the browse path by user ID;
sort the aggregated click path and browse path in chronological order to generate user path data in which click data and browsing data at least partially alternate;
for browsing data that lacks click data, directly connect the browsed page URLs, and for click data that lacks browsing data, remove that click data, so as to generate user path data in which click data and browsing data strictly alternate; and
convert the click data in the user path data into edges and the browsing data into nodes, so as to generate the user path tree.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1 to 6.
14. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions, the computer instructions being used to cause a computer to perform the method according to any one of claims 1 to 6.
CN201710040291.0A 2017-01-18 2017-01-18 User traffic data processing method, processing unit, electronic equipment and storage medium Pending CN108322355A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710040291.0A CN108322355A (en) 2017-01-18 2017-01-18 User traffic data processing method, processing unit, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN108322355A true CN108322355A (en) 2018-07-24

Family

ID=62891573

Country Status (1)

Country Link
CN (1) CN108322355A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111698129A (en) * 2020-06-09 2020-09-22 湖南大众传媒职业技术学院 User flow and behavior analysis system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100169802A1 (en) * 2006-11-08 2010-07-01 Seth Goldstein Methods and Systems for Storing, Processing and Managing User Click-Stream Data
CN103577504A (en) * 2012-08-10 2014-02-12 华为技术有限公司 Method and device for putting personalized contents
CN103823883A (en) * 2014-03-06 2014-05-28 焦点科技股份有限公司 Analysis method and system for website user access path
CN104021209A (en) * 2014-06-19 2014-09-03 北京博雅立方科技有限公司 Statistical method for keyword advertising effect and browsing client
CN104462156A (en) * 2013-09-25 2015-03-25 阿里巴巴集团控股有限公司 Feature extraction and individuation recommendation method and system based on user behaviors

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20180724)