CN108322355A - User traffic data processing method, processing unit, electronic equipment and storage medium - Google Patents
User traffic data processing method, processing unit, electronic equipment and storage medium
- Publication number
- CN108322355A (application CN201710040291.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- path
- click
- user
- url
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/02—Capturing of monitoring data
- H04L43/026—Capturing of monitoring data using flow identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/06—Generation of reports
- H04L43/062—Generation of reports related to network traffic
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
A user traffic data processing method is proposed, including: performing data cleaning on user traffic data to generate legitimate traffic data, the legitimate traffic data including click data and browsing data; generating a click path using the click data; generating a browse path using the browsing data; and generating a user path tree according to the click path and the browse path. The present invention achieves improvements in data volume, timeliness and accuracy.
Description
Technical field
The present invention relates to the field of Internet technology, and in particular to a user traffic data processing method, a processing unit, electronic equipment and a storage medium.
Background technology
With the growth of massive data, existing traffic analysis models struggle to meet practical requirements in terms of data volume, timeliness, scalability and accuracy.
Current on-site website traffic analysis models are mostly single-machine computing programs, and they use simple URL (Uniform Resource Locator) rules to associate different traffic. Massive data is processed by single-machine computation, and on the analysis-model side, traffic association analysis is performed using URL rules and some business rules.
The main problems with such traffic analysis models are: 1. a single-machine processing program is used, so the amount of data that can be processed is limited by single-machine performance, and the model cannot be used for massive data processing; 2. processing time is long: faced with massive data, a single-machine program or a simple parallel computing framework can hardly meet data timeliness requirements; 3. the results of the traffic analysis model are inaccurate: relying simply on URL rules and business rules yields low accuracy in the face of complex network environments and user behavior.
Summary of the invention
In view of this, the present invention proposes a traffic analysis model based on a distributed computing framework and path tree computation, which improves on the prior art in terms of data volume, timeliness and accuracy.
According to a first aspect of the present invention, a user traffic data processing method is provided, including: performing data cleaning on user traffic data to generate legitimate traffic data, the legitimate traffic data including click data and browsing data; generating a click path using the click data; generating a browse path using the browsing data; and generating a user path tree according to the click path and the browse path.
In one embodiment, the data cleaning may include one or more of illegal user ID cleaning, illegal request frequency cleaning and blacklisted IP address cleaning.
In one embodiment, generating the click path using the click data may include: deduplicating the click data; converting the click data into URLs, recording the URL of the page preceding the page on which the click occurred, the URL of the clicked page, and the URL of the page navigated to after the click; sorting the click data in chronological order, and concatenating the URL of the clicked page with the URL of the previous page and the URL of the page navigated to after the click; and directly concatenating two URLs that have no click data between them within a period of time.
In one embodiment, generating the browse path using the browsing data may include: extracting the URLs of the pages browsed by the user and the browsing times; sorting the browsed page URLs by browsing time; and concatenating the browsed page URLs to generate the user browse path.
In one embodiment, the method may also include merging the click path and the browse path to generate the user path tree.
In one embodiment, generating the user path tree may include: aggregating the click path and the browse path according to user ID; sorting the aggregated click path and browse path in chronological order to generate user path data in which click data and browsing data at least partly alternate; for browsing data that has lost its click data, directly concatenating the browsed page URLs, and for click data that has lost its browsing data, removing the click data, so as to generate user path data in which click data and browsing data alternate completely; and converting the click data in the user path data into edges and the browsing data into nodes, to generate the user path tree.
According to a second aspect of the present invention, a user traffic data processing unit is provided, including: a data cleaning module, configured to perform data cleaning on user traffic data to generate legitimate traffic data, the legitimate traffic data including click data and browsing data; a click path generation module, configured to generate a click path using the click data; a browse path generation module, configured to generate a browse path using the browsing data; and a user path tree generation module, configured to merge the click path and the browse path to generate a user path tree.
According to a third aspect of the present invention, electronic equipment is provided, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method according to the first aspect of the present invention.
According to a fourth aspect of the present invention, a non-transitory computer-readable storage medium is provided, characterized in that the non-transitory computer-readable storage medium stores computer instructions, the computer instructions being used to cause a computer to perform the method according to the first aspect of the present invention.
One advantage of the present invention is its ability to process massive data: the data computation can be implemented using the Hadoop distributed computing framework, which can easily handle data at the terabyte (T) level or above and supports rapid scaling. Another advantage of the present invention is good data processing timeliness. By using a distributed computing framework and optimized computation methods, processing time is greatly improved; the association computation between clicks and browsing in the traffic is also optimized, as is the user path generation algorithm. A further advantage of the present invention is that it uses an analysis model based on a path tree: by analyzing user browsing behavior and click behavior, it performs path generation computation on users' behavior on website pages, so that the complete path of a user on the website can be depicted accurately.
Brief description of the drawings
The above and other aspects, features and advantages of example embodiments of the present invention will become more apparent from the following description taken in conjunction with the accompanying drawings.
Fig. 1 shows a flow chart of user data processing according to an embodiment of the present invention.
Fig. 2 shows a flow chart of a method of cleaning user traffic data according to an embodiment of the present invention.
Fig. 3 shows a flow chart of a method of analyzing user click behavior data according to an embodiment of the present invention.
Fig. 4 shows a flow chart of a method of analyzing user browsing behavior data according to an embodiment of the present invention.
Fig. 5 shows a flow chart of a method for the path tree computation model according to an embodiment of the present invention.
Fig. 6 shows a diagram of converting user path data into a user path tree.
Fig. 7 is a flow chart showing a user traffic data processing method according to an embodiment of the present invention.
Fig. 8 is a block diagram showing a user traffic data processing unit according to an embodiment of the present invention.
Fig. 9 is a block diagram showing electronic equipment according to an embodiment of the present invention.
Detailed description
Embodiments of the present invention are explained below, including various details of the embodiments to aid understanding; these should be regarded as merely exemplary. Accordingly, those skilled in the art will appreciate that various modifications and changes can be made to the embodiments described herein without departing from the scope and spirit of the present invention.
Fig. 1 shows a flow chart of user data processing according to an embodiment of the present invention. As shown in Fig. 1, massive data is obtained from massive data clusters 111, 112, 113, etc. (collectively, massive data clusters 110), for example the usage data of JD.com (Jingdong) users on various products such as the JD.com website and APP clients.
The data then enters distributed computing framework 120, which includes data cleaning 121, user click behavior data analysis 122, user browsing behavior data analysis 123 and path tree computation model 124, and which finally outputs a user path tree. Here, a path tree is a way of characterizing user behavior. For example, by analyzing users' click and browsing behavior, a traffic distribution tree of users on the website is computed; a branch of the tree is, for example, a user's access path, and a leaf node is, for example, the last page the user visited on that branch.
Next, various kinds of business support analysis 130 can be carried out on the user path tree output by distributed computing framework 120, such as page quality analysis 131, conversion rate analysis 132, user behavior analysis, etc., without being limited thereto.
The overall user data processing flow involves main stages such as data cleaning, user behavior analysis and the final path tree computation model. The whole computation flow uses a distributed computing framework, which can improve massive-data processing capability and the timeliness of data computation. In terms of traffic analysis, the accuracy of the analysis results can be ensured through user behavior data analysis and the path tree computation model.
Fig. 2 shows a flow chart of method 200 of cleaning user traffic data according to an embodiment of the present invention. For example, data cleaning 200 may include: illegal user ID cleaning in step 201, illegal request frequency cleaning in step 202, blacklisted IP address cleaning in step 203, and obtaining legitimate traffic data in step 204. More specifically, the massive traffic data can be cleaned using cleaning rules such as the following.
1. Remove the top 10% of user data; in terms of quality, most of this data is considered to be crawler data;
2. Remove data that has no user ID;
3. Remove data whose source cannot be determined, where the source can be judged, for example, by whether it contains "jd"; if it does, the source is legitimate;
4. Remove the data of users with an excessive number of records in a single day; for example, the data of the top 5% by number of operation records can be removed directly;
5. Remove blacklisted IP data; for example, an anti-cheating IP list can be used for cleaning.
After data cleaning, legitimate traffic data is obtained for the subsequent click data analysis and browsing data analysis.
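The cleaning rules above can be sketched roughly as follows. This is only an illustrative sketch, not the implementation described in the patent: the record fields (`user_id`, `ip`, `source`), the function name `clean_traffic` and its parameters are hypothetical, the input is assumed to hold a single day of records, and rule 1 is left out because the text does not say how the top 10% is ranked.

```python
from collections import Counter


def clean_traffic(records, blacklist_ips, source_marker="jd", heavy_fraction=0.05):
    """Apply rules 2 to 5 to one day of traffic records (dicts with hypothetical keys)."""
    # Rule 2: drop records that carry no user ID.
    kept = [r for r in records if r.get("user_id")]
    # Rule 3: drop records whose source does not contain the marker string.
    kept = [r for r in kept if source_marker in (r.get("source") or "")]
    # Rule 4: drop all records of the users with the most records that day.
    counts = Counter(r["user_id"] for r in kept)
    n_heavy = int(len(counts) * heavy_fraction)
    heavy_users = {user for user, _ in counts.most_common(n_heavy)}
    kept = [r for r in kept if r["user_id"] not in heavy_users]
    # Rule 5: drop records that come from blacklisted IP addresses.
    return [r for r in kept if r.get("ip") not in blacklist_ips]
```

In a distributed setting each rule would more likely be a filter stage over partitioned data; the single-process version here is only meant to make the order and effect of the rules concrete.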
Fig. 3 shows a flow chart of method 300 of analyzing user click behavior data according to an embodiment of the present invention. In general, the input of method 300 may be legitimate user click traffic, and the output may be the user click path.
Specifically, method 300 may include step 301, click data normalization, for example deduplicating the user click data and sorting it by time.
Method 300 may also include step 302, converting the click data into URLs and recording the URLs before and after the click. For example, for each piece of click data, the URL of the page preceding the page on which the click occurred, the URL of the current page that was clicked, and the URL of the page navigated to after the click can be recorded. It should be noted that, because click data may go unreported owing to factors such as the network environment, not every piece of click data can be formed into click data containing the previous-page URL, the current-page URL and the post-navigation page URL. In that case, the click data may contain only some of these three URLs.
Next, in step 303, a path can be generated according to the navigation rules between preceding and following URLs. Specifically, generating the click path may include sorting the URLs clicked by the user by time, and concatenating each clicked URL with the previous URL at which the click occurred and the URL navigated to after the click, to form the user click path. The concatenation method may be: the previous URL links to the current URL, and the current URL links to the next URL. The concatenation can be implemented by, for example, a linked list.
Next, in step 304, the click paths are merged into the final click path. As noted above, click data may be incomplete owing to factors such as the network environment. In that case, user data with missing clicks or incomplete click data needs to be handled. Specifically, the user traffic URLs can be sorted by time, and if there is no click data between two URLs, the two URLs are forcibly associated. The detailed forced-association process may include: sorting the continuous, uninterrupted click data of the same user within a period of time (e.g., 30 minutes) by time, and directly associating any two URLs of the same user that have no click data between them, to generate the user click path.
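Steps 301 to 304 can be illustrated with a short sketch for a single user. It is written under stated assumptions rather than taken from the patent: each click is a hypothetical tuple `(ts, prev_url, cur_url, next_url)` with a numeric timestamp in seconds, any of the three URLs may be `None` when it was not reported, and a gap longer than `max_gap_seconds` is treated as the end of one continuous period. A plain Python list stands in for the linked list mentioned in the text.

```python
def build_click_paths(clicks, max_gap_seconds=30 * 60):
    """Return a list of click paths (lists of URLs) for one user."""
    # Step 301: deduplicate, then sort by timestamp.
    ordered = sorted(set(clicks), key=lambda c: c[0])
    paths, current, last_ts = [], [], None
    for ts, prev_url, cur_url, next_url in ordered:
        # A long silence closes the current path and starts a new one.
        if last_ts is not None and ts - last_ts > max_gap_seconds:
            paths.append(current)
            current = []
        # Steps 302/303: link previous -> clicked -> landing page, skipping
        # missing URLs and immediate repeats where two records overlap.
        for url in (prev_url, cur_url, next_url):
            if url is not None and (not current or current[-1] != url):
                current.append(url)
        # Step 304: appending to the same list forcibly links this record to
        # the preceding one even when no click connects them.
        last_ts = ts
    if current:
        paths.append(current)
    return paths
```

For example, records `(100, "A", "B", "C")` and `(260, "E", "F", None)` from the same period end up in one path in which `C` is linked directly to `E`, which is the forced association described in step 304.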
Fig. 4 shows a flow chart of method 400 of analyzing user browsing behavior data according to an embodiment of the present invention. In general, the input of method 400 may be legitimate user browsing traffic, and the output may be the user browse path.
Method 400 may include: in step 401, normalizing the user browsing data and extracting the URLs and times of the pages browsed by the user; in step 402, sorting the user browsing data by browsing time; and in step 403, generating the user browse path, for example by associating the URLs browsed by the user in time order to establish the user browse path.
As noted above, browsing data may be incomplete owing to factors such as the network environment. In this case, the continuous, uninterrupted browsing data of the same user within a period of time (e.g., 30 minutes) can be sorted by time, and the current URL can be concatenated, in timestamp order, with the URL data whose timestamps are earlier than that of the current URL, thereby establishing the user browse path. The concatenation can be implemented by, for example, a linked list.
Fig. 5 shows a flow chart of method 500 for the path tree computation model according to an embodiment of the present invention. In general, the input of method 500 may be the user click path and the user browse path, and the output may be the user path tree.
Method 500 may include step 501, combining click and browsing data by user ID, for example aggregating the browsing and click data of an individual user (e.g., the generated user click path and browse path) according to a UUID. Then, in step 502, the click and browsing data are sorted within each individual user ID; specifically, the click and browsing data can be sorted by time to generate user path data in which clicks and browses alternate.
It should be noted that, as described above, because the click data and browsing data may be incomplete, it may not be possible to generate user path data in which clicks and browses alternate completely; the processing of step 503 is therefore required.
In step 503, missing clicks and missing browses can be handled by forcibly associating the data before and after the click time point. Specifically, browsing data that has lost its click (e.g., two browsing records with no click between them) is sorted by time and then forcibly associated. The association here is therefore consistent with the forced association of browsing data. In addition, for click data that has lost its browse (e.g., two click records with no browsing data between them), the click data is simply removed.
In this way, path data in which click data and browsing data alternate completely can be obtained. Then, in step 504, the click data in the path data can be converted into edges and the browsing data into nodes, generating the user path tree.
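Steps 502 to 504 can be sketched as below. Everything named here is hypothetical: events are tuples `(ts, kind, value)` with `kind` either `"browse"` (value is a URL) or `"click"` (value is a click label), and the tree is a nested dict keyed by `(click, url)`. Where two clicks occur with no browse between them, the sketch keeps the later click and drops the earlier one; the patent does not say which one is removed, so this is an assumption.

```python
def alternate(events):
    """Return [(url, click_or_None), ...]: each browsed page with the click leading to it."""
    steps, pending_click = [], None
    for _, kind, value in sorted(events, key=lambda e: e[0]):
        if kind == "click":
            # A click with no browse after it is overwritten, i.e. removed.
            pending_click = value
        else:
            # A browse with no preceding click is forcibly linked (click is None).
            steps.append((value, pending_click if steps else None))
            pending_click = None
    return steps


def new_node(url):
    return {"url": url, "visits": 0, "children": {}}


def add_path(root, steps):
    """Insert one alternating path: browsed pages become nodes, clicks become edges."""
    node = root
    for url, click in steps:
        node = node["children"].setdefault((click, url), new_node(url))
        node["visits"] += 1
    return root
```

Calling `add_path` repeatedly on one root with the paths of many users or periods yields a tree whose branches are access paths and whose `visits` counters give a rough traffic distribution, in the spirit of the path tree described for Fig. 1.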
Fig. 6 shows a diagram of converting user path data into a user path tree.
Fig. 7 is a flow chart showing user traffic data processing method 700 according to an embodiment of the present invention. Method 700 includes: in step 701, performing data cleaning on user traffic data to generate legitimate traffic data, the legitimate traffic data including click data and browsing data; in step 702, generating a click path using the click data; in step 703, generating a browse path using the browsing data; and in step 704, generating a user path tree according to the click path and the browse path.
In one embodiment, the data cleaning in step 701 may include one or more of illegal user ID cleaning, illegal request frequency cleaning and blacklisted IP address cleaning.
In one embodiment, generating the click path using the click data in step 702 may include: deduplicating the click data; converting the click data into URLs, recording the URL of the page preceding the page on which the click occurred, the URL of the clicked page, and the URL of the page navigated to after the click; sorting the click data in chronological order, and concatenating the URL of the clicked page with the URL of the previous page and the URL of the page navigated to after the click; and directly concatenating two URLs that have no click data between them within a period of time.
In one embodiment, generating the browse path using the browsing data in step 703 may include: extracting the URLs of the pages browsed by the user and the browsing times; sorting the browsed page URLs by browsing time; and concatenating the browsed page URLs to generate the user browse path.
In one embodiment, step 704 may also include merging the click path and the browse path to generate the user path tree.
In one embodiment, generating the user path tree according to the click path and the browse path in step 704 may include: aggregating the click path and the browse path according to user ID; sorting the aggregated click path and browse path in chronological order to generate user path data in which click data and browsing data at least partly alternate; for browsing data that has lost its click data, directly concatenating it, and for click data that has lost its browsing data, removing the click data, so as to generate user path data in which click data and browsing data alternate completely; and converting the click data in the user path data into edges and the browsing data into nodes, to generate the user path tree.
Fig. 8 is a block diagram showing user traffic data processing unit 800 according to an embodiment of the present invention. Processing unit 800 includes: data cleaning module 801, configured to perform data cleaning on user traffic data to generate legitimate traffic data, the legitimate traffic data including click data and browsing data; click path generation module 802, configured to generate a click path using the click data; browse path generation module 803, configured to generate a browse path using the browsing data; and user path tree generation module 804, configured to generate a user path tree according to the click path and the browse path.
In one embodiment, data cleaning module 801 may be further configured to perform one or more of illegal user ID cleaning, illegal request frequency cleaning and blacklisted IP address cleaning.
In one embodiment, click path generation module 802 may be further configured to: deduplicate the click data; convert the click data into URLs, recording the URL of the page preceding the page on which the click occurred, the URL of the clicked page, and the URL of the page navigated to after the click; sort the click data in chronological order, and concatenate the URL of the clicked page with the URL of the previous page and the URL of the page navigated to after the click; and directly concatenate two URLs that have no click data between them within a period of time.
In one embodiment, browse path generation module 803 may be further configured to: extract the URLs of the pages browsed by the user and the browsing times; sort the browsed page URLs by browsing time; and concatenate the browsed page URLs to generate the user browse path.
In one embodiment, user path tree generation module 804 may be further configured to merge the click path and the browse path to generate the user path tree.
In one embodiment, user path tree generation module 804 may be further configured to: aggregate the click path and the browse path according to user ID; sort the aggregated click path and browse path in chronological order to generate user path data in which click data and browsing data at least partly alternate; for browsing data that has lost its click data, directly concatenate it, and for click data that has lost its browsing data, remove the click data, so as to generate user path data in which click data and browsing data alternate completely; and convert the click data in the user path data into edges and the browsing data into nodes, to generate the user path tree.
Fig. 9 is a block diagram showing electronic equipment 900 according to an embodiment of the present invention. Electronic equipment 900 includes processor 906 (e.g., a microprocessor (CPU), a digital signal processor (DSP), etc.). Processor 906 may be a single processing unit or multiple processing units for performing the different actions of the flows described herein. Electronic equipment 900 may also include input unit 902 for receiving signals from other entities and output unit 904 for providing signals to other entities. Input unit 902 and output unit 904 may be arranged as a single entity or as separate entities.
In addition, electronic equipment 900 may include at least one readable storage medium 908 in the form of non-volatile or volatile memory, for example electrically erasable programmable read-only memory (EEPROM), flash memory, and/or a hard disk drive. Readable storage medium 908 includes computer program 910, which includes code/computer-readable instructions that, when executed by processor 906 in electronic equipment 900, enable electronic equipment 900 to perform, for example, any of the flows described above in connection with Figs. 1 to 7, and any combination thereof.
Computer program 910 may be configured as computer program code having an architecture of, for example, computer program modules 910A to 910E (by way of example only; there may be more or fewer). Accordingly, the code in the computer program of electronic equipment 900 includes: module 910A, used for .... The code in the computer program further includes: module 910B, used for .... The code in the computer program further includes: module 910C, used for ..., and so on.
Although the code means in the embodiment disclosed above in connection with Fig. 9 are implemented as computer program modules which, when executed in processor 906, cause electronic equipment 900 to perform the actions described above in connection with Figs. 1 to 7, in alternative embodiments at least one of the code means may be implemented at least partly as a hardware circuit.
The present invention provides an accurate traffic analysis algorithm model for massive traffic data. It ensures the timeliness of data processing by optimizing the computation program and using a distributed computing framework, and ensures the accuracy of the data through user click behavior analysis, user browsing behavior analysis and the path tree computation model. The present invention accurately computes users' behavior data on the website and describes and stores it accurately in the form of paths, ultimately supporting upper-layer business in this way.
The above scheme merely shows one specific implementation of the inventive concept, and the present invention is not limited to this implementation. Part of the processing in the above implementation may be omitted or skipped without departing from the spirit and scope of the present invention.
The methods in the embodiments may be implemented in the form of program commands executable by a variety of computer devices and recorded in a computer-readable recording medium. In this case, the computer-readable recording medium may include program commands, data files, data structures, or combinations thereof. The program commands recorded in the recording medium may be specially designed or configured for the present invention, or may be known to those skilled in the field of computer software. Computer-readable recording media include magnetic media such as hard disks, floppy disks or magnetic tape; optical media such as compact disc read-only memory (CD-ROM) or digital versatile discs (DVD); magneto-optical media such as floptical disks; and hardware devices for storing and executing program commands, such as ROM, RAM and flash memory. In addition, program commands include machine language code produced by a compiler and high-level language code that a computer can execute using an interpreter. The foregoing hardware devices may be configured to operate as at least one software module to perform the operations of the present invention, and vice versa.
Although the operations of the methods herein are shown and described in a particular order, the order of the operations of each method may be changed, so that certain operations may be performed in reverse order or at least partly concurrently with other operations. Furthermore, the present invention is not limited to the above example embodiments; one or more other components or operations may be included or omitted without departing from the spirit and scope of the present disclosure.
The present invention has been shown above in connection with its preferred embodiments, but those skilled in the art will understand that various modifications, substitutions and changes can be made to the present invention without departing from its spirit and scope. Therefore, the present invention should not be limited by the above embodiments, but should be defined by the appended claims and their equivalents.
Claims (14)
1. A user traffic data processing method, including:
performing data cleaning on user traffic data to generate legitimate traffic data, the legitimate traffic data including click data and browsing data;
generating a click path using the click data;
generating a browse path using the browsing data; and
generating a user path tree according to the click path and the browse path.
2. The method according to claim 1, wherein the data cleaning includes one or more of illegal user ID cleaning, illegal request frequency cleaning and blacklisted IP address cleaning.
3. The method according to claim 1, wherein generating the click path using the click data includes:
deduplicating the click data;
converting the click data into URLs, recording the URL of the page preceding the page on which the click occurred, the URL of the clicked page, and the URL of the page navigated to after the click;
sorting the click data in chronological order, and concatenating the URL of the clicked page with the URL of the previous page and the URL of the page navigated to after the click; and
directly concatenating two URLs that have no click data between them within a period of time.
4. The method according to claim 1, wherein generating the browse path using the browsing data includes:
extracting the URLs of the pages browsed by the user and the browsing times;
sorting the browsed page URLs by browsing time; and
concatenating the browsed page URLs to generate the user browse path.
5. The method according to claim 1, wherein generating the user path tree includes merging the click path and the browse path.
6. The method according to claim 1, wherein generating the user path tree comprises:
aggregating the click paths and the browse paths by user ID;
sorting the aggregated click paths and browse paths in chronological order to generate user path data in which click data and
browsing data at least partially alternate;
for browsing data whose click data is missing, directly connecting the browsed-page URLs, and for click data whose browsing
data is missing, removing that click data, so as to generate user path data in which click data and browsing data fully
alternate; and
converting the click data in the user path data into edges and the browsing data into nodes to generate the user path tree.
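The steps of claim 6 can be sketched as follows. The event fields `user_id`, `time`, `kind`, `url`, and `element`, and the nested-dictionary tree keyed by `(click, url)` pairs, are assumptions made for illustration; the claim does not prescribe a data structure. A click with no browsing record after it (or none before it) is dropped, and two consecutive browsing records are joined by an edge with no click, represented as `None`.

```python
from collections import defaultdict

def alternate(events):
    """Turn one user's events into (click_or_None, url) steps in time order."""
    steps = []
    pending_click = None
    for e in sorted(events, key=lambda e: e["time"]):
        if e["kind"] == "click":
            # A click needs a preceding page; a newer click replaces an
            # older one that never received its browsing record.
            pending_click = e["element"] if steps else None
        else:
            # Browsing record: becomes a node, reached via the pending click
            # or, if the click data is missing, via a direct connection (None).
            steps.append((pending_click, e["url"]))
            pending_click = None
    return steps  # a trailing click without browsing data is dropped

def build_path_tree(events):
    by_user = defaultdict(list)
    for e in events:  # aggregate by user ID
        by_user[e["user_id"]].append(e)
    root = {}  # maps (click, url) -> subtree
    for user_events in by_user.values():
        node = root
        for step in alternate(user_events):
            node = node.setdefault(step, {})
    return root

events = [
    {"user_id": "u1", "time": 1, "kind": "browse", "url": "/home"},
    {"user_id": "u1", "time": 2, "kind": "click", "element": "banner"},
    {"user_id": "u1", "time": 3, "kind": "browse", "url": "/sale"},
    {"user_id": "u1", "time": 4, "kind": "browse", "url": "/item"},
    {"user_id": "u1", "time": 5, "kind": "click", "element": "buy"},
    {"user_id": "u2", "time": 0, "kind": "click", "element": "x"},
    {"user_id": "u2", "time": 1, "kind": "browse", "url": "/home"},
    {"user_id": "u2", "time": 2, "kind": "click", "element": "banner"},
    {"user_id": "u2", "time": 3, "kind": "click", "element": "menu"},
    {"user_id": "u2", "time": 4, "kind": "browse", "url": "/about"},
]
tree = build_path_tree(events)
```

Both users share the `/home` root in this example, and their paths branch below it, which is what makes the merged structure a tree rather than a single path.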
7. A user traffic data processing device, comprising:
a data cleansing module configured to perform data cleansing on user traffic data to generate legitimate traffic data, the
legitimate traffic data comprising click data and browsing data;
a click path generation module configured to generate a click path using the click data;
a browse path generation module configured to generate a browse path using the browsing data; and
a user path tree generation module configured to generate a user path tree according to the click path and the browse path.
8. The device according to claim 7, wherein the data cleansing module is further configured to perform one or more of illegal
user ID cleansing, illegal request frequency cleansing, and blacklisted IP address cleansing.
9. The device according to claim 7, wherein the click path generation module is further configured to:
deduplicate the click data;
process the click data into records that each contain the URL of the page on which the click occurred, the URL of the page
preceding that page, the URL of the clicked page, and the URL of the page redirected to after the click;
sort the click data in chronological order, and connect the URL of the clicked page with the URL of the preceding page and
the URL of the page redirected to after the click; and
directly connect two URLs between which there is no click data within a period of time.
10. The device according to claim 7, wherein the browse path generation module is further configured to:
extract the URLs of the pages browsed by the user and the corresponding browsing times;
sort the browsed-page URLs by browsing time; and
connect the browsed-page URLs to generate the user's browse path.
11. The device according to claim 7, wherein the user path tree generation module is further configured to merge the click
path and the browse path.
12. The device according to claim 7, wherein the user path tree generation module is further configured to:
aggregate the click paths and the browse paths by user ID;
sort the aggregated click paths and browse paths in chronological order to generate user path data in which click data and
browsing data at least partially alternate;
for browsing data whose click data is missing, directly connect the browsed-page URLs, and for click data whose browsing data
is missing, remove that click data, so as to generate user path data in which click data and browsing data fully alternate;
and
convert the click data in the user path data into edges and the browsing data into nodes to generate the user path tree.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the processor, and the instructions are executed by the at least one processor
so that the at least one processor is able to perform the method according to any one of claims 1 to 6.
14. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium
stores computer instructions, the computer instructions being used to cause a computer to perform the method according to any
one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710040291.0A CN108322355A (en) | 2017-01-18 | 2017-01-18 | User traffic data processing method, processing unit, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710040291.0A CN108322355A (en) | 2017-01-18 | 2017-01-18 | User traffic data processing method, processing unit, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108322355A (en) | 2018-07-24 |
Family
ID=62891573
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710040291.0A Pending CN108322355A (en) | 2017-01-18 | 2017-01-18 | User traffic data processing method, processing unit, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108322355A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111698129A (en) * | 2020-06-09 | 2020-09-22 | 湖南大众传媒职业技术学院 | User flow and behavior analysis system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100169802A1 (en) * | 2006-11-08 | 2010-07-01 | Seth Goldstein | Methods and Systems for Storing, Processing and Managing User Click-Stream Data |
CN103577504A (en) * | 2012-08-10 | 2014-02-12 | 华为技术有限公司 | Method and device for putting personalized contents |
CN103823883A (en) * | 2014-03-06 | 2014-05-28 | 焦点科技股份有限公司 | Analysis method and system for website user access path |
CN104021209A (en) * | 2014-06-19 | 2014-09-03 | 北京博雅立方科技有限公司 | Statistical method for keyword advertising effect and browsing client |
CN104462156A (en) * | 2013-09-25 | 2015-03-25 | 阿里巴巴集团控股有限公司 | Feature extraction and individuation recommendation method and system based on user behaviors |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020037918A1 (en) | Risk control strategy determining method based on predictive model, and related device | |
US10789311B2 (en) | Method and device for selecting data content to be pushed to terminal, and non-transitory computer storage medium | |
JP6494801B2 (en) | Information recommendation method and apparatus, and server | |
US20190197416A1 (en) | Information recommendation method, apparatus, and server based on user data in an online forum | |
US8751184B2 (en) | Transaction based workload modeling for effective performance test strategies | |
CN105247507B (en) | Method, system and storage medium for the influence power score for determining brand | |
CN108763274B (en) | Access request identification method and device, electronic equipment and storage medium | |
CN106776881A (en) | A kind of realm information commending system and method based on microblog | |
CN111523072A (en) | Page access data statistical method and device, electronic equipment and storage medium | |
CA2396565A1 (en) | System and method for estimating prevalence of digital content on the world-wide-web | |
US20210112101A1 (en) | Data set and algorithm validation, bias characterization, and valuation | |
JP2000011005A (en) | Data analyzing method and its device and computer- readable recording medium recorded with data analytical program | |
CN108304410A (en) | A kind of detection method, device and the data analysing method of the abnormal access page | |
CN111159341B (en) | Information recommendation method and device based on user investment and financial management preference | |
CN105119735B (en) | A kind of method and apparatus for determining discharge pattern | |
CN113221104B (en) | Detection method of abnormal behavior of user and training method of user behavior reconstruction model | |
CN106302350A (en) | URL monitoring method, device and equipment | |
CN109214647B (en) | Method for analyzing overflow effect among online access channels based on network access log data | |
CN110222790A (en) | Method for identifying ID, device and server | |
CN111047448A (en) | Analysis method and device for multi-channel data fusion | |
CN112819528A (en) | Crowd pack online method and device and electronic equipment | |
CN111414410A (en) | Data processing method, device, equipment and storage medium | |
Liu et al. | Forecasting influenza epidemics in Hong Kong using Google search queries data: A new integrated approach | |
Rao et al. | An optimal machine learning model based on selective reinforced Markov decision to predict web browsing patterns | |
CN111160638A (en) | Conversion estimation method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180724 |