CN103425661B - Website data analysis method and analysis system - Google Patents
- Publication number
- CN103425661B (CN201210151293.4A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
This application provides a website data analysis method and analysis system that can analyze site-wide data from the perspective of data streams. The method includes: analyzing website log data to obtain access data streams, where each access data stream records the order in which web pages were accessed; rejecting the access data streams that do not contain an important page, where an important page is a page that meets a predefined attribute; performing frequent mining on the remaining access data streams, which contain important pages, to obtain the m access data streams with the highest occurrence frequency and the occurrence frequency of each; for the m access data streams, calculating the number of times important pages occur in each data stream and the length of each data stream; and using each access data stream's occurrence frequency, number of important-page occurrences, and length to calculate the quality degree of each of the m access data streams. The application uses the analysis of website data streams to guide the design of the website UI.
Description
Technical field
The application relates to web technology, and in particular to a website data analysis method and analysis system.
Background art
A website is a multi-dimensional system: viewed from different angles it yields different results, and behind every click lies a behavior. Website user behavior analysis uses network data to dissect people's specific online behavior and to reveal their underlying needs, the rise and fall of website traffic, the segmentation of the visiting population, and the access intentions of user groups.
For example, consider two users. After logging in to a website, one first clicks the "Technology" channel and then clicks "Internet". The other also first clicks the site's "Technology" channel and then clicks "Digital", but stays on "Digital" only briefly before clicking "Internet". To a certain extent the operating habits of these two users are consistent, and from the content they are interested in one can judge, with some probability, that they are IT industry practitioners. By grouping many similar cases and analyzing the data, the general types of the website's users can be obtained.
In addition, a user's mouse clicks can, to some extent, reveal the user's visual track on a web page. By the general rules of human behavior, a user first clicks the page element that he or she notices first, whether that element is a button or something else. Summarizing and analyzing users' mouse clicks therefore reveals roughly how users visually browse a page. From this one can infer whether a page design is reasonable, that is, whether users actually notice, and can click, the positions the website wants them to click. This ultimately affects the information architecture, and even the site structure, of the whole website.
However, existing website behavior data analysis takes the perspective of a single user or a user group. These analysis methods are not well suited to applications such as guiding website UI design, which need to be considered from the perspective of the website as a whole.
Therefore, the technical problem that currently needs to be solved is to provide a website behavior data analysis method that can analyze user behavior data from the perspective of site-wide data.
Summary of the invention
This application provides a website data analysis method and analysis system that can analyze site-wide data from the perspective of data streams.
To solve the above problem, this application discloses a website data analysis method, comprising:
analyzing website log data to obtain access data streams, wherein each access data stream records the order in which web pages were accessed;
rejecting the access data streams that do not contain an important page, wherein an important page is a page that meets a predefined attribute;
performing frequent mining on the remaining access data streams, which contain important pages, to obtain the m access data streams with the highest occurrence frequency and the occurrence frequency of each access data stream, m being a positive integer;
for the m access data streams, calculating the number of times important pages occur in each data stream, and the length of each data stream;
using the occurrence frequency of each access data stream, the number of times important pages occur in it, and its length, calculating the quality degree of each of the m access data streams.
Preferably, the method further comprises: ranking the m access data streams by quality degree, and analyzing the design of each block of the page according to the quality-degree ranking.
Preferably, analyzing website log data to obtain access data streams comprises: analyzing the website log data and extracting access paths from it; converting the access paths into a tree structure to obtain an access-path tree; and traversing the access-path tree depth-first to obtain the access data streams.
Preferably, the pages with the predefined attribute include: pages that produce feedback behavior; and/or activity pages run by the website's operations staff.
Preferably, the quality degree of a data stream is directly proportional to the occurrence frequency of the data stream, directly proportional to the number of times important pages occur in the data stream, and inversely proportional to the length of the data stream.
Preferably, calculating the quality degree of each of the m access data streams comprises calculating according to the following formula:
S = a0 + (α·frequency(g) + β·quality(g)) / (γ·lenth(g));
where g denotes a data stream;
frequency(g) denotes the occurrence frequency of the data stream, and α is the influence-factor parameter of frequency(g);
quality(g) denotes the number of times important pages occur in the data stream, and β is the influence-factor parameter of quality(g);
lenth(g) denotes the length of the data stream, and γ is the influence-factor parameter of lenth(g);
a0 denotes the data-availability parameter.
Preferably, the quality degree of the data streams is first calculated from the website log data of a certain time period; thereafter, at each preset time interval, the quality degree of the data streams is calculated from the incremental log data.
This application also provides a website data analysis system, comprising:
a log analysis module, configured to analyze website log data to obtain access data streams, wherein each access data stream records the order in which web pages were accessed;
a data rejection module, configured to reject the access data streams that do not contain an important page, wherein an important page is a page that meets a predefined attribute;
a frequent mining module, configured to perform frequent mining on the remaining access data streams, which contain important pages, to obtain the m access data streams with the highest occurrence frequency and the occurrence frequency of each access data stream, m being a positive integer;
an added-indicator calculation module, configured to calculate, for the m access data streams, the number of times important pages occur in each data stream, and the length of each data stream;
a quality-degree calculation module, configured to use the occurrence frequency of each access data stream, the number of times important pages occur in it, and its length to calculate the quality degree of each of the m access data streams.
Preferably, the system further comprises: a ranking module, configured to rank the m access data streams by quality degree and to analyze the design of each block of the page according to the quality-degree ranking.
Preferably, the log analysis module comprises:
an extraction submodule, configured to analyze the website log data and extract access paths from it;
a conversion submodule, configured to convert the access paths into a tree structure to obtain an access-path tree;
a traversal submodule, configured to traverse the access-path tree depth-first to obtain the access data streams.
Preferably, the quality-degree calculation module calculates according to the following formula:
S = a0 + (α·frequency(g) + β·quality(g)) / (γ·lenth(g));
where g denotes a data stream;
frequency(g) denotes the occurrence frequency of the data stream, and α is the influence-factor parameter of frequency(g);
quality(g) denotes the number of times important pages occur in the data stream, and β is the influence-factor parameter of quality(g);
lenth(g) denotes the length of the data stream, and γ is the influence-factor parameter of lenth(g);
a0 denotes the data-availability parameter.
Compared with the prior art, the application has the following advantages:
First, the application analyzes the behavior of all users of the entire website from the perspective of data streams, rather than the particular behavior of a single user or a small minority. Furthermore, the analysis of website data streams guides the design of the website UI and the work of website operations staff.
Second, the application labels and classifies the behavior data of users' website visits and removes the data that is meaningless for the analysis, which shrinks the target data set by at least one order of magnitude and reduces the amount of computation.
Third, the application adds two indicators to the calculation of a data stream's quality degree: the length of the data stream and the number of times important pages occur in it. This makes it possible to find accurate data streams relatively quickly and avoids the user loss caused by overly long data streams.
Finally, once a batch of data has been rejected, the data volume drops by orders of magnitude. Some applications care more about incremental data; incremental data does not affect the earlier data set and is simply added on top of the earlier result set, so the full data set need not be recalculated. The amount of data to compute is therefore small, which makes real-time data analysis possible.
Of course, a product implementing the application does not necessarily need to achieve all of the above advantages at the same time.
Brief description of the drawings
Fig. 1 is a flow chart of a website data analysis method according to an embodiment of the application;
Fig. 2 is a schematic diagram of the tree structure of access paths in an embodiment of the application;
Fig. 3 is a flow chart of a website data analysis method according to another embodiment of the application;
Fig. 4 is a structure diagram of a website data analysis system according to an embodiment of the application.
Detailed description
To make the above objects, features and advantages of the application clearer and easier to understand, the application is described in further detail below with reference to the drawings and specific embodiments.
The application introduces the concept of a data stream and analyzes the behavior of all users of the entire website from the perspective of data streams. Moreover, by optimizing the frequent-subtree calculation, accurate data streams can be found relatively quickly.
The implementation flow of the method is described in detail below through embodiments.
Fig. 1 is a flow chart of a website data analysis method according to an embodiment of the application.
Step 101: analyze website log data to obtain access data streams, wherein each access data stream records the order in which web pages were accessed.
An access data stream is the sequence in which a user accesses web pages. For example, if a user's access data stream is A → C → F, the user first accessed page A, then jumped from page A to page C, and then jumped from page C to page F, thereby forming one access data stream (data stream for short). A, C and F are called the page nodes of the access data stream, or simply nodes.
Access data streams can be obtained by analyzing website log data. This embodiment gives one way of obtaining them, but the scope of protection of the application is not limited to it.
It specifically includes the following sub-steps:
Sub-step 1: analyze the website log data and extract access paths from it.
The website log records the behavior data of users' website visits, so by analyzing the website log data one can determine which pages a user accessed within a period of time, which forms that user's access path. A single user can, of course, have several access paths.
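As a rough illustration of sub-step 1, the sketch below groups log records by user and orders each user's page views by time. The record layout (user_id, timestamp, page) and the function name are assumptions made for this example; the application does not specify a log format, and a fuller implementation would also split one user's visits into several access paths.

```python
from collections import defaultdict


def extract_access_paths(log_records):
    """Return one time-ordered list of pages per user.

    Each record is assumed to be a (user_id, timestamp, page) tuple.
    """
    visits_by_user = defaultdict(list)
    for user_id, timestamp, page in log_records:
        visits_by_user[user_id].append((timestamp, page))
    # Sorting the (timestamp, page) tuples orders each user's visits by time.
    return {
        user_id: [page for _, page in sorted(visits)]
        for user_id, visits in visits_by_user.items()
    }


# Hypothetical log records, deliberately out of order.
records = [
    ("u1", 3, "F"),
    ("u1", 1, "A"),
    ("u2", 1, "A"),
    ("u1", 2, "C"),
    ("u2", 2, "B"),
]
paths = extract_access_paths(records)
print(paths)
```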
Sub-step 2: convert the access paths into a tree structure to obtain an access-path tree.
The tree structure is a data structure with a tree-shaped relationship among root nodes, child nodes and leaf nodes. One conversion method is given below, but the scope of protection of the application is not limited to it.
For example, for each step of a user's visit, the source step (i.e. the previous step) and the current step are recorded. Each record is a pair (a, b), where a is the source page (i.e. the previous page) and b is the currently accessed page. Recording the user's visits in this way yields the following records: (a, b), (a, c), (c, d), (d, e), (a, e), (h, g). Each pair is one access path. From these access paths, the tree structure shown in Fig. 2 can be drawn.
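A minimal sketch of this conversion follows. It assumes that the first element of each pair is the source (parent) page and the second is the page reached from it (the child), which is the reading under which the example records form trees rooted at a and h. The function name and the dictionary representation are choices made for this illustration, not part of the application.

```python
from collections import defaultdict


def build_forest(pairs):
    """Turn (source, current) pairs into a parent -> children map plus the roots."""
    children = defaultdict(list)
    has_parent = set()
    for source, current in pairs:
        children[source].append(current)
        has_parent.add(current)
    # A root is a page that appears as a source but is never reached from another page.
    roots = [page for page in children if page not in has_parent]
    return dict(children), roots


pairs = [("a", "b"), ("a", "c"), ("c", "d"), ("d", "e"), ("a", "e"), ("h", "g")]
children, roots = build_forest(pairs)
print(children)
print(roots)
```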
Sub-step 3: traverse the access-path tree depth-first to obtain the access data streams.
In computer science, tree traversal is the process of visiting the nodes of a tree in a certain order. For a binary tree there are usually four kinds of traversal: pre-order, in-order, post-order and breadth-first. The first three are depth-first traversals. For a multiway tree there are usually two kinds: depth-first and breadth-first.
In this embodiment, traversing a tree depth-first yields all the access data streams of that tree. Every access data stream is a complete stream, that is, a stream that starts at a root node. Moreover, each data stream is produced by one user, and one user can produce many data streams.
For example, traversing Fig. 2 depth-first yields 4 data streams:
a→b;
a→c→d→e;
a→e;
h→g.
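The traversal can be sketched as a recursive depth-first walk that emits one stream per root-to-leaf path. The children map below encodes the example records with the source page as the parent; that encoding, the function name, and the assumption that the structure contains no cycles are all illustrative choices rather than details given in the application.

```python
def root_to_leaf_streams(children, roots):
    """Depth-first traversal that collects every root-to-leaf path as a data stream.

    Assumes the structure is acyclic; a cycle would recurse without end.
    """
    streams = []

    def visit(page, path_so_far):
        path = path_so_far + [page]
        next_pages = children.get(page, [])
        if not next_pages:  # leaf node: the stream is complete
            streams.append(path)
            return
        for next_page in next_pages:
            visit(next_page, path)

    for root in roots:
        visit(root, [])
    return streams


children = {"a": ["b", "c", "e"], "c": ["d"], "d": ["e"], "h": ["g"]}
streams = root_to_leaf_streams(children, ["a", "h"])
for stream in streams:
    print("→".join(stream))
```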
Step 102: reject the access data streams that do not contain an important page, wherein an important page is a page that meets a predefined attribute.
The pages with the predefined attribute include:
pages that produce feedback behavior, i.e. pages that have feedback;
and/or activity pages run by the website's operations staff.
The feedback behavior produced on a page that has feedback may specifically include: placing an order, leaving an online message for the seller, clicking the seller's contact information, clicking the seller's business number, clicking the seller's trade-messaging link, and clicking to sign and file a contract.
Compared with the prior art, the improvement made by this embodiment is that the data streams produced by users are divided into two classes: streams with feedback and streams without feedback. Because the data analysis is concerned with the feedback users produce while visiting the website, this kind of data is the focus. Through depth traversal of the tree, the operation flows (data streams) in the tree that include a feedback step are marked. The target data to be analyzed is therefore mainly the streams with feedback.
Specifically, during the depth traversal of the tree, the pages that have feedback and the activity pages run by the operations staff can be tagged, which identifies them as important pages meeting the predefined attribute and marks them out as special. A rejection operation is then applied to the access data streams obtained by the traversal: if a data stream does not contain any tagged important page, the stream is rejected.
This rejection operation removes the data that is meaningless for the analysis, so that the target data set shrinks by one or more orders of magnitude.
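The rejection operation amounts to a simple filter. The sketch below assumes the tagged important pages are available as a set; the page names and the function name are hypothetical.

```python
def keep_streams_with_important_page(streams, important_pages):
    """Keep only the data streams that contain at least one tagged important page."""
    return [
        stream
        for stream in streams
        if any(page in important_pages for page in stream)
    ]


important_pages = {"D", "H"}  # hypothetical tagged pages
candidate_streams = [["A", "B", "D"], ["A", "C"], ["H"]]
kept_streams = keep_streams_with_important_page(candidate_streams, important_pages)
print(kept_streams)
```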
Step 103: perform frequent mining on the remaining access data streams, which contain important pages, to obtain the m access data streams with the highest occurrence frequency and the occurrence frequency of each access data stream, m being a positive integer.
The occurrence frequency of a data stream is the number of times the data stream occurs. If user A produces the data stream A → C → F and user B also produces the same data stream A → C → F, the occurrence frequency of this data stream is 2.
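For whole streams, this definition can be sketched with a counter over identical page sequences. The function name is hypothetical, and counting only exact matches is a simplification of the frequent mining used in this step.

```python
from collections import Counter


def top_m_streams(streams, m):
    """Count identical streams and return the m most frequent with their counts."""
    counts = Counter(tuple(stream) for stream in streams)
    return counts.most_common(m)


# Two users produce A → C → F; a third produces A → B.
observed_streams = [["A", "C", "F"], ["A", "C", "F"], ["A", "B"]]
most_frequent = top_m_streams(observed_streams, 1)
print(most_frequent)
```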
This step uses frequent-subtree mining. For the remaining access data streams, which contain important pages, any existing frequent mining algorithm is used to recursively calculate the m frequent subtrees (i.e. data streams) that rank highest by occurrence frequency, together with the occurrence frequency of each data stream. The value of m can be chosen according to actual needs. Data streams with a low occurrence frequency are not considered in the subsequent steps, which reduces the amount of data to compute.
Note that the data streams obtained by frequent mining differ from those obtained by depth-first traversal. For example, suppose depth-first traversal yields the data stream A → B → C → D → E and this data stream contains an important page meeting the predefined attribute. Applying recursive frequent mining to this data stream yields data streams that include B → C → D → E, C → D → E and D → E.
On this basis, one way of calculating data streams through frequent subtrees is as follows:
Take four streams, A → B → C → D → E, A1 → B → C → D → E1, A → B → C → D → E2 and A2 → B → C1 → D → E1, and assume that all of them contain a page with the predefined attribute. Among the data streams obtained by the recursive calculation, B → C → D has an occurrence frequency of 3, A → B → C → D has an occurrence frequency of 2, D → E1 has an occurrence frequency of 2, and the rest all have an occurrence frequency of 1. (B → C and C → D need not be considered in this result: each has an occurrence frequency of 3, and both are already contained in B → C → D, whose frequency is also 3.) If the top 3 subtrees by occurrence frequency are needed, they are B → C → D, A → B → C → D and D → E1.
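The example can be reproduced with a brute-force sketch that counts, for every contiguous sub-stream of two or more pages, how many streams contain it, and then drops any sub-stream that sits inside a longer one with the same count. This is only an illustration under those assumptions: the application leaves the choice of frequent mining algorithm open, and the function names and the tie-breaking rule (longer sub-streams first) are hypothetical.

```python
from collections import Counter


def contiguous_substreams(stream, min_len=2):
    """All distinct contiguous sub-sequences of a stream with at least min_len pages."""
    n = len(stream)
    return {tuple(stream[i:j]) for i in range(n) for j in range(i + min_len, n + 1)}


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    k = len(shorter)
    return any(longer[i:i + k] == shorter for i in range(len(longer) - k + 1))


def top_closed_substreams(streams, m):
    support = Counter()
    for stream in streams:
        # A set is used so that each stream counts a sub-stream at most once.
        support.update(contiguous_substreams(stream))
    # Drop sub-streams already covered by a longer one with the same frequency.
    closed = [
        (sub, count)
        for sub, count in support.items()
        if not any(
            len(other) > len(sub) and other_count == count and contains(other, sub)
            for other, other_count in support.items()
        )
    ]
    # Highest frequency first; among equal frequencies, longer sub-streams first.
    closed.sort(key=lambda item: (-item[1], -len(item[0])))
    return closed[:m]


example_streams = [
    ["A", "B", "C", "D", "E"],
    ["A1", "B", "C", "D", "E1"],
    ["A", "B", "C", "D", "E2"],
    ["A2", "B", "C1", "D", "E1"],
]
top3 = top_closed_substreams(example_streams, 3)
for sub, count in top3:
    print(" → ".join(sub), count)
```

Under these assumptions the three results are B → C → D with frequency 3, then A → B → C → D and D → E1 with frequency 2, matching the example above.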
Step 104: for the m access data streams, calculate the number of times important pages occur in each data stream, and the length of each data stream.
The number of times important pages occur is the number of important pages contained in a data stream. For example, in the data stream A → B → C → D → E, if pages B and D are important pages, the number of times important pages occur in this data stream is 2.
The length of a data stream is the number of page nodes it contains. For example, the length of the data stream A → B → C → D → E is 5.
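Both indicators are direct counts, as the following sketch shows for the example stream. The function name is hypothetical.

```python
def stream_indicators(stream, important_pages):
    """Return (number of important-page occurrences, length) for one data stream."""
    important_count = sum(1 for page in stream if page in important_pages)
    return important_count, len(stream)


indicators = stream_indicators(["A", "B", "C", "D", "E"], {"B", "D"})
print(indicators)
```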
Step 105: using the occurrence frequency of each access data stream, the number of times important pages occur in it, and its length, calculate the quality degree of each of the m access data streams.
Most existing frequent-subtree algorithms consider only whether the occurrence frequency of a frequent subtree (data stream) exceeds a certain threshold, and ignore other factors. For the analysis in this embodiment, occurrence frequency alone is not sufficient, so two further indicators that affect the final quality-degree score are added: the scale of the frequent subtree (the number of nodes in the subtree) and the quality of the subtree (the number of important nodes in the subtree). The three indicators together produce the quality-degree score S used to select the top N data streams, i.e. the N data streams that rank highest. This is one of the improvements this embodiment makes to existing frequent-subtree calculation.
The indicators are described as follows:
1) The frequency with which the subtree occurs, also called the occurrence frequency of the data stream. Its influence-factor parameter is α, and S is directly proportional to α;
2) The quality of the subtree, also called the number of times important pages occur in the data stream. Its influence-factor parameter is β, and S is directly proportional to β;
3) The scale of the subtree, also called the length of the data stream. Its influence-factor parameter is γ, and S is inversely proportional to γ.
The influence-factor parameters α, β and γ can also be understood as the weights of the three indicators, and can be set according to the actual application scenario.
It follows that the quality-degree score S of a data stream is directly proportional not only to the occurrence frequency of the data stream but also to the number of times important pages occur in it, and is inversely proportional to its length. In other words, the data stream with the highest occurrence frequency, which the prior art would select, is not necessarily optimal; if a data stream is too long, its quality degree also suffers.
Based on these direct and inverse relationships, this embodiment gives one calculation formula, but the scope of protection of the application is not limited to it:
S = a0 + (α·frequency(g) + β·quality(g)) / (γ·lenth(g));
where g denotes a data stream;
frequency(g) denotes the occurrence frequency of the data stream, and α is the influence-factor parameter of frequency(g);
quality(g) denotes the number of times important pages occur in the data stream, and β is the influence-factor parameter of quality(g);
lenth(g) denotes the length of the data stream, and γ is the influence-factor parameter of lenth(g);
a0 denotes the data-availability parameter. For example, when calculating the quality degree of data streams on a shopping website whose weekday data is more usable than its weekend data, different values of a0 can be set for weekday data and weekend data.
For example:
The following data streams are to be analyzed, where H and D are pages that produce feedback behavior:
1) A→B→D→F→H
2) A→C→H
3) A→B→D→F→H→A→B→H
The analysis results are as follows:
Stream 1 occurs 2 times, has 2 feedback nodes, and has length 5;
Stream 2 occurs 1 time, has 1 feedback node, and has length 3;
Stream 3 occurs 1 time, has 3 feedback nodes, and has length 8.
Based on a preliminary study of the data, the weights are set to α = 2.5, β = 4 and γ = 1 (the specific values can be chosen from earlier data investigation and analysis, or set to suit the specific requirements).
Omitting the a0 term, this gives:
S1 = (2.5×2 + 4×2) / (1×5) = 2.6
S2 = (2.5×1 + 4×1) / (1×3) ≈ 2.17
S3 = (2.5×1 + 4×3) / (1×8) = 1.8125
Stream 1 is therefore the optimal data stream of the three.
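The arithmetic of this example can be checked with a short sketch of the formula. It assumes the divisor is the product γ·lenth(g), as the worked values imply, and takes a0 as 0 because the example omits that term; the function and variable names are hypothetical.

```python
def quality_degree(frequency, quality, length, alpha, beta, gamma, a0=0.0):
    """S = a0 + (alpha*frequency + beta*quality) / (gamma*length)."""
    return a0 + (alpha * frequency + beta * quality) / (gamma * length)


# (occurrence frequency, important-page occurrences, length) for streams 1 to 3.
example_indicators = {1: (2, 2, 5), 2: (1, 1, 3), 3: (1, 3, 8)}
scores = {
    stream_id: quality_degree(freq, qual, length, alpha=2.5, beta=4, gamma=1)
    for stream_id, (freq, qual, length) in example_indicators.items()
}
# Rank the streams from highest to lowest quality degree.
ranking = sorted(scores, key=scores.get, reverse=True)
for stream_id in ranking:
    print(stream_id, round(scores[stream_id], 4))
```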
In summary, the website data analysis method provided by the above embodiment makes the following improvements over the prior art:
First, existing website data analysis methods seldom use the global concept of a data stream to analyze user behavior, and instead analyze a particular user. This embodiment analyzes website data from the perspective of data streams, covering the behavior of all users of the entire website rather than the particular behavior of a single user or a small minority.
Second, existing frequent-subtree algorithms do not perform a first round of screening and rejection on the initial result set. This embodiment labels and classifies the behavior data of users' website visits and removes the meaningless data, which shrinks the target data set by at least one order of magnitude and reduces the amount of computation.
Third, most existing frequent-subtree algorithms consider only the number of times a subtree occurs and ignore other factors. This embodiment adds two indicators to the calculation of a data stream's quality degree, namely the scale of the subtree (the length of the data stream) and the quality of the subtree (the number of times important pages occur in the data stream). This makes it possible to find accurate data streams relatively quickly and avoids the user loss caused by overly long data streams.
Building on the embodiment of Fig. 1, the embodiment of Fig. 3 is described in more detail below in connection with website UI design. In the embodiment of Fig. 3, the access data streams can be ranked by quality degree, and the design of each block of the page can be analyzed according to the quality-degree ranking.
Fig. 3 is a flow chart of a website data analysis method according to another embodiment of the application.
Steps 201a and 201b can be performed in parallel or one after the other, and their order can be exchanged. Fig. 3 shows the case in which step 201a is performed first and step 201b afterwards.
Step 201a: define a number of special pages.
The special pages may specifically include activity pages run by the operations staff and pages that have feedback.
Step 201b: process the log data and convert the access paths into a tree structure.
Step 202: traverse the access-path tree depth-first to obtain all the access data streams.
Step 203: perform the rejection operation on these data streams by judging whether each data stream contains a special page.
If a stream does not involve any of the special pages defined earlier, the data stream is rejected and its processing ends. In practice, this operation can remove about 70% of the data as useless, which reduces the data set to be analyzed.
Step 204: for the remaining data streams, use a frequent mining algorithm to recursively calculate the top M frequent subtrees (i.e. data streams), where M can be chosen as required, yielding the data streams and their occurrence frequencies.
Here, the data streams can be sorted and the M with the highest occurrence frequency selected as the top M frequent subtrees.
Step 205: calculate the number of times important pages occur in each data stream.
Step 206: calculate the length of each data stream.
Step 207: from a preliminary study and analysis of the data, derive the weight of each indicator and a formula for calculating the quality degree of a data stream.
Step 208: substitute the indicators and weights into the formula to obtain the quality degree of each stream.
Step 209: analyze the design of each block of the page according to the quality-degree ranking.
Specifically, the analysis can be used to place precise and effective link content in each block of a page, so that every visiting user is guided along the optimal access path, which reduces the user loss rate and raises the feedback rate.
For example, suppose the optimal data stream A → B → C has been selected. A link to page A is placed in a prominent block of the website's home page, a link to page B is placed in a prominent block of page A, and a link to page C is placed in a prominent block of page B. Each time users open a new page, they can then find and click the link they want to visit in the most prominent area.
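The placement rule in this example can be written down as a pairing of each page in the optimal stream with the page that should host its link. The sketch below is only an illustration of that pairing; the function name and the "homepage" label are hypothetical.

```python
def link_placements(stream, entry_page="homepage"):
    """Pair each page of a data stream with the page whose prominent block links to it."""
    hosting_pages = [entry_page] + list(stream[:-1])
    return list(zip(hosting_pages, stream))


placements = link_placements(["A", "B", "C"])
for hosting_page, linked_page in placements:
    print(f"prominent block of {hosting_page} links to {linked_page}")
```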
In addition, in the above process, the quality degree of the data streams can first be calculated from the website log data of a certain time period, and thereafter, at each preset time interval, calculated from the incremental log data.
For example, the full data for a chosen time period is analyzed first. After that, steps 202 to 209 can be run on the visitor behavior data newly collected by the website each day. Further, steps 202 to 209 can be run on the newly added data every 5 minutes; the specific interval can be set as required. The rejection operation of step 203 shrinks the incremental data set by orders of magnitude, which makes real-time calculation feasible: each time, the same analysis is applied to the incremental data and the result is added to the final result set, achieving real-time analysis.
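One way to picture the incremental update is a running counter of stream occurrences to which each new batch is added after rejection, so earlier data is never reprocessed. The sketch below makes that assumption and counts whole streams only; the function name and sample data are hypothetical.

```python
from collections import Counter


def merge_increment(cumulative_counts, new_streams, important_pages):
    """Filter a new batch of streams and add its counts to the running result set."""
    kept = [
        tuple(stream)
        for stream in new_streams
        if any(page in important_pages for page in stream)
    ]
    cumulative_counts.update(kept)  # earlier counts are kept and only added to
    return cumulative_counts


# Result set from the initial full calculation (hypothetical values).
cumulative = Counter({("A", "C", "H"): 3})
# A later increment, for example five minutes of new log data.
increment = [["A", "C", "H"], ["A", "B"], ["A", "B", "D"]]
cumulative = merge_increment(cumulative, increment, important_pages={"D", "H"})
print(cumulative)
```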
In summary, the embodiment of Fig. 3 not only has the advantages of the embodiment of Fig. 1 but also makes the following improvements over the prior art:
First, existing website behavior data analysis examines the preferences of a single user in order to recommend one or more specific products to that user. The mining algorithms take the perspective of products and seldom consider the overall design of page blocks, and the work of website operations staff is limited to running the operated page itself, with little consideration of the flow from other associated pages to the operated page. This embodiment applies the analysis of website data streams to website UI design, can find accurate data streams relatively quickly, and guides both the design of the website UI and the work of website operations staff.
Second, because of the large data volume, the prior art seldom involves real-time analysis. This embodiment classifies all of the website's raw data and removes the data that is of no interest, so the usable website data is only a small part of all website data. New incremental data (the changes) can be added to the data set in real time (for example every 5 minutes), and the analysis is run on the incremental data. Because the preprocessing removes a batch of data, the data volume drops by orders of magnitude. Some applications care more about incremental data; incremental data does not affect the earlier data set and is simply added on top of the earlier result set, so the full data set need not be recalculated. The amount of data to compute is therefore small, which makes real-time data analysis possible.
The above embodiments use website UI design as an example, but in specific applications the analysis of website data streams can also be applied to other areas. The implementation principle is similar to that of the foregoing embodiments and is not repeated here.
It should be noted that, for brevity, the foregoing method embodiments are described as a series of combined actions. Those skilled in the art should understand, however, that the application is not limited by the described order of actions, because according to the application some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and the actions involved are not necessarily required by the application.
Based on the description of the above method embodiments, the present application also provides a corresponding embodiment of a website data analysis system.
Fig. 4 is a structural diagram of a website data analysis system according to an embodiment of the present application.
The website data analysis system may specifically include the following modules:
Log analysis module 10, configured to obtain access data streams by analyzing website log data, wherein each access data stream records the order in which web pages were accessed;
Data rejection module 20, configured to reject the access data streams that do not contain an important page, wherein an important page is a page meeting a predefined attribute;
Frequent mining module 30, configured to perform frequent mining on the remaining access data streams that contain important pages, to obtain the m access data streams with the highest occurrence frequency and the occurrence frequency of each of them, m being a positive integer;
Added-indicator calculation module 40, configured to calculate, for the m access data streams, the number of times important pages occur in each data stream and the length of each data stream;
Quality degree calculation module 50, configured to calculate the quality degree of each of the m access data streams using the occurrence frequency of each access data stream, the number of times important pages occur, and the length of the data stream.
The pages with the predefined attribute include: pages that produce feedback behavior; and/or activity pages currently in operation.
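The rejection and frequent mining steps performed by modules 20 and 30 can be sketched as below. This is only an illustration under stated assumptions: `PageInfo`, `is_important`, `reject_and_mine`, and the page catalog are hypothetical names, and frequent mining is reduced to counting identical whole streams, which may be simpler than the mining algorithm the embodiment actually uses.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class PageInfo:
    """Hypothetical page attributes used to decide whether a page is important."""
    produces_feedback: bool = False  # the page produces feedback behavior
    active_activity: bool = False    # the page is an activity page currently in operation


def is_important(url: str, catalog: Dict[str, PageInfo]) -> bool:
    """A page is important if it meets at least one predefined attribute."""
    info = catalog.get(url)
    return info is not None and (info.produces_feedback or info.active_activity)


def reject_and_mine(streams: Iterable[Sequence[str]], catalog: Dict[str, PageInfo],
                    m: int) -> List[Tuple[Tuple[str, ...], int]]:
    """Drop streams without an important page, then return the m most frequent ones."""
    kept = [tuple(s) for s in streams
            if any(is_important(page, catalog) for page in s)]
    # most_common(m) returns (stream, occurrence count) pairs, highest count first.
    return Counter(kept).most_common(m)


catalog = {
    "/order": PageInfo(produces_feedback=True),
    "/sale": PageInfo(active_activity=True),
    "/home": PageInfo(),
}
streams = [
    ["/home", "/order"],
    ["/home", "/order"],
    ["/home", "/sale"],
    ["/home", "/help"],  # rejected: contains no important page
]
print(reject_and_mine(streams, catalog, m=2))
```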
Preferably, the log analysis module 10 may specifically include the following submodules:
Extraction submodule, configured to extract access paths from the website log data by analyzing the website log data;
Conversion submodule, configured to convert the access paths into a tree, obtaining an access path tree;
Traversal submodule, configured to traverse the access path tree depth-first, obtaining the access data streams.
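The three submodules above can be sketched as follows. The record layout is an assumption made for illustration: each extracted log record is taken to be a hypothetical `(view_id, parent_view_id, url)` triple, where `parent_view_id` is `None` for the entry page of a visit. Each root-to-leaf path found by the depth-first traversal is emitted as one access data stream.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Assumed record shape: (view_id, parent_view_id, url); parent is None for an entry page.
Record = Tuple[int, Optional[int], str]


def build_tree(records: List[Record]):
    """Convert extracted access paths into a tree keyed by view id."""
    children: Dict[int, List[int]] = defaultdict(list)
    urls: Dict[int, str] = {}
    roots: List[int] = []
    for view_id, parent_id, url in records:
        urls[view_id] = url
        if parent_id is None:
            roots.append(view_id)
        else:
            children[parent_id].append(view_id)
    return roots, children, urls


def depth_first_streams(roots, children, urls) -> List[List[str]]:
    """Traverse the tree depth-first; every root-to-leaf path is one data stream."""
    streams: List[List[str]] = []

    def visit(node: int, prefix: List[str]) -> None:
        path = prefix + [urls[node]]
        if not children.get(node):  # leaf: the visit ended on this page
            streams.append(path)
            return
        for child in children[node]:
            visit(child, path)

    for root in roots:
        visit(root, [])
    return streams


records = [
    (1, None, "/home"),
    (2, 1, "/list"),
    (3, 2, "/item"),
    (4, 2, "/cart"),
    (5, 1, "/promo"),
]
for stream in depth_first_streams(*build_tree(records)):
    print(" -> ".join(stream))
```

A tree is useful here because a visitor can branch from one page to several others; the traversal turns those branches into separate ordered streams.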
The quality degree of a data stream is directly proportional to the occurrence frequency of the data stream, directly proportional to the number of times important pages occur in the data stream, and inversely proportional to the length of the data stream.
In one embodiment, based on these proportional and inverse relationships, the quality degree calculation module 50 may calculate according to the following formula:
S=a0+(α·frequency(g)+β·quality(g))/(γ·lenth(g));
where g denotes a data stream;
frequency(g) denotes the occurrence frequency of the data stream, and α is the influence factor parameter of frequency(g);
quality(g) denotes the number of times important pages occur in the data stream, and β is the influence factor parameter of quality(g);
lenth(g) denotes the length of the data stream, and γ is the influence factor parameter of lenth(g);
a0 denotes the data availability parameter.
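The formula can be evaluated directly, as in the short sketch below. The function name `quality_degree` and the default parameter values are assumptions for illustration; the text does not specify values for a0, α, β, or γ. The `length` argument corresponds to lenth(g).

```python
def quality_degree(frequency: float, quality: float, length: float,
                   a0: float = 0.0, alpha: float = 1.0,
                   beta: float = 1.0, gamma: float = 1.0) -> float:
    """S = a0 + (alpha*frequency(g) + beta*quality(g)) / (gamma*lenth(g))."""
    if length <= 0 or gamma == 0:
        raise ValueError("length must be positive and gamma must be non-zero")
    return a0 + (alpha * frequency + beta * quality) / (gamma * length)


# A stream seen 10 times, containing 2 important pages, 4 pages long.
print(quality_degree(frequency=10, quality=2, length=4))
# Same stream with illustrative (assumed) parameter values.
print(quality_degree(10, 2, 4, a0=1.0, alpha=2.0, beta=3.0, gamma=2.0))
```

With the other inputs fixed, a longer stream yields a lower score, which matches the inverse relationship stated above.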
Preferably, the above website data analysis system can analyze data in real time, specifically as follows:
The first time, the website log data within a certain time period is taken and the data stream quality degree is calculated;
At every preset time interval thereafter, the incremental log data is taken and the data stream quality degree is calculated.
Preferably, in one embodiment, the website data analysis system can be applied to UI design, so the system may further include the following module:
Ranking module 60, configured to rank the m access data streams by quality degree, and to analyze the design of each block of the page according to the quality degree ranking.
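The ranking step can be sketched as follows. This is a hedged illustration: `rank_streams`, the sample streams, and the default parameter values are hypothetical, and the score follows the formula S=a0+(α·frequency(g)+β·quality(g))/(γ·lenth(g)) given in the description.

```python
from typing import List, Sequence, Set, Tuple


def rank_streams(stats: Sequence[Tuple[Tuple[str, ...], int]], important: Set[str],
                 a0: float = 0.0, alpha: float = 1.0,
                 beta: float = 1.0, gamma: float = 1.0
                 ) -> List[Tuple[float, Tuple[str, ...]]]:
    """Score each (stream, occurrence frequency) pair and sort by descending score."""
    scored = []
    for stream, frequency in stats:
        # Number of times an important page occurs in the stream.
        hits = sum(1 for page in stream if page in important)
        score = a0 + (alpha * frequency + beta * hits) / (gamma * len(stream))
        scored.append((score, stream))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


important_pages = {"/order", "/promo"}  # hypothetical important pages
stats = [
    (("/home", "/item", "/order"), 6),
    (("/promo", "/order"), 4),
    (("/home", "/a", "/b", "/c", "/order"), 9),
]
for score, stream in rank_streams(stats, important_pages):
    print(round(score, 2), " -> ".join(stream))
```

In this sample the most frequent stream ranks last because it is the longest, which illustrates why length is included alongside frequency.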
Because the above website data analysis system embodiment is basically similar to the method embodiments, its description is relatively brief; for related details, refer to the description of the method embodiments shown in Fig. 1 and Fig. 3.
In summary, the website data analysis system has the following advantages:
First, the application analyzes website data from the perspective of data streams, covering the behavior of all users across the whole site rather than the particular behavior of a few individuals. Further, the analysis of website data streams guides the design of the website UI and the work of website operators.
Second, the application labels and classifies the behavioral data of users' website visits and weeds out the meaningless part, reducing the target data set by one or more orders of magnitude and lightening the computational load.
Third, the application adds two indicators to the calculation of the data stream quality degree: the length of the data stream and the number of times important pages occur in it. This makes it possible to find relatively fast and accurate data streams and to avoid the user loss caused by overly long data streams.
Finally, after a batch of data is weeded out, the data volume drops by orders of magnitude. In addition, some applications care more about incremental data, which does not affect the previous data set and is simply added on top of the previous result set, so the full data set need not be recalculated. The amount of data to calculate is therefore small, and real-time data analysis is achieved.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
The term "and/or" as used above covers both the "and" relationship and the "or" relationship: if option A and option B are in an "and" relationship, an embodiment may include both option A and option B; if option A and option B are in an "or" relationship, an embodiment may include option A alone or option B alone.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) that contain computer-usable program code.
The application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a means for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction means, the instruction means implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
The website data analysis method and analysis system provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the application, and the description of the above embodiments is only intended to help in understanding the method of the application and its core idea. Meanwhile, for those of ordinary skill in the art, the specific implementations and the scope of application may change in accordance with the idea of the application. In summary, the content of this specification should not be construed as limiting the application.
Claims (11)
1. A website data analysis method, characterized by comprising:
obtaining access data streams by analyzing website log data, wherein each access data stream records the order in which web pages were accessed;
rejecting the access data streams that do not contain an important page, wherein an important page is a page meeting a predefined attribute;
performing frequent mining on the remaining access data streams that contain important pages, to obtain the m access data streams with the highest occurrence frequency and the occurrence frequency of each of them, m being a positive integer;
for the m access data streams, calculating the number of times important pages occur in each data stream and the length of each data stream;
calculating the quality degree of each of the m access data streams using the occurrence frequency of each access data stream, the number of times important pages occur, and the length of the data stream.
2. The method according to claim 1, characterized by further comprising:
ranking the m access data streams by quality degree; and
analyzing the design of each block of the page according to the quality degree ranking.
3. The method according to claim 1, characterized in that obtaining access data streams by analyzing website log data comprises:
extracting access paths from the website log data by analyzing the website log data;
converting the access paths into a tree, obtaining an access path tree;
traversing the access path tree depth-first, obtaining the access data streams.
4. The method according to claim 1, characterized in that the pages with the predefined attribute include:
pages that produce feedback behavior;
and/or activity pages currently in operation.
5. The method according to claim 1, characterized in that:
the quality degree of a data stream is directly proportional to the occurrence frequency of the data stream, directly proportional to the number of times important pages occur in the data stream, and inversely proportional to the length of the data stream.
6. The method according to claim 5, characterized in that calculating the quality degree of each of the m access data streams comprises:
calculating according to the following formula:
S=a0+(α·frequency(g)+β·quality(g))/(γ·lenth(g));
where g denotes a data stream;
frequency(g) denotes the occurrence frequency of the data stream, and α is the influence factor parameter of frequency(g);
quality(g) denotes the number of times important pages occur in the data stream, and β is the influence factor parameter of quality(g);
lenth(g) denotes the length of the data stream, and γ is the influence factor parameter of lenth(g);
a0 denotes the data availability parameter.
7. The method according to claim 1, characterized in that:
the first time, the website log data within a certain time period is taken and the data stream quality degree is calculated;
at every preset time interval thereafter, the incremental log data is taken and the data stream quality degree is calculated.
8. A website data analysis system, characterized by comprising:
a log analysis module, configured to obtain access data streams by analyzing website log data, wherein each access data stream records the order in which web pages were accessed;
a data rejection module, configured to reject the access data streams that do not contain an important page, wherein an important page is a page meeting a predefined attribute;
a frequent mining module, configured to perform frequent mining on the remaining access data streams that contain important pages, to obtain the m access data streams with the highest occurrence frequency and the occurrence frequency of each of them, m being a positive integer;
an added-indicator calculation module, configured to calculate, for the m access data streams, the number of times important pages occur in each data stream and the length of each data stream;
a quality degree calculation module, configured to calculate the quality degree of each of the m access data streams using the occurrence frequency of each access data stream, the number of times important pages occur, and the length of the data stream.
9. The system according to claim 8, characterized by further comprising:
a ranking module, configured to rank the m access data streams by quality degree, and to analyze the design of each block of the page according to the quality degree ranking.
10. The system according to claim 8, characterized in that the log analysis module comprises:
an extraction submodule, configured to extract access paths from the website log data by analyzing the website log data;
a conversion submodule, configured to convert the access paths into a tree, obtaining an access path tree;
a traversal submodule, configured to traverse the access path tree depth-first, obtaining the access data streams.
11. The system according to claim 8, characterized in that
the quality degree calculation module calculates according to the following formula:
S=a0+(α·frequency(g)+β·quality(g))/(γ·lenth(g));
where g denotes a data stream;
frequency(g) denotes the occurrence frequency of the data stream, and α is the influence factor parameter of frequency(g);
quality(g) denotes the number of times important pages occur in the data stream, and β is the influence factor parameter of quality(g);
lenth(g) denotes the length of the data stream, and γ is the influence factor parameter of lenth(g);
a0 denotes the data availability parameter.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210151293.4A CN103425661B (en) | 2012-05-15 | 2012-05-15 | A kind of website data is analyzed method and analyzes system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210151293.4A CN103425661B (en) | 2012-05-15 | 2012-05-15 | A kind of website data is analyzed method and analyzes system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103425661A CN103425661A (en) | 2013-12-04 |
CN103425661B true CN103425661B (en) | 2016-10-05 |
Family
ID=49650419
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210151293.4A Active CN103425661B (en) | 2012-05-15 | 2012-05-15 | A kind of website data is analyzed method and analyzes system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103425661B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106484717B (en) * | 2015-08-27 | 2019-12-10 | 北京国双科技有限公司 | Data profiling method and device for path navigation |
CN108121749A (en) * | 2016-11-30 | 2018-06-05 | 北京国双科技有限公司 | Website user's behavior analysis method and device |
CN108241704B (en) * | 2016-12-26 | 2021-09-17 | 北京国双科技有限公司 | Data processing method and device |
CN110020074B (en) * | 2017-10-13 | 2021-04-23 | 北京国双科技有限公司 | Method and device for determining webpage loss rate |
CN108900520B (en) * | 2018-07-11 | 2021-04-20 | 广州虎牙信息科技有限公司 | Live broadcast card pause factor determination method and device, server and storage medium |
CN111611508B (en) * | 2020-05-28 | 2020-12-15 | 江苏易安联网络技术有限公司 | Identification method and device for actual website access of user |
CN113692014B (en) * | 2021-08-30 | 2023-10-27 | 中国平安人寿保险股份有限公司 | APP flow analysis method, apparatus, computer device and storage medium |
CN116775148B (en) * | 2023-06-19 | 2024-02-09 | 深圳市秦丝科技有限公司 | Small program optimization management system and method based on data analysis technology |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102184230A (en) * | 2011-05-11 | 2011-09-14 | 北京百度网讯科技有限公司 | Method and device for displaying search results |
CN102306171A (en) * | 2011-08-22 | 2012-01-04 | 百度在线网络技术(北京)有限公司 | Method and equipment for providing network access suggestions and network search suggestions |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2579691A1 (en) * | 2004-09-16 | 2006-03-30 | Telenor Asa | A method, system, and computer program product for searching for, navigating among, and ranking of documents in a personal web |
Also Published As
Publication number | Publication date |
---|---|
CN103425661A (en) | 2013-12-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |