A method for analyzing degree of similarity by cumulative comparison of broken-line fluctuation deviation
Technical field
The present invention relates to the field of behavior association analysis, and in particular to a method for analyzing degree of similarity by cumulative comparison of broken-line fluctuation deviation.
Background technology
In behavior association analysis, one analyzes the sales performance of a given commodity of an enterprise (quantity data of sales behavior) and how the daily sales peak period changes, or analyzes the similarity of the customer sources of several commodities (source data of sales behavior), and from this infers the associated consumption habits of customers to guide subsequent marketing policy. Such similarity analysis is therefore very important. Typically, in behavior association analysis, the data are summarized into a broken-line chart according to the data distribution characteristics of the different behaviors.
Broken line: the X-axis is divided into units, each unit has one sampled point, and each sampled point has a value on the Y-axis. A common application scenario is: (1) the X-axis is time, in seconds; (2) the Y-axis is quantity, in number of occurrences; (3) a sampled point (x, y) means that during the x-th second, that is, from time point x seconds (inclusive) to time point x+1 seconds (exclusive), a given event occurred y times in total.
Leading broken line and trailing broken line: the events corresponding to the leading broken line are expected to occur first; the events corresponding to the trailing broken line are expected to occur afterwards.
Matching: for a given event of the leading broken line, one event is found among all events of the trailing broken line, according to the rule of the allowed distribution deviation window, and matched with it. A particular event of the leading broken line can be matched with at most one event of the trailing broken line, and an event of the trailing broken line can be matched with at most one event of the leading broken line.
Allowed distribution deviation window: assume the distribution deviation window size is N. When the degree of coupling between two broken lines LineA and LineB is analyzed, the Ay events corresponding to a sampled point (Ax, Ay) of the leading broken line are allowed to be matched against (N+1) sampled points of LineB, whose time periods are x, x+1, x+2, ..., x+N, where x = Ax.
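As a small illustration of the window definition above, the following Python sketch lists the LineB time periods that the events of one LineA sampled point may be matched against. The function name `candidate_periods` is hypothetical and introduced only for this example; the text does not prescribe any particular implementation.

```python
def candidate_periods(x: int, n: int) -> list[int]:
    """Return the LineB time periods x, x+1, ..., x+N that events
    from the LineA sampled point at period x may be matched against.

    `n` is the distribution deviation window size N (assumed >= 0).
    """
    if x < 0 or n < 0:
        raise ValueError("x and N must not be negative")
    return list(range(x, x + n + 1))


# A window of size N = 3 starting at period 5 covers N + 1 = 4 periods.
print(candidate_periods(5, 3))  # [5, 6, 7, 8]
```

With N = 0 the window collapses to the single period x, which corresponds to comparing the two broken lines point by point.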
Degree of coupling: if two broken lines LineA and LineB coincide completely, that is, the values of their sampled points are identical, they are necessarily completely coupled. More generally, if every event of every sampled point in LineA can obtain a unique matched event among the events of the LineB sampled points within the time periods permitted by the allowed distribution deviation window, and in the end every event of every sampled point in LineB has been matched, then the two broken lines are completely coupled. Under complete coupling, the coupling index should reach its maximum.
Deviation degree: the deviation degree indicates how severely events fail to obtain a match. Under complete coupling the deviation index should be 0; the more events that cannot be matched, the higher the deviation index should be.
Direct deviation: for the same time period of the summarized broken lines, the absolute value of the difference between the Y-axis values of the two sampled points.
Variance deviation: for the same time period of the summarized broken lines, the square of the direct deviation.
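The two deviation definitions can be sketched in Python as follows. The function names are hypothetical, and reading the variance deviation as the square of the direct deviation is an interpretation suggested by the variable name SqrAcc used later in the text.

```python
def direct_deviation(ya: int, yb: int) -> int:
    """Absolute difference of the Y values of two sampled points
    that belong to the same time period."""
    return abs(ya - yb)


def variance_deviation(ya: int, yb: int) -> int:
    """Square of the direct deviation (assumed interpretation)."""
    return direct_deviation(ya, yb) ** 2


# Same period, 8 events on one line and 5 on the other.
print(direct_deviation(8, 5))    # 3
print(variance_deviation(8, 5))  # 9
```

Squaring penalizes a single large gap more heavily than several small gaps of the same total size, which is why the two measures can rank a pair of broken lines differently.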
At present, the IT industry largely lacks methods and techniques for analyzing the similarity relationships described above. A method that can analyze the degree of similarity of behavior associations, developed through engineering practice into a finished technical product, has broad application prospects.
Content of the invention
The main object of the present invention is to overcome the deficiencies of the prior art and to provide a method for analyzing degree of similarity by cumulative comparison of broken-line fluctuation deviation which, based on a broken-line chart automatically generated from the data in behavior association analysis, analyzes the degree of coupling and the deviation degree of two broken lines and yields quantified coupling and deviation indices. To solve the above technical problem, the solution of the present invention is as follows:
A method for analyzing degree of similarity by cumulative comparison of broken-line fluctuation deviation is provided. First, data are extracted from the behavior association analysis business database according to the requirements of the behavior association analysis and the data distribution characteristics of the different behaviors, and a broken-line chart is automatically generated from these data;
Then, let the two broken lines in the chart be LineA and LineB, where LineA is the leading broken line and LineB is the trailing broken line. For a sampled point (x, y) on a broken line, the X-axis is time, in units of a specified time period (for example 1 second or 15 minutes), and the Y-axis is quantity, in number of occurrences, meaning that the event occurred y times within time period x. The X-axis interval of the sampled points of LineA is [AXmin, AXmax], and that of LineB is [BXmin, BXmax];
The method obtains quantified indices of the similarity between LineA and LineB, and specifically comprises the following steps:
Step 1): Each sampled point of broken lines LineA and LineB is stored in a variable of type TypeNode, and each broken line is stored as an array of TypeNode variables: the sampled-point data of LineA are stored in array ArrNodesListA and those of LineB in array ArrNodesListB, with each array member denoted (xCur, yCur). A TypeNode variable has the members (x, y), where x and y are the x-coordinate and y-coordinate of the sampled point, both integers not less than 0;
Step 2): The X-axis intervals of LineA and LineB are merged into the combined X-axis interval [Xmin, Xmax], where Xmin is the smaller of AXmin (the minimum x-coordinate of LineA) and BXmin (the minimum x-coordinate of LineB), and Xmax is the larger of AXmax (the maximum x-coordinate of LineA) and BXmax (the maximum x-coordinate of LineB);
Step 3): New sampled-point arrays NewArrNodesListA and NewArrNodesListB are created, each with length Xmax-Xmin+1. Every member of both arrays is a TypeNode variable NodeCur, that is (xCur, yCur), initialized so that its x member equals the array index of that member and its y member equals 0;
Step 4): ArrNodesListA is traversed, and for each member (xCur, yCur) the value of its y member is copied to the y member of the member of NewArrNodesListA at index (xCur-Xmin); ArrNodesListB is traversed, and for each member (xCur, yCur) the value of its y member is copied to the y member of the member of NewArrNodesListB at index (xCur-Xmin);
Step 5): Two variables AmpAcc and SqrAcc are created to hold the deviation degree: AmpAcc holds the accumulated direct deviation and SqrAcc holds the accumulated variance deviation; both are initialized to 0. Two variables AmpAccBase and SqrAccBase are created to hold the deviation bases: AmpAccBase holds the baseline for the accumulated direct deviation and SqrAccBase holds the baseline for the accumulated variance deviation; both are initialized to 0;
Step 6): For the combined X-axis interval obtained in step 2, a segment length SegLen is set; SegLen is not less than 1 and is less than Xmax-Xmin+1;
Step 7): All sampled points of NewArrNodesListA and NewArrNodesListB obtained in step 4 are summarized in segments of the length SegLen determined in step 6. Let the n-th segment be Seg_n; it corresponds to the X-axis time periods [SegLen*n, SegLen*n+SegLen-1] of the combined interval. For each segment Seg_n and for each of the two arrays, a new sampled point NodeSegC of type TypeNode is formed: the x member of NodeSegC is the sequence number n of the current segment, and its y member is the sum of the y members of all sampled points of NewArrNodesListA (respectively NewArrNodesListB) within the X-axis time periods of Seg_n. Proceeding in this way yields new sampled-point arrays ArrSegNodesListA and ArrSegNodesListB whose members are NodeSegC, that is, the summarized broken line LineSA with sampled-point array ArrSegNodesListA and the summarized broken line LineSB with sampled-point array ArrSegNodesListB; n is an integer not less than 0, starting from 0;
Step 8): ArrSegNodesListA and ArrSegNodesListB are traversed by array index, and the following operations are performed:
a) Let the current segment be SegC. The y members of the two sampled points of ArrSegNodesListA and ArrSegNodesListB in segment SegC are subtracted and the absolute value is taken to obtain the direct deviation AmpC; AmpC is squared to obtain the variance deviation AmpS;
b) AmpC is added to AmpAcc, so that AmpAcc accumulates the direct deviation over all segments;
c) AmpS is added to SqrAcc, so that SqrAcc accumulates the variance deviation over all segments;
d) The absolute value of the y member of the current sampled point of ArrSegNodesListA is taken and assigned to the variable AbsValC; AbsValC is added to AmpAccBase and the square of AbsValC is added to SqrAccBase, yielding the two deviation-index baselines AmpAccBase and SqrAccBase;
Step 9): From AmpAcc, AmpAccBase, SqrAcc and SqrAccBase obtained in step 8, the quantified indices of the similarity between the two broken lines are obtained with the following formulas:
AmpPer=AmpAcc/AmpAccBase*100%;
SqrPer=SqrAcc/SqrAccBase*100%;
AmpFitPer=100%-AmpPer;
SqrFitPer=100%-SqrPer;
where AmpPer is the direct deviation percentage, SqrPer is the variance deviation percentage, AmpFitPer is the direct coupling percentage, and SqrFitPer is the variance coupling percentage.
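Written out as sums over the segments, the quantities of steps 8 and 9 take the following form. The symbols $a_n$, $b_n$ and $K$ are introduced here only for illustration: $a_n$ and $b_n$ denote the y members of segment $n$ in ArrSegNodesListA and ArrSegNodesListB, and $K$ is the number of segments. Reading the variance deviation as a square is an interpretation suggested by the name SqrAcc.

```latex
\begin{aligned}
\mathrm{AmpAcc} &= \sum_{n=0}^{K-1} \lvert a_n - b_n \rvert, &
\mathrm{SqrAcc} &= \sum_{n=0}^{K-1} (a_n - b_n)^2, \\
\mathrm{AmpAccBase} &= \sum_{n=0}^{K-1} \lvert a_n \rvert, &
\mathrm{SqrAccBase} &= \sum_{n=0}^{K-1} a_n^{2}, \\
\mathrm{AmpPer} &= \frac{\mathrm{AmpAcc}}{\mathrm{AmpAccBase}} \times 100\%, &
\mathrm{SqrPer} &= \frac{\mathrm{SqrAcc}}{\mathrm{SqrAccBase}} \times 100\%, \\
\mathrm{AmpFitPer} &= 100\% - \mathrm{AmpPer}, &
\mathrm{SqrFitPer} &= 100\% - \mathrm{SqrPer}.
\end{aligned}
```

Both baselines are computed from LineSA alone, so the indices are not symmetric in the two broken lines, and the ratios are undefined when every $a_n$ is 0; the text does not say how that case should be handled.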
In the present invention, array indices are integers not less than 0, starting from 0.
Compared with the prior art, the beneficial effects of the present invention are as follows:
Through comparative analysis of broken lines, the present invention obtains quantified coupling and deviation indices between them. The invention can be applied in behavior association analysis: for broken-line charts obtained by summarizing data according to the data distribution characteristics of different behaviors, it carries out further comparative analysis and yields quantifiable indices for assessing the degree of similarity and the degree of deviation. It can of course also be used in other fields and for broader analytical purposes.
Brief description of the drawings
Fig. 1 is an example diagram of the interval merging of NewArrNodesListA and NewArrNodesListB.
Fig. 2 is an example diagram of summarizing LineA and LineB by segment to obtain LineSA and LineSB.
Fig. 3 is an example diagram of the per-segment statistics of direct deviation and variance deviation for LineSA and LineSB.
Specific embodiment
The present invention is described in further detail below with reference to the accompanying drawings and a specific embodiment:
In a method for analyzing degree of similarity by cumulative comparison of broken-line fluctuation deviation, data are first extracted from the behavior association analysis business database according to the requirements of the behavior association analysis and the data distribution characteristics of the different behaviors. For example, to analyze the sales performance of certain commodities of an enterprise, the sales-volume data of different commodities over the same period can be extracted; to analyze how the daily sales peak period changes, the daily sales-volume data of the same commodity over a period of time can be extracted. A broken-line chart is then generated automatically from the extracted data using software; many choices are available, such as Excel and EViews.
Then, let the two broken lines in the chart be LineA and LineB, where LineA is the leading broken line and LineB is the trailing broken line. For a sampled point (x, y) on a broken line, the X-axis is time, in units of a specified time period (for example 1 second or 15 minutes), and the Y-axis is quantity, in number of occurrences, meaning that the event occurred y times within time period x. The X-axis interval of the sampled points of LineA is [AXmin, AXmax], and that of LineB is [BXmin, BXmax]. All array indices appearing below are integers not less than 0, starting from 0. Obtaining the quantified indices of the similarity between LineA and LineB specifically comprises the following steps:
Step 1): Each sampled point of broken lines LineA and LineB is stored in a variable of type TypeNode, and each broken line is stored as an array of TypeNode variables: the sampled-point data of LineA are stored in array ArrNodesListA and those of LineB in array ArrNodesListB, with each array member denoted (xCur, yCur). See the example of arrays ArrNodesListA and ArrNodesListB in Table 1 below.
Table 1: Example of arrays ArrNodesListA and ArrNodesListB
A TypeNode variable has the members (x, y), where x and y are the x-coordinate and y-coordinate of the sampled point, both integers not less than 0.
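One possible representation of step 1 in Python is sketched below. TypeNode, ArrNodesListA and ArrNodesListB are the names used in the text; the use of a dataclass, the validation in `__post_init__`, and the sample values are assumptions made for illustration and are not the contents of Table 1.

```python
from dataclasses import dataclass


@dataclass
class TypeNode:
    """One sampled point: x is the time-period index, y the event count."""
    x: int
    y: int

    def __post_init__(self) -> None:
        # The text requires both members to be integers not less than 0.
        if self.x < 0 or self.y < 0:
            raise ValueError("x and y must be integers not less than 0")


# Hypothetical sample data, one array per broken line.
ArrNodesListA = [TypeNode(0, 3), TypeNode(1, 5), TypeNode(3, 2)]
ArrNodesListB = [TypeNode(1, 4), TypeNode(2, 1), TypeNode(4, 6)]

print(ArrNodesListA[1])  # TypeNode(x=1, y=5)
```

Nothing here requires the two arrays to cover the same time periods; the later steps align them on a common interval.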
Step 2): The X-axis intervals of LineA and LineB are merged into the combined X-axis interval [Xmin, Xmax], where Xmin is the smaller of AXmin (the minimum x-coordinate of LineA) and BXmin (the minimum x-coordinate of LineB), and Xmax is the larger of AXmax (the maximum x-coordinate of LineA) and BXmax (the maximum x-coordinate of LineB). See the interval-merging example for NewArrNodesListA and NewArrNodesListB in Fig. 1, where AXmin=0, BXmin=0, Xmin=0, AXmax=19, BXmax=21, Xmax=21.
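The interval merge of step 2 amounts to one minimum and one maximum. The sketch below uses a hypothetical helper `merge_interval` and the bounds quoted for the Fig. 1 example.

```python
def merge_interval(ax_min: int, ax_max: int,
                   bx_min: int, bx_max: int) -> tuple[int, int]:
    """Combine the X-axis intervals [AXmin, AXmax] and [BXmin, BXmax]
    into the smallest interval [Xmin, Xmax] that contains both."""
    return min(ax_min, bx_min), max(ax_max, bx_max)


# Bounds from the Fig. 1 example: LineA spans 0..19, LineB spans 0..21.
x_min, x_max = merge_interval(0, 19, 0, 21)
print(x_min, x_max)       # 0 21
print(x_max - x_min + 1)  # 22, the array length used in step 3
```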
Step 3): New sampled-point arrays NewArrNodesListA and NewArrNodesListB are created, each with length Xmax-Xmin+1. Every member of both arrays is a TypeNode variable NodeCur, that is (xCur, yCur), initialized so that its x member equals the array index of that member and its y member equals 0.
Step 4): ArrNodesListA is traversed, and for each member (xCur, yCur) the value of its y member is copied to the y member of the member of NewArrNodesListA at index (xCur-Xmin); ArrNodesListB is traversed, and for each member (xCur, yCur) the value of its y member is copied to the y member of the member of NewArrNodesListB at index (xCur-Xmin).
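Steps 3 and 4 together turn each sparse list of sampled points into a zero-filled array aligned on the combined interval. The following sketch assumes sampled points are given as plain (x, y) tuples; the helper name `densify` and the sample data are hypothetical.

```python
def densify(nodes: list[tuple[int, int]],
            x_min: int, x_max: int) -> list[tuple[int, int]]:
    """Build an array of length Xmax - Xmin + 1 whose member at index i
    is (i, 0), then copy each y value to index xCur - Xmin."""
    length = x_max - x_min + 1
    dense = [(i, 0) for i in range(length)]  # step 3: x = index, y = 0
    for x_cur, y_cur in nodes:               # step 4: copy y values
        index = x_cur - x_min
        dense[index] = (index, y_cur)
    return dense


# Hypothetical data on the combined interval [2, 6].
nodes_a = [(2, 3), (3, 5), (5, 2)]
nodes_b = [(3, 4), (4, 1), (6, 6)]
new_a = densify(nodes_a, 2, 6)
new_b = densify(nodes_b, 2, 6)
print([y for _, y in new_a])  # [3, 5, 0, 2, 0]
print([y for _, y in new_b])  # [0, 4, 1, 0, 6]
```

Time periods with no sampled point keep the value 0, so both arrays end up with the same length and can be compared index by index.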
Step 5): Two variables AmpAcc and SqrAcc are created to hold the deviation degree: AmpAcc holds the accumulated direct deviation and SqrAcc holds the accumulated variance deviation; both are initialized to 0. Two variables AmpAccBase and SqrAccBase are created to hold the deviation bases: AmpAccBase holds the baseline for the accumulated direct deviation and SqrAccBase holds the baseline for the accumulated variance deviation; both are initialized to 0.
Step 6): For the combined X-axis interval obtained in step 2, a segment length SegLen is set; SegLen is not less than 1 and is less than Xmax-Xmin+1.
Step 7): All sampled points of NewArrNodesListA and NewArrNodesListB obtained in step 4 are summarized in segments of the length SegLen determined in step 6. Let the n-th segment be Seg_n; it corresponds to the X-axis time periods [SegLen*n, SegLen*n+SegLen-1] of the combined interval. For each segment Seg_n and for each of the two arrays, a new sampled point NodeSegC of type TypeNode is formed: the x member of NodeSegC is the sequence number n of the current segment, and its y member is the sum of the y members of all sampled points of NewArrNodesListA (respectively NewArrNodesListB) within the X-axis time periods of Seg_n. Proceeding in this way yields new sampled-point arrays ArrSegNodesListA and ArrSegNodesListB whose members are NodeSegC, that is, the summarized broken line LineSA with sampled-point array ArrSegNodesListA and the summarized broken line LineSB with sampled-point array ArrSegNodesListB; n is an integer not less than 0, starting from 0.
See Fig. 2 for an example of summarizing LineA and LineB by segment to obtain LineSA and LineSB; the sums of the y members of all sampled points in each segment SegC of LineSA and LineSB are shown in Table 2 below.
Table 2: Sums of the y members of all sampled points in each segment SegC of LineSA and LineSB
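The segment summary of step 7 can be sketched as follows. The helper name `summarize` and the input values are hypothetical and are not the data of Table 2. The text does not say what happens when the array length is not a multiple of SegLen; this sketch assumes the last segment is simply shorter.

```python
def summarize(dense_ys: list[int], seg_len: int) -> list[tuple[int, int]]:
    """Sum the y values of each run of seg_len consecutive time periods.

    Returns (n, total) pairs, where n is the segment sequence number
    starting from 0. A trailing partial segment is summed as-is
    (an assumption, not stated in the text).
    """
    if not 1 <= seg_len < len(dense_ys):
        raise ValueError("SegLen must satisfy 1 <= SegLen < Xmax-Xmin+1")
    starts = range(0, len(dense_ys), seg_len)
    return [(n, sum(dense_ys[start:start + seg_len]))
            for n, start in enumerate(starts)]


# Hypothetical zero-filled y values for the two broken lines, SegLen = 2.
print(summarize([3, 5, 0, 2, 0], 2))  # [(0, 8), (1, 2), (2, 0)]
print(summarize([0, 4, 1, 0, 6], 2))  # [(0, 4), (1, 1), (2, 6)]
```

A larger SegLen tolerates larger time shifts between the two broken lines, because events that fall into the same segment are compared only by their total.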
Step 8): ArrSegNodesListA and ArrSegNodesListB are traversed by array index, and the following operations are performed:
a) Let the current segment be SegC. The y members of the two sampled points of ArrSegNodesListA and ArrSegNodesListB in segment SegC are subtracted and the absolute value is taken to obtain the direct deviation AmpC; AmpC is squared to obtain the variance deviation AmpS;
b) AmpC is added to AmpAcc, so that AmpAcc accumulates the direct deviation over all segments;
c) AmpS is added to SqrAcc, so that SqrAcc accumulates the variance deviation over all segments;
d) The absolute value of the y member of the current sampled point of ArrSegNodesListA is taken and assigned to the variable AbsValC; AbsValC is added to AmpAccBase and the square of AbsValC is added to SqrAccBase, yielding the two deviation-index baselines AmpAccBase and SqrAccBase.
When AmpAcc and SqrAcc are both equal to 0, the degree of coupling is 100%. The larger the values of AmpAcc and SqrAcc, the lower the degree of coupling.
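The accumulation loop of step 8 is sketched below in Python. The function name `accumulate` and the segment totals are hypothetical; the local variable names mirror AmpC, AmpS and AbsValC from the text, and "variance deviation" is taken to mean the square of the direct deviation.

```python
def accumulate(seg_a: list[int], seg_b: list[int]) -> tuple[int, int, int, int]:
    """Return (AmpAcc, SqrAcc, AmpAccBase, SqrAccBase) for two summarized
    broken lines given as lists of per-segment y totals."""
    if len(seg_a) != len(seg_b):
        raise ValueError("both summarized arrays must have the same length")
    amp_acc = sqr_acc = 0
    amp_acc_base = sqr_acc_base = 0
    for ya, yb in zip(seg_a, seg_b):
        amp_c = abs(ya - yb)             # a) direct deviation
        amp_s = amp_c ** 2               # a) variance deviation
        amp_acc += amp_c                 # b)
        sqr_acc += amp_s                 # c)
        abs_val_c = abs(ya)              # d) baseline uses LineSA only
        amp_acc_base += abs_val_c
        sqr_acc_base += abs_val_c ** 2
    return amp_acc, sqr_acc, amp_acc_base, sqr_acc_base


# Hypothetical per-segment totals for LineSA and LineSB.
print(accumulate([8, 2, 0], [4, 1, 6]))  # (11, 53, 10, 68)
```

Identical inputs give AmpAcc = SqrAcc = 0, which corresponds to the 100% coupling case described above.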
Step 9): From AmpAcc, AmpAccBase, SqrAcc and SqrAccBase obtained in step 8, the quantified indices of the similarity between the two broken lines are obtained with the following formulas:
AmpPer=AmpAcc/AmpAccBase*100%;
SqrPer=SqrAcc/SqrAccBase*100%;
AmpFitPer=100%-AmpPer;
SqrFitPer=100%-SqrPer;
Of these, the two deviation indices are AmpPer, the direct deviation percentage, and SqrPer, the variance deviation percentage; the two coupling indices are AmpFitPer, the direct coupling percentage, and SqrFitPer, the variance coupling percentage.
The larger the value of a deviation index, the poorer the similarity between the broken lines; the smaller the value of a coupling index, the poorer the similarity, and a coupling index may be negative. When both deviation indices are 0%, both coupling indices are 100%, which indicates that the similarity between the broken lines is 100%. The analysis result required by the behavior association analysis is thereby obtained.
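The four indices of step 9 follow directly from the accumulated values. In the sketch below the function name `similarity_indices` and the input numbers are hypothetical, and raising an error for a zero baseline is an assumption, since the text does not cover that case.

```python
def similarity_indices(amp_acc: int, amp_acc_base: int,
                       sqr_acc: int, sqr_acc_base: int) -> dict[str, float]:
    """Compute the deviation and coupling percentages of step 9."""
    if amp_acc_base == 0 or sqr_acc_base == 0:
        # Not addressed in the text: LineSA is zero in every segment.
        raise ValueError("baseline is 0, the indices are undefined")
    amp_per = 100.0 * amp_acc / amp_acc_base
    sqr_per = 100.0 * sqr_acc / sqr_acc_base
    return {
        "AmpPer": amp_per,
        "SqrPer": sqr_per,
        "AmpFitPer": 100.0 - amp_per,
        "SqrFitPer": 100.0 - sqr_per,
    }


# Hypothetical accumulated values: AmpAcc=11, AmpAccBase=10,
# SqrAcc=53, SqrAccBase=68.
for name, value in similarity_indices(11, 10, 53, 68).items():
    print(f"{name}: {value:.2f}%")
# AmpPer: 110.00%
# SqrPer: 77.94%
# AmpFitPer: -10.00%
# SqrFitPer: 22.06%
```

In this example the accumulated direct deviation exceeds the LineSA baseline, so AmpFitPer is negative, which illustrates the remark that a coupling index may drop below 0.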
See Fig. 3 for an example of the per-segment statistics of direct deviation and variance deviation for LineSA and LineSB; the data of Fig. 3 are shown in Table 3 below.
Table 3: Per-segment statistics for LineSA and LineSB
Finally, it should be noted that the above is only a specific embodiment of the present invention. The invention is obviously not limited to the above embodiment, and many variations are possible. All variations that a person of ordinary skill in the art can directly derive or conceive from the disclosure of the present invention shall be considered to fall within the scope of protection of the present invention.