CN103473458B

CN103473458B - Method for comparatively analyzing similarities of fold lines

Info

Publication number: CN103473458B
Application number: CN201310420053.4A
Authority: CN
Inventors: 王锦龙; 范渊; 杨永清
Original assignee: DBAPPSecurity Co Ltd
Current assignee: Hangzhou Anheng Information Security Technology Co Ltd
Priority date: 2013-09-13
Filing date: 2013-09-13
Publication date: 2017-02-08
Anticipated expiration: 2033-09-13
Also published as: CN103473458A

Abstract

The invention relates to a method for comparatively analyzing similarities of fold lines. Deviation and coupling degrees are indicated through quantitative indexes, and the similarities of a plurality of fold lines can be compared through the quantitative indexes. The method is mainly characterized in that direct deviation and square deviation can be accumulated through summarized fold lines obtained by sectional summarizing for X axes of the fold lines and are compared with reference values of direct deviation and square deviation to obtain percent quantitative indexes, so that the similarities of the fold lines are indicated through the quantitative indexes.

Description

A method for comparatively analyzing the similarity of broken lines

Technical field

The present invention relates to computer software, and in particular to a method for comparing the similarity of two broken lines in computer software.

Background technology

In some data mining and analysis tasks, the plane coordinate points formed by discrete data must be analyzed in order to determine the relationships among those points. If the plane coordinate points formed by two data sets that have a precedence relationship make up two broken lines with that same precedence relationship, then analyzing the similarity of the two broken lines by computer is an important and effective step toward uncovering the degree of association between the two data sets. In existing software, however, the judgment of the similarity of two broken lines is very rough and its reliability is low.

Summary of the invention

The present invention mainly addresses the technical problem in the prior art that the similarity of two broken lines is judged very roughly and with low reliability. It provides a method for comparatively analyzing the similarity of broken lines that expresses the similarity numerically and judges it accurately and reliably.

The above technical problem is mainly solved by the following technical solution: a method for comparatively analyzing the similarity of broken lines, which involves an initial A coordinate storage module, an initial B coordinate storage module, a merge index module and a computing module, characterized in that it comprises the following steps:

1. Store the coordinates of broken line LineA in the initial A coordinate storage module and the coordinates of broken line LineB in the initial B coordinate storage module. AXmin and BXmin are the minimum values of the X-axis intervals of LineA and LineB respectively, and AXmax and BXmax are the maximum values of the X-axis intervals of LineA and LineB respectively. Input the X-axis intervals of both LineA and LineB into the merge index module to obtain the merged X-axis interval [Xmin, Xmax], where Xmin is the smaller of AXmin and BXmin, and Xmax is the larger of AXmax and BXmax;

2. For each time period (X-axis unit) of the merged interval in which LineA has no sampling point, generate a new sampling point for LineA; for each time period (X-axis unit) of the merged interval in which LineB has no sampling point, generate a new sampling point for LineB;

3. Two storage locations are provided in the computing module for the deviation degree: the direct deviation accumulator AmpAcc and the variance deviation accumulator SqrAcc, both initialized to 0. Two further storage locations are provided in the computing module for the deviation bases: the direct deviation accumulator base AmpAccBase and the variance deviation accumulator base SqrAccBase, both also initialized to 0;

4. Set a segment length SegLen for the merged interval [Xmin, Xmax], where SegLen is a multiple of one X-axis time period. Summarize all sampling points of LineA and LineB into segments according to SegLen. Let N be a natural number: the X-axis time period of the merged interval that corresponds to the N-th segment SegN is the interval [N, N+SegLen], and the Y-axis value of SegN is the sum of the Y-axis values of the sampling points in that time period. This finally yields two new summarized broken lines, LineSA and LineSB;

5. Traverse the segments, from the first segment Seg1 to the last segment:

(1) For the current segment SegC, subtract the Y-axis values of LineSA and LineSB in the current segment and take the absolute value to obtain the direct deviation AmpC; square AmpC to obtain the variance deviation AmpS;

(2) Add AmpC to AmpAcc, so that AmpAcc accumulates the direct deviation of all segments;

(3) Add AmpS to SqrAcc, so that SqrAcc accumulates the variance deviation of all segments;

(4) Add the absolute value of the current value of LineSA to AmpAccBase, and add the square of the absolute value of the current value of LineSA to SqrAccBase, obtaining the two deviation index bases;

After the traversal ends, AmpAcc represents the accumulated direct deviation of all segments, and SqrAcc represents the accumulated squared deviation of all segments;

6. Obtain the two deviation indexes:

Direct deviation percentage (AmpPer): AmpPer = AmpAcc / AmpAccBase * 100%;

Variance deviation percentage (SqrPer): SqrPer = SqrAcc / SqrAccBase * 100%;

7. Obtain the two coupling indexes:

Direct coupling percentage (AmpFitPer): AmpFitPer = 100% - AmpPer;

Variance coupling percentage (SqrFitPer): SqrFitPer = 100% - SqrPer.
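The seven steps above can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the function names are hypothetical, a broken line is assumed to be a mapping from integer X-axis units to Y values, generated sampling points are assumed to take the value 0 (the text does not state the value), segments are assumed to be disjoint windows of SegLen units, and "power" is read as squaring.

```python
from typing import Dict, List

Line = Dict[int, float]  # X-axis unit (e.g. a second) -> Y value (e.g. event count)


def fill_missing(line: Line, x_min: int, x_max: int) -> Line:
    """Step 2: give every X unit in [x_min, x_max] a sampling point.

    Assumption: generated points get Y = 0; the source does not state the value.
    """
    return {x: line.get(x, 0.0) for x in range(x_min, x_max + 1)}


def summarize(line: Line, x_min: int, x_max: int, seg_len: int) -> List[float]:
    """Step 4: sum the Y values inside each segment of seg_len X units.

    Assumption: segments are disjoint windows [start, start + seg_len).
    """
    return [
        sum(line[x] for x in range(start, min(start + seg_len, x_max + 1)))
        for start in range(x_min, x_max + 1, seg_len)
    ]


def compare_broken_lines(line_a: Line, line_b: Line, seg_len: int = 1) -> Dict[str, float]:
    # Step 1: merged X-axis interval [Xmin, Xmax].
    x_min = min(min(line_a), min(line_b))
    x_max = max(max(line_a), max(line_b))

    # Steps 2 and 4: fill gaps, then build the summarized lines LineSA and LineSB.
    line_sa = summarize(fill_missing(line_a, x_min, x_max), x_min, x_max, seg_len)
    line_sb = summarize(fill_missing(line_b, x_min, x_max), x_min, x_max, seg_len)

    # Step 3: accumulators and their bases start at 0.
    amp_acc = sqr_acc = amp_acc_base = sqr_acc_base = 0.0

    # Step 5: traverse the segments.
    for ya, yb in zip(line_sa, line_sb):
        amp_c = abs(ya - yb)   # direct deviation of this segment
        amp_s = amp_c ** 2     # variance (squared) deviation of this segment
        amp_acc += amp_c
        sqr_acc += amp_s
        amp_acc_base += abs(ya)
        sqr_acc_base += abs(ya) ** 2

    # Steps 6 and 7, in percent. If LineSA is zero everywhere the bases are 0
    # and this raises ZeroDivisionError; the source does not cover that case.
    amp_per = amp_acc / amp_acc_base * 100.0
    sqr_per = sqr_acc / sqr_acc_base * 100.0
    return {
        "AmpPer": amp_per,
        "SqrPer": sqr_per,
        "AmpFitPer": 100.0 - amp_per,
        "SqrFitPer": 100.0 - sqr_per,
    }
```

Note that both bases are computed from LineSA only, so the result is not symmetric: swapping LineA and LineB generally changes the percentages.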

The present invention uses the X-axis and Y-axis values of each sampling point of a broken line, together with a configured allowed distribution deviation window on the X axis, to obtain quantified coupling degree and deviation degree indexes. For a specified broken line, it can then search a specified set of broken lines for the one with the highest coupling degree and the lowest deviation degree, giving a data analysis system that finds the best-matching pair of broken lines.

Assume: there are two broken lines LineA and LineB with a definite earlier/later relationship, where LineA is the earlier broken line and LineB is the later broken line. The X-axis interval of the sampling points of LineA is [AXmin, AXmax], and the X-axis interval of the sampling points of LineB is [BXmin, BXmax].

Broken line: the X axis is divided into units, each unit has one sampling point, and each sampling point has a value on the Y axis. A common application scenario is: (1) the X axis is time, in seconds; (2) the Y axis is a quantity, in number of occurrences; (3) a sampling point (x, y) means that within the time period of second x (from the time point x seconds inclusive up to, but not including, the time point x+1 seconds), a certain event occurred y times in total;
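To illustrate the common scenario just described, the following minimal sketch turns raw event timestamps into sampling points of a broken line. The function name and the use of floating-point timestamps in seconds are assumptions made for this example only.

```python
from collections import Counter
from math import floor
from typing import Dict, Iterable


def events_to_broken_line(timestamps: Iterable[float]) -> Dict[int, int]:
    """Count events per one-second time period [x, x + 1).

    Returns a mapping x -> y, i.e. the sampling points (x, y) of a broken line.
    Periods without events get no sampling point here.
    """
    return dict(Counter(floor(t) for t in timestamps))
```

For example, events at 0.1 s, 0.9 s, 1.0 s and 3.5 s give the sampling points (0, 2), (1, 1) and (3, 1); second 2 has no sampling point until one is generated for it.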

Earlier broken line, later broken line: the earlier broken line indicates that the events corresponding to it should occur first. The later broken line indicates that the events corresponding to it should occur afterwards.

Matching: for a given event of the earlier broken line, find one event among all events of the later broken line, according to the allowed distribution deviation window rule, and pair the two. A particular event of the earlier broken line can be paired with at most one event of the later broken line; an event of the later broken line can be paired with at most one event of the earlier broken line.

Allowed distribution deviation window: suppose the size of the distribution deviation window is N. When coupling degree analysis is carried out between the two broken lines LineA and LineB, the Ay events corresponding to a sampling point (Ax, Ay) of the earlier broken line may be matched against (N+1) sampling points of LineB, whose time periods are x, x+1, x+2, ..., x+N, where x is Ax.

Coupling degree: between two broken lines LineA and LineB, if the two lines coincide completely (their sampling points have the same values), the coupling degree is necessarily full coupling. If every event of every sampling point in LineA can obtain a uniquely matched event among the events of the LineB sampling points that lie within the time period range given by the allowed distribution deviation window, and in the end every event of every sampling point in LineB has also been matched, then the coupling degree between the two broken lines is full coupling. At full coupling, the coupling index should reach its maximum.
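The matching rule and the full-coupling condition can be illustrated with the following sketch. The source does not prescribe a matching procedure; the greedy strategy used here (earliest LineA events take the earliest still-unmatched LineB events inside their window) and both function names are assumptions for illustration.

```python
from typing import Dict


def match_events(line_a: Dict[int, int], line_b: Dict[int, int], window: int) -> int:
    """Pair events of LineA with events of LineB and return the number of pairs.

    An event counted at period x in LineA may pair with an event counted at
    period x, x+1, ..., x+window in LineB; every event is used at most once.
    """
    remaining_b = dict(line_b)  # unmatched LineB events per time period
    matched = 0
    for x in sorted(line_a):
        need = line_a[x]  # LineA events at period x still waiting for a partner
        for xb in range(x, x + window + 1):
            take = min(need, remaining_b.get(xb, 0))
            if take:
                remaining_b[xb] -= take
                matched += take
                need -= take
            if need == 0:
                break
    return matched


def is_fully_coupled(line_a: Dict[int, int], line_b: Dict[int, int], window: int) -> bool:
    """True when every LineA event and every LineB event ends up paired."""
    total_a = sum(line_a.values())
    total_b = sum(line_b.values())
    return total_a == total_b and match_events(line_a, line_b, window) == total_a
```

With LineA = {0: 2, 1: 1} and LineB = {1: 2, 2: 1}, a window of size 1 pairs all three events, whereas a window of size 0 pairs only one.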

Deviation degree: the deviation degree expresses how severely events fail to obtain a match. At full coupling the deviation index should be 0; the more events that cannot be matched, the higher the deviation index should be.

Direct deviation: for the same time period of the two summarized broken lines, the absolute value of the difference between the Y-axis values of the two sampling points.

Variance deviation: for the same time period of the two summarized broken lines, the square of the direct deviation.

This method aims, through comparative analysis between broken lines, to obtain quantified coupling degree and deviation degree indexes. It thereby provides a way to compare the similarity of different broken lines and, for a specified broken line, to find the best match among several broken lines. The larger the value of a deviation index, the poorer the similarity between the broken lines; the smaller the value of a coupling index (which may be negative), the poorer the similarity. When both deviation indexes are 0% and both coupling indexes are 100%, the similarity between the broken lines is 100%.
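As a sketch of the best-match search, the snippet below ranks candidate lines by the direct coupling percentage. It assumes the lines have already been summarized into equal-length lists of segment values; the names amp_fit_per and best_match are hypothetical, and choosing the direct coupling percentage alone as the ranking key is an assumption of this example.

```python
from typing import Dict, Sequence, Tuple


def amp_fit_per(line_sa: Sequence[float], line_sb: Sequence[float]) -> float:
    """Direct coupling percentage of two summarized lines of equal length."""
    amp_acc = sum(abs(a - b) for a, b in zip(line_sa, line_sb))
    amp_acc_base = sum(abs(a) for a in line_sa)  # base comes from LineSA only
    return 100.0 - amp_acc / amp_acc_base * 100.0


def best_match(target: Sequence[float],
               candidates: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return the name and score of the candidate with the highest coupling."""
    scores = {name: amp_fit_per(target, line) for name, line in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

For a target of [1, 1], a candidate [1, 2] scores 50%, while a candidate [4, 4] scores -200%: its accumulated deviation (6) is three times the base (2), which is how a coupling index becomes negative.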

Preferably, the length of SegLen can be selected in the range from 1 X-axis time period up to 20% of the length of the merged X-axis interval [Xmin, Xmax]. This ensures that SegLen has a suitable length while still leaving a suitably large number of sampling points.
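The preferred range for SegLen can be expressed as a small helper. This is a sketch under stated assumptions: the interval length is counted inclusively as Xmax - Xmin + 1 time periods, the 20% limit is rounded down to a whole number of periods, and the function names are hypothetical.

```python
from typing import Tuple


def seg_len_bounds(x_min: int, x_max: int) -> Tuple[int, int]:
    """Return the (smallest, largest) preferred SegLen in whole time periods."""
    interval_len = x_max - x_min + 1          # assumption: inclusive count
    upper = max(1, interval_len * 20 // 100)  # 20% of the interval, at least 1
    return 1, upper


def clamp_seg_len(seg_len: int, x_min: int, x_max: int) -> int:
    """Force a requested SegLen into the preferred range."""
    lower, upper = seg_len_bounds(x_min, x_max)
    return max(lower, min(seg_len, upper))
```

For a merged interval of 100 seconds, this gives a SegLen between 1 and 20 seconds, so the summarized lines keep at least five segments.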

The beneficial effect of the present invention is that it solves the technical problem in the prior art that the similarity of two broken lines is judged very roughly and with low reliability, and realizes a method for comparatively analyzing the similarity of broken lines that expresses the similarity numerically and judges it accurately and reliably.

Brief description of the drawings

Fig. 1 is a schematic diagram of LineA and LineB of the present invention;

Fig. 2 is a schematic diagram of LineSA and LineSB of the present invention;

Fig. 3 is a schematic diagram of the per-segment statistics of direct deviation and variance deviation for LineSA and LineSB.

Specific embodiment

The technical solution of the present invention is described in further detail below through an embodiment and with reference to the accompanying drawings.

Embodiment:

As shown in Fig. 1, Fig. 2 and Fig. 3, the present invention is a method for comparatively analyzing the similarity of broken lines, which involves an initial A coordinate storage module, an initial B coordinate storage module, a merge index module and a computing module, and comprises the following steps:

4. Set a segment length SegLen for the merged interval [Xmin, Xmax], where the length of SegLen can be selected in the range from 1 X-axis time period up to 20% of the length of the merged X-axis interval [Xmin, Xmax]. Summarize all sampling points of LineA and LineB into segments according to SegLen. Let N be a natural number: the X-axis time period of the merged interval that corresponds to the N-th segment SegN is the interval [N, N+SegLen], and the Y-axis value of SegN is the sum of the Y-axis values of the sampling points in that time period. This finally yields two new summarized broken lines, LineSA and LineSB;

6. Obtain the two deviation indexes:

Direct deviation percentage AmpPer: AmpPer = AmpAcc / AmpAccBase * 100%;

Variance deviation percentage SqrPer: SqrPer = SqrAcc / SqrAccBase * 100%;

7. Obtain the two coupling indexes:

Direct coupling percentage AmpFitPer: AmpFitPer = 100% - AmpPer;

Variance coupling percentage SqrFitPer: SqrFitPer = 100% - SqrPer.
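For reference, the four indexes can be written in closed form. The notation SA_i and SB_i for the Y-axis values of LineSA and LineSB in segment i is introduced here for illustration and does not appear in the original text; "power" is read as squaring.

```latex
\begin{aligned}
\mathrm{AmpPer} &= \frac{\sum_i \lvert SA_i - SB_i \rvert}{\sum_i \lvert SA_i \rvert} \times 100\%, &
\mathrm{SqrPer} &= \frac{\sum_i (SA_i - SB_i)^2}{\sum_i SA_i^{2}} \times 100\%, \\
\mathrm{AmpFitPer} &= 100\% - \mathrm{AmpPer}, &
\mathrm{SqrFitPer} &= 100\% - \mathrm{SqrPer}.
\end{aligned}
```

As a hypothetical worked example, take LineSA = (4, 4) and LineSB = (2, 4). Then AmpAcc = 2 and AmpAccBase = 8, so AmpPer = 25% and AmpFitPer = 75%; SqrAcc = 4 and SqrAccBase = 32, so SqrPer = 12.5% and SqrFitPer = 87.5%.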

Claims

1. A method for comparatively analyzing the similarity of broken lines, involving an initial A coordinate storage module, an initial B coordinate storage module, a merge index module and a computing module, characterized in that it comprises the following steps:

(1) Store the coordinates of broken line LineA in the initial A coordinate storage module and the coordinates of broken line LineB in the initial B coordinate storage module. AXmin and BXmin are the minimum values of the X-axis intervals of LineA and LineB respectively, and AXmax and BXmax are the maximum values of the X-axis intervals of LineA and LineB respectively. Input the X-axis intervals of both LineA and LineB into the merge index module to obtain the merged X-axis interval [Xmin, Xmax], where Xmin is the smaller of AXmin and BXmin, and Xmax is the larger of AXmax and BXmax;

(2) For each time period of the merged interval in which LineA has no sampling point, generate a new sampling point for LineA; for each time period of the merged interval in which LineB has no sampling point, generate a new sampling point for LineB;

(3) Two storage locations are provided in the computing module for the deviation degree: the direct deviation accumulator AmpAcc and the variance deviation accumulator SqrAcc, both initialized to 0. Two further storage locations are provided in the computing module for the deviation bases: the direct deviation accumulator base AmpAccBase and the variance deviation accumulator base SqrAccBase, both also initialized to 0;

(4) Set a segment length SegLen for the merged interval [Xmin, Xmax], where SegLen is a multiple of one X-axis time period. Summarize all sampling points of LineA and LineB into segments according to SegLen. Let N be a natural number: the X-axis time period of the merged interval that corresponds to the N-th segment SegN is the interval [N, N+SegLen], and the Y-axis value of SegN is the sum of the Y-axis values of the sampling points in that time period. This finally yields two new summarized broken lines, LineSA and LineSB;

(5) Traverse the segments, from the first segment Seg1 to the last segment:

1) For the current segment SegC, subtract the Y-axis values of LineSA and LineSB in the current segment and take the absolute value to obtain the direct deviation AmpC; square AmpC to obtain the variance deviation AmpS;

2) Add AmpC to AmpAcc, so that AmpAcc accumulates the direct deviation of all segments;

3) Add AmpS to SqrAcc, so that SqrAcc accumulates the variance deviation of all segments;

4) Add the absolute value of the current value of LineSA to AmpAccBase, and add the square of the absolute value of the current value of LineSA to SqrAccBase, obtaining the two deviation index bases;

After the traversal ends, AmpAcc represents the accumulated direct deviation of all segments, and SqrAcc represents the accumulated squared deviation of all segments;

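(6) Obtain the two deviation indexes:

Direct deviation percentage AmpPer: AmpPer = AmpAcc / AmpAccBase * 100%;

Variance deviation percentage SqrPer: SqrPer = SqrAcc / SqrAccBase * 100%;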

(7) Obtain the two coupling indexes:

Direct coupling percentage AmpFitPer: AmpFitPer = 100% - AmpPer;

Variance coupling percentage SqrFitPer: SqrFitPer = 100% - SqrPer.

2. The method for comparatively analyzing the similarity of broken lines according to claim 1, characterized in that the length of SegLen can be selected in the range from 1 X-axis time period up to 20% of the length of the merged X-axis interval [Xmin, Xmax].