CN111079089B - Base station data anomaly detection method based on interval division - Google Patents


Publication number
CN111079089B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911329988.5A
Other languages
Chinese (zh)
Other versions
CN111079089A (en
Inventor
刘海波
廖闻剑
卢山
张俊杰
张坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Fiberhome Telecommunication Technologies Co ltd
Original Assignee
Nanjing Fiberhome Telecommunication Technologies Co ltd
Application filed by Nanjing Fiberhome Telecommunication Technologies Co ltd
Priority to CN201911329988.5A
Publication of CN111079089A
Application granted
Publication of CN111079089B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Algebra (AREA)
  • Remote Sensing (AREA)
  • Software Systems (AREA)
  • Complex Calculations (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a base station data anomaly detection method based on interval division, comprising the following steps: preprocessing an original trajectory data set and dividing the processed data set into dynamic intervals and static intervals, where a dynamic interval is the index range spanned by any number of consecutive neighboring isolated points and the static intervals are the index ranges of all remaining data segments in the original data set; extracting outliers from the dynamic intervals using a multidimensional Gaussian model and a sliding window distance model; extracting outliers from the static intervals using a center-of-gravity distance scoring method; and representing each dynamic or static outlier as a five-tuple, so that the outlier set is a set of five-tuples. The disclosed method is suitable for processing online data, runs quickly with high accuracy, can effectively evaluate new anomaly patterns, and has a low misjudgment rate.

Description

Base station data anomaly detection method based on interval division
Technical Field
The invention discloses a base station data anomaly detection method based on interval division. It relates to data mining within artificial intelligence, and in particular to anomaly detection in spatio-temporal trajectory data.
Background
With the rapid development of positioning technology and pervasive computing, people's daily behavior data are collected in many ways, producing trajectory big data. Trajectory big data takes the form of a large-scale, high-speed spatio-temporal data stream generated by positioning devices. Analyzing and processing this stream effectively can reveal anomalies hidden in the trajectory data, which supports applications such as urban planning and security management.
Existing trajectory anomaly detection techniques include classification-based, historical-similarity-based, distance-based, and clustering-based detection. These methods have the following disadvantages:
1. anomalies in trajectory stream data are unknown and time-varying, so classification-based methods are not suitable for processing online data;
2. distance-based methods require neighbor queries and distance calculations over large amounts of trajectory data, so they incur high time overhead and have low accuracy;
3. methods based on historical data depend on a large amount of historical data and cannot effectively evaluate new anomaly patterns;
4. clustering-based methods place high demands on the choice of features and clusters and generally have a high misjudgment rate.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to address the shortcomings of the prior art, a base station data anomaly detection method based on interval division is provided. The original data set is divided into several subsets according to the characteristics of the data collected by base stations, a different model is applied to each type of subset, and an outlier candidate set is finally obtained.
The invention adopts the following technical scheme to solve the technical problem:
a base station data anomaly detection method based on interval division, the method comprising the steps of:
step (1), preprocessing an original trajectory data set, and dividing the processed data set into dynamic intervals and static intervals; a dynamic interval is the index range spanned by any number of consecutive neighboring isolated points, and the static intervals are the index ranges of all remaining data segments in the original data set;
step (2), model solving: extracting outliers from the dynamic intervals using a multidimensional Gaussian model and a sliding window distance model, and extracting outliers from the static intervals using a center-of-gravity distance scoring method;
step (3), representing each dynamic outlier and static outlier as a five-tuple, so that the set of five-tuples represents the outlier set.
As a further preferred embodiment of the present invention, the preprocessing rule in step (1) is: remove data that lack a preset field, then de-duplicate the cleaned data and sort it by time.
As a further preferred embodiment of the present invention, in step (1) the original trajectory data set is divided into intervals using a dynamic interval search algorithm, comprising the steps of:
101. Isolated point selection: data that appears only once within a specified time range is taken as an isolated point. This is expressed as:
$$\mathrm{freq}_{T_t}(l_t) = \left|\{\, l_{t'} \mid t' \in T_t,\ l_{t'} = l_t \,\}\right|$$
where $l_t = (lon_t, lat_t)$ is the spatial position at a time $t$, consisting of the longitude $lon$ and latitude $lat$ at that time, and $T_t$ is the time segment centered on time $t$;
if $\mathrm{freq}_{T_t}(l_t) = 1$, then $l_t$ is an isolated point;
102. Dynamic interval search: the range spanned by the start and end indices of any number of consecutive neighboring isolated points is called a dynamic interval. Let
$$N(l_x, l_y) = \left|\mathrm{index}(l_x) - \mathrm{index}(l_y)\right|$$
denote the neighbor relation between two isolated points $l_x, l_y$, where $\mathrm{index}(l_t)$ is the index of isolated point $l_t$ in the original data set. Then $l_x, l_y$ are neighbors if and only if $N(l_x, l_y) \le \mu$, where $\mu$ is the neighbor relation threshold.
For a set of isolated points $L = \{l_1, l_2, l_3, \dots, l_i\}$ ordered by index, if every subset of consecutive elements $\{l_j, l_{j+1}\} \subseteq L$ satisfies $N(l_j, l_{j+1}) \le \mu$, then $L$ is called an i-neighbor isolated point set;
the index range spanned by the first and last elements of a neighbor isolated point set is a dynamic interval, denoted $I = [\mathrm{index}(l_1), \mathrm{index}(l_i)]$;
103. Static interval generation: all dynamic intervals are removed from the index range of the preprocessing result set, and all remaining intervals are called static intervals;
let the index interval of the original data set be $S = [0, n]$ and assume dynamic intervals $I_1 = [i, i+k]$ and $I_2 = [j, j+u]$, where $k, u > 0$, $i > 0$, $j > i+k$, and $j+u < n$; then the intervals $J_1 = [0, i-1]$, $J_2 = [i+k+1, j-1]$, and $J_3 = [j+u+1, n]$ are called static intervals.
As a further preferred embodiment of the present invention, in step (2), the model solving for the dynamic interval comprises the steps of:
201. The longitude, latitude, extraction time, and position switching rate of each data sample are substituted into a Gaussian model to calculate the probability density of each data item over the whole data set. The probability values are sorted in ascending order, and the data corresponding to the first $\lambda$ probability values are added to the outlier candidate set $E_1$. The multidimensional Gaussian model is:
$$p(x) = \frac{1}{(2\pi)^{N/2}\,|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right)$$
where $\mu$ is the N-dimensional mean vector, $\Sigma$ is the $N \times N$ covariance matrix, and $|\Sigma|$ is the determinant of $\Sigma$;
202. A sliding window distance model is established. Any $2k+1$ consecutive data items $W = w_{i-k}, \dots, w_{i-1}, w_i, w_{i+1}, \dots, w_{i+k}$ from the preprocessing result set are selected as a window, where $w_i$ is the center of window $W$, $w_{up} = w_{i-k}, \dots, w_{i-1}$ is the upper half window of length $k$, and $w_{down} = w_{i+1}, \dots, w_{i+k}$ is the lower half window of length $k$. Let $R(w_i, w_{up})$ denote the association between the center point $w_i$ and the upper half window $w_{up}$, expressed as:
$$R(w_i, w_{up}) = \begin{cases} 1, & \mathrm{distance}(w_i, w_{i-1}) \le \delta \cdot \max_{w_a, w_b \in w_{up}} \mathrm{distance}(w_a, w_b) \\ 0, & \text{otherwise} \end{cases}$$
where $\mathrm{distance}(w_i, w_{i-1})$ is the Euclidean distance between the window center $w_i$ and the preceding item $w_{i-1}$, $\max_{w_a, w_b \in w_{up}} \mathrm{distance}(w_a, w_b)$ is the maximum distance between any two positions in the upper half window $w_{up}$, and $\delta$ is the association threshold;
the window center $w_i$ and the upper half window $w_{up}$ are then associated if and only if $R(w_i, w_{up}) = 1$;
Let $R(w_i, w_{down})$ denote the association between the center point $w_i$ and the lower half window $w_{down}$, expressed as:
$$R(w_i, w_{down}) = \begin{cases} 1, & \mathrm{distance}(w_i, w_{i+1}) \le \delta \cdot \max_{w_a, w_b \in w_{down}} \mathrm{distance}(w_a, w_b) \\ 0, & \text{otherwise} \end{cases}$$
where $\mathrm{distance}(w_i, w_{i+1})$ is the Euclidean distance between the window center $w_i$ and the following item $w_{i+1}$, and $\max_{w_a, w_b \in w_{down}} \mathrm{distance}(w_a, w_b)$ is the maximum distance between any two positions in the lower half window $w_{down}$;
the window center $w_i$ and the lower half window $w_{down}$ are then associated if and only if $R(w_i, w_{down}) = 1$;
Finding outliers on the preprocessing result set is thereby converted into translating the window $W$ with a fixed step size Step and finding the window centers that satisfy $R(w_i, w_{up}) = 0 \wedge R(w_i, w_{down}) = 0$; each such window center point is added to the outlier candidate set $E_2$.
As a further preferred embodiment of the present invention, in step (2), solving for outliers in the static interval using the center-of-gravity distance scoring method comprises the steps of:
203. Center-of-gravity point selection: let $M$ denote the set of all data in the static interval $J$. Then $L' = \{\, l \mid l \in M,\ \mathrm{freq}_M(l) > \gamma \,\}$ is the position data in $M$ whose occurrence frequency exceeds the threshold $\gamma$, where $\mathrm{freq}_M(l)$ is the frequency with which position $l$ occurs in $M$. The interval center of gravity $O$ is calculated as a weighted average:
$$O = \left( \frac{\sum_{i=1}^{n} \omega_i\, lon_{l_i}}{\sum_{i=1}^{n} \omega_i},\ \frac{\sum_{i=1}^{n} \omega_i\, lat_{l_i}}{\sum_{i=1}^{n} \omega_i} \right)$$
where $\omega_i$ is the weight of position $l_i$, $lon_{l_i}$ is the longitude of position $l_i$, $lat_{l_i}$ is the latitude of position $l_i$, and $n$ is the number of elements in $L'$;
204. Distance score calculation: let $\mathrm{distance}(l_x, l_y)$ denote the distance between any two positions. The maximum distance between any element of $L'$ and the center of gravity is called the distance radius, expressed as
$$r = \max_{l \in L'} \mathrm{distance}(l, O)$$
The score of any data item $m$ in $M$ is then expressed as:
$$score_m = \begin{cases} 1, & \mathrm{distance}(m, O) > r \\ 0, & \text{otherwise} \end{cases}$$
so the static interval outlier candidate set is $E_3 = \{\, m \mid m \in M,\ score_m = 1 \,\}$.
As a further preferred embodiment of the present invention, step (3) specifically comprises the steps of:
301. the outlier candidate sets $E_1$ and $E_2$ obtained by solving the dynamic interval are intersected, and their common elements are extracted as outliers;
302. the elements of the outlier candidate set $E_3$ obtained by solving the static interval are outliers;
303. a five-tuple Error = [Account, lon, lat, cptime, ErrFlag] is defined to represent each extracted outlier, where ErrFlag is the outlier type: ErrFlag = 0 denotes a dynamic outlier and ErrFlag = 1 denotes a static outlier.
Compared with the prior art, the technical scheme of the invention has the following technical effects: the disclosed method is suitable for processing online data, runs quickly with high accuracy, can effectively evaluate new anomaly patterns, and has a low misjudgment rate.
Drawings
Fig. 1 is an overall flow chart of the present invention.
Fig. 2 is a schematic diagram of interval division in the present invention.
FIG. 3 is a schematic view of a sliding window distance model according to the present invention.
FIG. 4 is a schematic diagram of the center of gravity distance scoring method according to the present invention.
FIG. 5 is a schematic diagram of the original trajectory of experimental data in an embodiment of the present invention.
Fig. 6 is a schematic diagram of static interval 1 of experimental data in an embodiment of the present invention.
FIG. 7 is a schematic diagram of dynamic intervals of experimental data in an embodiment of the present invention.
FIG. 8 is a schematic diagram of static interval 2 of experimental data in an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the present invention and are not to be construed as limiting the present invention.
The technical scheme of the invention is further described in detail below with reference to the accompanying drawings:
The invention discloses a base station data anomaly detection method based on interval division. Its overall flow chart is shown in fig. 1, and it comprises the following steps:
Step (1), preprocessing an original trajectory data set, and dividing the processed data set into dynamic intervals and static intervals, where a dynamic interval is the index range spanned by any number of consecutive neighboring isolated points and the static intervals are the index ranges of all remaining data segments in the original trajectory data set. The interval division is illustrated in fig. 2.
The preprocessing rule in step (1) is: remove data that lack fields such as longitude, latitude, or time, then de-duplicate the cleaned data and sort it by time.
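The preprocessing rule can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the `Record` type, its field names, and the use of `None` to mark a missing field are assumptions introduced here.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    # Field names are hypothetical; the source only names the fields loosely.
    account: Optional[str]
    lon: Optional[float]
    lat: Optional[float]
    cptime: Optional[int]  # capture time; epoch seconds is an assumed unit


def preprocess(records):
    """Drop records with a missing field, remove exact duplicates, sort by time."""
    complete = [r for r in records if None not in r]
    unique = set(complete)  # NamedTuples are hashable, so a set de-duplicates them
    # Secondary keys only make the order deterministic for equal timestamps.
    return sorted(unique, key=lambda r: (r.cptime, r.account, r.lon, r.lat))
```

The function returns a new list and leaves the input untouched, so the raw data remains available for auditing.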
Further, the original trajectory data set is divided into intervals using a dynamic interval search algorithm, comprising the following steps:
1. Isolated point selection. Data that appears only once within a specified time range is taken as an isolated point. This is expressed as:
$$\mathrm{freq}_{T_t}(l_t) = \left|\{\, l_{t'} \mid t' \in T_t,\ l_{t'} = l_t \,\}\right|$$
where $l_t = (lon_t, lat_t)$ is the spatial position at a time $t$, consisting of the longitude $lon$ and latitude $lat$ at that time, and $T_t$ is the time segment centered on time $t$. If $\mathrm{freq}_{T_t}(l_t) = 1$, then $l_t$ is an isolated point.
2. Dynamic interval search. The range spanned by the start and end indices of any number of consecutive neighboring isolated points is called a dynamic interval. Let
$$N(l_x, l_y) = \left|\mathrm{index}(l_x) - \mathrm{index}(l_y)\right|$$
denote the neighbor relation between two isolated points $l_x, l_y$, where $\mathrm{index}(l_t)$ is the index of isolated point $l_t$ in the original data set. Then $l_x, l_y$ are neighbors if and only if $N(l_x, l_y) \le \mu$, where $\mu$ is the neighbor relation threshold. For a set of isolated points $L = \{l_1, l_2, l_3, \dots, l_i\}$ ordered by index, if every subset of consecutive elements $\{l_j, l_{j+1}\} \subseteq L$ satisfies $N(l_j, l_{j+1}) \le \mu$, then $L$ is called an i-neighbor isolated point set. The index range spanned by the first and last elements of a neighbor isolated point set is a dynamic interval, denoted $I = [\mathrm{index}(l_1), \mathrm{index}(l_i)]$.
3. Static interval generation. All dynamic intervals are removed from the index range of the preprocessing result set, and all remaining intervals are called static intervals. Let the index interval of the original data set be $S = [0, n]$ and assume dynamic intervals $I_1 = [i, i+k]$ and $I_2 = [j, j+u]$, where $k, u > 0$, $i > 0$, $j > i+k$, and $j+u < n$. Then the intervals $J_1 = [0, i-1]$, $J_2 = [i+k+1, j-1]$, and $J_3 = [j+u+1, n]$ are called static intervals.
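The isolated point selection in step 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `isolated_indices`, the `(time, (lon, lat))` record layout, and a symmetric window of `half_window` time units on each side of the record are all introduced here for illustration.

```python
def isolated_indices(points, half_window):
    """Return indices of records whose position occurs exactly once in their time window.

    points: list of (t, (lon, lat)) tuples sorted by t.
    half_window: assumed window half-width, in the same unit as t.
    """
    result = []
    lo = 0  # first record with time >= t - half_window
    hi = 0  # one past the last record with time <= t + half_window
    n = len(points)
    for i, (t, loc) in enumerate(points):
        while points[lo][0] < t - half_window:
            lo += 1
        while hi < n and points[hi][0] <= t + half_window:
            hi += 1
        # Count occurrences of this exact position inside the window.
        count = sum(1 for j in range(lo, hi) if points[j][1] == loc)
        if count == 1:
            result.append(i)
    return result
```

Because the records are time-sorted, both window boundaries only ever move forward, so the window bookkeeping is linear in the number of records; the counting step still scans each window.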
Step (2), model solving: outliers are extracted from the dynamic intervals using a multidimensional Gaussian model and a sliding window distance model, and from the static intervals using a center-of-gravity distance scoring method. The sliding window distance model is illustrated in fig. 3 and the center-of-gravity distance scoring method in fig. 4.
Further, the model solving for the dynamic interval in step (2) comprises the following steps:
1. The longitude, latitude, extraction time, and position switching rate of each data sample are substituted into a Gaussian model to calculate the probability density of each data item over the whole data set. The probability values are sorted in ascending order, and the data corresponding to the first $\lambda$ probability values are added to the outlier candidate set $E_1$. The multidimensional Gaussian model is:
$$p(x) = \frac{1}{(2\pi)^{N/2}\,|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right)$$
where $\mu$ is the N-dimensional mean vector, $\Sigma$ is the $N \times N$ covariance matrix, and $|\Sigma|$ is the determinant of $\Sigma$.
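The density ranking above can be sketched in pure Python. This is an illustrative sketch, not the patent's implementation: the function names are hypothetical, the mean and covariance are estimated from the samples themselves with a population (divide by n) covariance, and the code ranks by log-density, which gives the same ordering as the density.

```python
import math


def fit_gaussian(samples):
    """Estimate the mean vector and population covariance matrix (assumed estimator)."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[k] for s in samples) / n for k in range(d)]
    cov = [[sum((s[a] - mean[a]) * (s[b] - mean[b]) for s in samples) / n
            for b in range(d)] for a in range(d)]
    return mean, cov


def log_density(x, mean, cov):
    """Log of the multivariate Gaussian density, via Gaussian elimination."""
    d = len(x)
    diff = [x[k] - mean[k] for k in range(d)]
    a = [row[:] + [diff[i]] for i, row in enumerate(cov)]  # augmented [cov | diff]
    log_det = 0.0
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        log_det += math.log(abs(a[col][col]))  # |det| is the product of the pivots
        for r in range(col + 1, d):
            f = a[r][col] / a[col][col]
            for c in range(col, d + 1):
                a[r][c] -= f * a[col][c]
    y = [0.0] * d  # back substitution solves cov * y = diff
    for r in range(d - 1, -1, -1):
        y[r] = (a[r][d] - sum(a[r][c] * y[c] for c in range(r + 1, d))) / a[r][r]
    maha = sum(diff[k] * y[k] for k in range(d))  # (x-mu)^T cov^-1 (x-mu)
    return -0.5 * (d * math.log(2 * math.pi) + log_det + maha)


def lowest_density(samples, lam):
    """Indices of the lam samples with the smallest density (candidate set E1)."""
    mean, cov = fit_gaussian(samples)
    ranked = sorted(range(len(samples)),
                    key=lambda i: log_density(samples[i], mean, cov))
    return ranked[:lam]
```

In the setting described here each sample would be a 4-tuple of longitude, latitude, time, and position switching rate; features on very different scales may need rescaling first to keep the covariance matrix well conditioned.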
2. Sliding window distance model. Any $2k+1$ consecutive data items $W = w_{i-k}, \dots, w_{i-1}, w_i, w_{i+1}, \dots, w_{i+k}$ from the preprocessing result set are selected as a window, where $w_i$ is the center of window $W$, $w_{up} = w_{i-k}, \dots, w_{i-1}$ is the upper half window of length $k$, and $w_{down} = w_{i+1}, \dots, w_{i+k}$ is the lower half window of length $k$. Let $R(w_i, w_{up})$ denote the association between the center point $w_i$ and the upper half window $w_{up}$, expressed as:
$$R(w_i, w_{up}) = \begin{cases} 1, & \mathrm{distance}(w_i, w_{i-1}) \le \delta \cdot \max_{w_a, w_b \in w_{up}} \mathrm{distance}(w_a, w_b) \\ 0, & \text{otherwise} \end{cases}$$
where $\mathrm{distance}(w_i, w_{i-1})$ is the Euclidean distance between the window center $w_i$ and the preceding item $w_{i-1}$, $\max_{w_a, w_b \in w_{up}} \mathrm{distance}(w_a, w_b)$ is the maximum distance between any two positions in the upper half window $w_{up}$, and $\delta$ is the association threshold. The window center $w_i$ and the upper half window $w_{up}$ are then associated if and only if $R(w_i, w_{up}) = 1$. Similarly, let $R(w_i, w_{down})$ denote the association between the center point $w_i$ and the lower half window $w_{down}$, expressed as:
$$R(w_i, w_{down}) = \begin{cases} 1, & \mathrm{distance}(w_i, w_{i+1}) \le \delta \cdot \max_{w_a, w_b \in w_{down}} \mathrm{distance}(w_a, w_b) \\ 0, & \text{otherwise} \end{cases}$$
where $\mathrm{distance}(w_i, w_{i+1})$ is the Euclidean distance between the window center $w_i$ and the following item $w_{i+1}$, and $\max_{w_a, w_b \in w_{down}} \mathrm{distance}(w_a, w_b)$ is the maximum distance between any two positions in the lower half window $w_{down}$. The window center $w_i$ and the lower half window $w_{down}$ are then associated if and only if $R(w_i, w_{down}) = 1$.
Finding outliers on the preprocessing result set can therefore be converted into translating the window $W$ by a fixed step size Step and finding the window centers that satisfy $R(w_i, w_{up}) = 0 \wedge R(w_i, w_{down}) = 0$. Each such window center point is added to the outlier candidate set $E_2$.
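The sliding window search can be sketched in Python as follows. This is a sketch under an explicit assumption: the association rule is taken to be "the distance from the center to its adjacent item is at most δ times the maximum pairwise distance in the half window", which is one reading of the text. The function names are hypothetical.

```python
import math
from itertools import combinations


def distance(a, b):
    """Euclidean distance between two (lon, lat) positions."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def max_pairwise(points):
    """Maximum distance between any two positions in a half window."""
    return max((distance(a, b) for a, b in combinations(points, 2)), default=0.0)


def associated(center, adjacent, half_window, delta):
    """Assumed rule for R: 1 if the center stays within delta times the window spread."""
    return 1 if distance(center, adjacent) <= delta * max_pairwise(half_window) else 0


def window_outliers(points, k, delta, step=1):
    """Indices of window centers unrelated to both half windows (candidate set E2)."""
    outliers = []
    for i in range(k, len(points) - k, step):
        up = points[i - k:i]            # w_{i-k} .. w_{i-1}
        down = points[i + 1:i + k + 1]  # w_{i+1} .. w_{i+k}
        r_up = associated(points[i], points[i - 1], up, delta)
        r_down = associated(points[i], points[i + 1], down, delta)
        if r_up == 0 and r_down == 0:
            outliers.append(i)
    return outliers
```

Only positions that have k items on each side can be window centers here, which is consistent with the experiment padding the dynamic interval with extra context records on both sides.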
Further, solving for outliers in the static interval using the center-of-gravity distance scoring method comprises the following steps:
3. Center-of-gravity point selection. Let $M$ denote the set of all data within the static interval $J$. Then $L' = \{\, l \mid l \in M,\ \mathrm{freq}_M(l) > \gamma \,\}$ is the position data in $M$ whose occurrence frequency exceeds the threshold $\gamma$, where $\mathrm{freq}_M(l)$ is the frequency with which position $l$ occurs in $M$. The interval center of gravity $O$ is calculated as a weighted average, expressed as
$$O = \left( \frac{\sum_{i=1}^{n} \omega_i\, lon_{l_i}}{\sum_{i=1}^{n} \omega_i},\ \frac{\sum_{i=1}^{n} \omega_i\, lat_{l_i}}{\sum_{i=1}^{n} \omega_i} \right)$$
where $\omega_i$ is the weight of position $l_i$, $lon_{l_i}$ is the longitude of position $l_i$, $lat_{l_i}$ is the latitude of position $l_i$, and $n$ is the number of elements in $L'$.
4. Distance score calculation. Let $\mathrm{distance}(l_x, l_y)$ denote the distance between any two positions. The maximum distance between any element of $L'$ and the center of gravity is called the distance radius, expressed as
$$r = \max_{l \in L'} \mathrm{distance}(l, O)$$
The score of any data item $m$ in $M$ is then expressed as:
$$score_m = \begin{cases} 1, & \mathrm{distance}(m, O) > r \\ 0, & \text{otherwise} \end{cases}$$
so the static interval outlier candidate set is $E_3 = \{\, m \mid m \in M,\ score_m = 1 \,\}$.
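The center-of-gravity scoring can be sketched in Python as follows. Several choices here are assumptions rather than statements from the source: each frequent position is weighted by its occurrence count, distances are computed with the haversine formula on a sphere of radius 6,371,000 m, and the function names are hypothetical.

```python
import math
from collections import Counter


def haversine_m(a, b):
    """Great-circle distance in meters between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(h))


def centroid_outliers(positions, gamma):
    """Return (center O, radius r, indices of items with score 1) for one static interval."""
    freq = Counter(positions)
    frequent = {p: c for p, c in freq.items() if c > gamma}  # the set L'
    if not frequent:
        raise ValueError("no position occurs more than gamma times")
    total = sum(frequent.values())
    # Weighted average with occurrence counts as weights (assumed weighting).
    center = (sum(p[0] * c for p, c in frequent.items()) / total,
              sum(p[1] * c for p, c in frequent.items()) / total)
    radius = max(haversine_m(p, center) for p in frequent)
    outliers = [i for i, p in enumerate(positions) if haversine_m(p, center) > radius]
    return center, radius, outliers
```

Positions that occur at most γ times do not influence the center or the radius, so a rarely visited far-away position cannot enlarge the circle that is later used to judge it.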
Step (3), each dynamic outlier and static outlier is represented as a five-tuple, so that the set of five-tuples represents the outlier set. This specifically comprises the following steps:
1. the outlier candidate sets $E_1$ and $E_2$ obtained by solving the dynamic interval are intersected, and their common elements are extracted as outliers;
2. the elements of the outlier candidate set $E_3$ obtained by solving the static interval are outliers;
3. a five-tuple Error = [Account, lon, lat, cptime, ErrFlag] is defined to represent each extracted outlier, where ErrFlag is the outlier type: ErrFlag = 0 denotes a dynamic outlier and ErrFlag = 1 denotes a static outlier.
The following detailed description of the embodiments of the invention refers to the accompanying drawings and tables.
Take part of the experimental data in table 1 as an example:
table 1: part of the experimental data
The invention relates to a base station data anomaly detection method based on interval division, which comprises the following steps:
1. Preprocessing
Remove data that lack fields such as account number, longitude, latitude, or extraction time, and sort the remaining data by time.
2. Interval division
1) Isolated point selection. In the present invention, $l_t$ is an isolated point if it occurs exactly once within the time period $T_t$, where $T_t$ extends 2 hours before and after $t$. The data numbered 20, 34-39, and 72 in Table 1 are selected as isolated points because each appears only once within its 2-hour window.
2) Dynamic interval search. In the present invention, the range spanned by the start and end indices of any number of consecutive neighboring isolated points is called a dynamic interval. Here the neighbor relation threshold is $\mu = 10$. In Table 1, No. 20 and No. 34 are 14 apart, so they do not satisfy the neighbor condition; No. 39 and No. 72 are 33 apart and do not satisfy it either. Nos. 34-39 satisfy the neighbor condition and form a 6-neighbor isolated point set. Thus [34,39] is a dynamic interval.
3) Static interval generation. All dynamic intervals are removed from the index range, and all remaining intervals are called static intervals. That is, the static intervals are [1,33] and [40,73].
The method thus divides the original data set into dynamic intervals and static intervals that are solved separately, as shown in fig. 5.
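The interval division in this example can be reproduced with a short Python sketch. It assumes that two isolated points are neighbors when their index gap is at most μ and that a lone isolated point does not form a dynamic interval; both assumptions, and the function names, are illustrative and not taken from the source.

```python
def dynamic_intervals(isolated, mu, min_points=2):
    """Group isolated-point indices into runs whose consecutive gaps are <= mu."""
    intervals = []
    run = []
    for idx in sorted(isolated):
        if run and idx - run[-1] > mu:
            # Gap too large: close the current run and start a new one.
            if len(run) >= min_points:
                intervals.append((run[0], run[-1]))
            run = []
        run.append(idx)
    if len(run) >= min_points:
        intervals.append((run[0], run[-1]))
    return intervals


def static_intervals(first, last, dynamic):
    """Return what is left of [first, last] after removing the dynamic intervals."""
    result = []
    start = first
    for lo, hi in sorted(dynamic):
        if lo > start:
            result.append((start, lo - 1))
        start = hi + 1
    if start <= last:
        result.append((start, last))
    return result


isolated = [20, 34, 35, 36, 37, 38, 39, 72]
dyn = dynamic_intervals(isolated, mu=10)
print(dyn)                            # [(34, 39)]
print(static_intervals(1, 73, dyn))   # [(1, 33), (40, 73)]
```

Points 20 and 72 are isolated but have no neighbor within μ, so under these assumptions they stay inside the static intervals, which matches No. 72 later being judged in static interval 2.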
3. Model solving
1) Gaussian model+sliding window distance model
The anomaly judgment of each point in the dynamic interval depends on its context, so when solving the dynamic interval in the experiment, 5 additional position records before and 5 after the interval are included for auxiliary calculation.
First, longitude, latitude, time, and position switching rate are chosen as the 4 dimensions of the multidimensional Gaussian. The probability density is calculated over [29,44], the probability values are sorted in ascending order, and the data 36, 37, 38 corresponding to the first $\lambda = 3$ probability values are added to the outlier candidate set $E_1$.
Next, a sliding window of size 2k+1 (k = 5) is used. The window center is moved from 34 to 39, and the association of each point in the [34,39] interval with its context is determined, with association threshold $\delta = 2$.
Take No. 37 as an example. Suppose the window center has moved to 37. First, the maximum pairwise Euclidean distance in the upper half window 32-36 is calculated as 0.37251550801928224, and the maximum pairwise Euclidean distance in the lower half window 40-44 as 0.23724898399045094. Point 37 is at distance 1.3443963712052325 from the preceding point 36 and 2.0689669522578904 from the following point 38. Both distances are several times larger than the corresponding window maxima, so, as shown in fig. 7, No. 37 is not sufficiently associated with either the upper or the lower half window. Finally, 37 is added to the outlier candidate set $E_2$.
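The decision for No. 37 can be checked numerically. The sketch below assumes the association rule "distance to the adjacent point is at most δ times the maximum pairwise distance of the half window"; the rule's exact form and the function name are assumptions, while the four distances and δ = 2 are the values quoted above.

```python
def associated(dist_to_adjacent, max_pairwise, delta):
    """Assumed rule: 1 if the center is associated with the half window, else 0."""
    return 1 if dist_to_adjacent <= delta * max_pairwise else 0


DELTA = 2
# Upper half window: 1.344... is compared with 2 * 0.3725... = 0.745...
r_up = associated(1.3443963712052325, 0.37251550801928224, DELTA)
# Lower half window: 2.068... is compared with 2 * 0.2372... = 0.474...
r_down = associated(2.0689669522578904, 0.23724898399045094, DELTA)
is_outlier = r_up == 0 and r_down == 0
print(r_up, r_down, is_outlier)  # prints: 0 0 True
```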
2) Barycentric distance scoring
The invention solves for outliers in a static interval using a center-of-gravity distance scoring algorithm, as follows:
Center-of-gravity point selection. First, the frequency of each position point in the static interval is calculated. If the frequency is greater than the threshold 2, the position is taken as an influencing factor for the center of gravity. Static interval 1, shown in fig. 6, is taken as an example. The frequency of each point in the interval is as follows:
The center of gravity is therefore determined by positions 1 to 5, and the weighted average gives the center of gravity O (120.01242, 30.28419). The distance from each of positions 1 to 5 to the center of gravity O is calculated, and the maximum is taken as the radius r. Position No. 2 is the furthest from O, at a distance of about 1509 meters. A circle is drawn with O as its center and r as its radius; points outside the circle are outliers.
Similarly, as shown in fig. 8, the data numbered 72 in static interval 2 is anomalous and is added to the static interval outlier candidate set $E_3$.
4. Five-tuple representation of the outlier set
The representation of outliers comprises the following steps:
1) The outlier candidate sets $E_1$ and $E_2$ obtained by solving the dynamic interval are intersected, and their common elements are extracted as outliers;
2) The elements of the outlier candidate set $E_3$ obtained by solving the static interval are outliers;
3) A five-tuple Error = [Account, lon, lat, cptime, ErrFlag] is defined to represent each extracted outlier, where ErrFlag is the outlier type: ErrFlag = 0 denotes a dynamic outlier and ErrFlag = 1 denotes a static outlier.
Thus, the final set of outliers is expressed as:
[136****9106,120.24317,30.27825,1520317737,0]
[136****9106,120.42331,30.21835,1520333487,1]。
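Assembling this final set from the three candidate sets can be sketched in Python. The `Error` type mirrors the five-tuple defined above; the function name, the dictionary layout, and the mapping of record numbers 37 and 72 to the two printed tuples are assumptions made for illustration, since Table 1 is not reproduced here.

```python
from typing import NamedTuple


class Error(NamedTuple):
    account: str
    lon: float
    lat: float
    cptime: int
    err_flag: int  # 0 = dynamic outlier, 1 = static outlier


def build_outlier_set(records, e1, e2, e3):
    """records maps a record number to (account, lon, lat, cptime)."""
    dynamic = sorted(set(e1) & set(e2))  # confirmed by both dynamic models
    static = sorted(set(e3))
    result = [Error(*records[i], 0) for i in dynamic]
    result += [Error(*records[i], 1) for i in static]
    return result


# Assumed mapping of record numbers to the tuples shown in the final set.
records = {
    37: ("136****9106", 120.24317, 30.27825, 1520317737),
    72: ("136****9106", 120.42331, 30.21835, 1520333487),
}
result = build_outlier_set(records, e1={36, 37, 38}, e2={37}, e3={72})
for err in result:
    print(list(err))
```

Only record 37 appears in both $E_1$ and $E_2$, so it is the single dynamic outlier, while records 36 and 38 are dropped by the intersection.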
The embodiments of the present invention have been described in detail with reference to the drawings, but the invention is not limited to these embodiments; various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the invention. Modifications and equivalent variations of the embodiments described above fall within the spirit and scope of the present invention.

Claims (2)

1. A base station data anomaly detection method based on interval division, the method comprising the steps of:
step (1), preprocessing an original trajectory data set, and dividing the processed data set into dynamic intervals and static intervals; a dynamic interval is the index range spanned by any number of consecutive neighboring isolated points, and the static intervals are the index ranges of all remaining data segments in the original data set;
comprising the following steps:
101. isolated point selection: data that appears only once within a specified time range is taken as an isolated point, expressed as:
$$\mathrm{freq}_{T_t}(l_t) = \left|\{\, l_{t'} \mid t' \in T_t,\ l_{t'} = l_t \,\}\right|$$
where $l_t = (lon_t, lat_t)$ is the spatial position at a time $t$, consisting of the longitude $lon$ and latitude $lat$ at that time, and $T_t$ is the time segment centered on time $t$;
if $\mathrm{freq}_{T_t}(l_t) = 1$, then $l_t$ is an isolated point;
102. dynamic interval search: the range spanned by the start and end indices of any number of consecutive neighboring isolated points is called a dynamic interval; let
$$N(l_x, l_y) = \left|\mathrm{index}(l_x) - \mathrm{index}(l_y)\right|$$
denote the neighbor relation between two isolated points $l_x, l_y$, where $\mathrm{index}(l_t)$ is the index of isolated point $l_t$ in the original data set; then $l_x, l_y$ are neighbors if and only if $N(l_x, l_y) \le \mu$, where $\mu$ is the neighbor relation threshold; for a set of isolated points $L = \{l_1, l_2, l_3, \dots, l_i\}$ ordered by index, if every subset of consecutive elements $\{l_j, l_{j+1}\} \subseteq L$ satisfies $N(l_j, l_{j+1}) \le \mu$, then $L$ is called an i-neighbor isolated point set;
the index range spanned by the first and last elements of a neighbor isolated point set is a dynamic interval, denoted $I = [\mathrm{index}(l_1), \mathrm{index}(l_i)]$;
103. static interval generation: all dynamic intervals are removed from the index range of the preprocessing result set, and all remaining intervals are called static intervals;
let the index interval of the original data set be $S = [0, n]$ and assume dynamic intervals $I_1 = [i, i+k]$ and $I_2 = [j, j+u]$, where $k, u > 0$, $i > 0$, $j > i+k$, and $j+u < n$; then the intervals $J_1 = [0, i-1]$, $J_2 = [i+k+1, j-1]$, and $J_3 = [j+u+1, n]$ are called static intervals;
step (2), model solving, namely extracting abnormal points of the dynamic interval by using a multidimensional Gaussian model and a sliding window distance model; extracting abnormal points from the static interval by using a gravity center distance scoring method;
the model solving of the dynamic interval comprises the following steps:
201. the longitude, latitude, extraction time and position switching rate of each data sample are substituted into a Gaussian model to calculate the probability density of each data item over the whole data set; the probability values are sorted from small to large, and the data corresponding to the first $\lambda$ probability values are added to the abnormal point candidate set $E_1$. The multidimensional Gaussian model is calculated as:
$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{N/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^{T} \Sigma^{-1} (x - \mu)\right)$$
wherein $\mu$ is the N-dimensional mean vector, $\Sigma$ is the $N \times N$ covariance matrix, and $|\Sigma|$ is the determinant of $\Sigma$;
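Step 201 can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the patented implementation: the function name `gaussian_candidates` is hypothetical, the mean and covariance are estimated from the data set itself, a small ridge term is added to keep the covariance invertible, and items are ranked by log-density (which gives the same order as the density).

```python
import numpy as np


def gaussian_candidates(X, lam):
    """Return the row indices of the `lam` lowest-density samples (candidate set E_1).

    X has one row per sample; in the claim the columns would be longitude,
    latitude, extraction time and position switching rate.
    """
    X = np.asarray(X, dtype=float)
    n, dim = X.shape
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False, bias=True).reshape(dim, dim)
    sigma = sigma + 1e-9 * np.eye(dim)  # ASSUMPTION: ridge term for invertibility
    diff = X - mu
    inv = np.linalg.inv(sigma)
    # Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu) for every row.
    maha = np.einsum("ij,jk,ik->i", diff, inv, diff)
    _, logdet = np.linalg.slogdet(sigma)
    log_p = -0.5 * (maha + logdet + dim * np.log(2.0 * np.pi))
    order = np.argsort(log_p, kind="stable")  # smallest density first
    return sorted(order[:lam].tolist())


samples = [[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1], [0.05, 0.05], [5, 5]]
print(gaussian_candidates(samples, 1))
```

Here the far-away sample at index 5 has the lowest density and is the only candidate for $\lambda = 1$.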
202. establishing a sliding window distance model: any run of consecutive data $W = w_{i-k}, \dots, w_{i-1}, w_i, w_{i+1}, \dots, w_{i+k}$ of size $2k+1$ is selected from the preprocessing result set as a window, where $w_i$ is the center of the window $W$, $w_{up} = w_{i-k}, \dots, w_{i-1}$ denotes the upper half window of length $k$, and $w_{down} = w_{i+1}, \dots, w_{i+k}$ denotes the lower half window of length $k$; let $R(w_i, w_{up})$ denote the association between the center point $w_i$ and the upper half window $w_{up}$, expressed as:
wherein $distance(w_i, w_{i-1})$ denotes the Euclidean distance between the window center $w_i$ and the preceding record $w_{i-1}$, the maximum-distance term denotes the maximum distance between any two positions in the upper half window $w_{up}$, and $\delta$ denotes the association threshold;
then the window center $w_i$ is associated with the upper half window $w_{up}$ if and only if $R(w_i, w_{up}) = 1$;
let $R(w_i, w_{down})$ denote the association between the center point $w_i$ and the lower half window $w_{down}$, expressed as:
wherein $distance(w_i, w_{i+1})$ denotes the Euclidean distance between the window center $w_i$ and the following record $w_{i+1}$, and the maximum-distance term denotes the maximum distance between any two positions in the lower half window $w_{down}$;
then the window center $w_i$ is associated with the lower half window $w_{down}$ if and only if $R(w_i, w_{down}) = 1$;
the search for abnormal points on the preprocessing result set is thereby converted into translating the window $W$ with a fixed step size Step and finding the window centers that satisfy $R(w_i, w_{up}) = 0 \cap R(w_i, w_{down}) = 0$; each such window center is added to the abnormal point candidate set $E_2$;
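The window scan of step 202 can be sketched as below. The exact expression for $R$ is not legible in this text, so the sketch makes a loudly labeled assumption: the center is treated as associated with a half window when its distance to the adjacent record does not exceed $\delta$ times the maximum pairwise distance inside that half window. The names `associated` and `window_candidates` are hypothetical.

```python
from itertools import combinations
from math import dist


def max_pairwise(points):
    """Maximum Euclidean distance between any two positions (0.0 for fewer than two)."""
    return max((dist(a, b) for a, b in combinations(points, 2)), default=0.0)


def associated(center, neighbour, half_window, delta):
    # ASSUMPTION: the source formula for R is missing; this comparison is one
    # plausible way to combine the three quantities the claim names.
    return dist(center, neighbour) <= delta * max_pairwise(half_window)


def window_candidates(points, k, delta, step=1):
    """Return the indices of window centers with R(up) = 0 and R(down) = 0 (set E_2)."""
    e2 = []
    for i in range(k, len(points) - k, step):
        up = points[i - k:i]            # w_{i-k} .. w_{i-1}
        down = points[i + 1:i + k + 1]  # w_{i+1} .. w_{i+k}
        r_up = associated(points[i], points[i - 1], up, delta)
        r_down = associated(points[i], points[i + 1], down, delta)
        if not r_up and not r_down:
            e2.append(i)
    return e2


track = [(0, 0), (1, 0), (0, 1), (50, 50), (1, 1), (0, 0), (1, 0)]
print(window_candidates(track, k=2, delta=2.0))
```

In this toy track only the jump to `(50, 50)` is unrelated to both its preceding and its following half window, so it is the only candidate.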
The method for solving the abnormal points in the static interval by using the gravity center distance scoring method comprises the following steps:
203. center of gravity point selection: let $M$ denote the set of all data in a static interval $J$; then $L' = \{l \mid l \in M, freq_M(l) > \gamma\}$ denotes the position data in $M$ whose occurrence frequency exceeds the threshold $\gamma$, where $freq_M(l)$ denotes the frequency with which position $l$ occurs in $M$; the interval center of gravity $O$ is calculated by a weighted average, expressed as:
wherein the weight term gives the weight of each position, the longitude and latitude terms are the longitude and latitude of position $l_i$, and $n$ is the number of elements in $L'$;
204. distance score calculation: with $distance(l_x, l_y)$ denoting the distance between any two positions, the maximum distance between any element of the set $L'$ and the center of gravity $O$ is called the distance radius, expressed as $r = \max_{l \in L'} distance(l, O)$;
further, the score $score_m$ of any data item $m$ in the set $M$ is expressed as:
then the static interval abnormal point candidate set is $E_3 = \{m \mid m \in M, score_m = 1\}$;
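Steps 203 and 204 can be sketched as follows. The weight and score formulas are not legible in this text, so the sketch assumes, as labeled in the comments, that each frequent position is weighted by its occurrence count and that $score_m = 1$ when a data item lies farther from the center of gravity than the distance radius. The function name `static_candidates` is hypothetical.

```python
from collections import Counter
from math import hypot


def static_candidates(M, gamma):
    """Return the indices in M whose score is 1 (candidate set E_3).

    M is a list of (lon, lat) positions from one static interval.
    """
    freq = Counter(M)
    frequent = [l for l, count in freq.items() if count > gamma]  # the set L'
    if not frequent:
        return []
    # ASSUMPTION: weights are occurrence counts normalised to sum to 1.
    total = sum(freq[l] for l in frequent)
    o_lon = sum(freq[l] * l[0] for l in frequent) / total
    o_lat = sum(freq[l] * l[1] for l in frequent) / total
    # Distance radius: largest distance from any frequent position to O.
    radius = max(hypot(l[0] - o_lon, l[1] - o_lat) for l in frequent)
    # ASSUMPTION: score_m = 1 when m lies outside the distance radius.
    return [i for i, m in enumerate(M)
            if hypot(m[0] - o_lon, m[1] - o_lat) > radius]


positions = [(0, 0)] * 5 + [(1, 0)] * 5 + [(10, 10)]
print(static_candidates(positions, gamma=2))
```

Here the two frequent positions give the center of gravity `(0.5, 0)` and a radius of `0.5`, so only the rare position at index 10 is scored as a candidate.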
Step (3), representing the dynamic abnormal points and the static abnormal points as five-tuples, the resulting set of five-tuples representing the abnormal point set;
the step (3) specifically comprises the following steps:
301. the abnormal point candidate sets $E_1$ and $E_2$ obtained by solving the dynamic interval are intersected, and the elements common to both are extracted as abnormal points;
302. the elements of the abnormal point candidate set $E_3$ obtained by solving the static interval are abnormal points;
303. a five-tuple Error = [Account, lon, lat, cptime, ErrFlag] is defined to represent each extracted abnormal point, where ErrFlag represents the abnormal point type: ErrFlag = 0 represents a dynamic abnormal point and ErrFlag = 1 represents a static abnormal point.
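Steps 301 to 303 can be sketched as a small assembly routine. The field names follow the five-tuple in the claim; the function name `collect_errors`, the use of record indices for the candidate sets, and the string type chosen for `cptime` are assumptions made for illustration.

```python
from typing import NamedTuple


class Error(NamedTuple):
    Account: str
    lon: float
    lat: float
    cptime: str   # ASSUMPTION: time kept as a string for simplicity
    ErrFlag: int  # 0 = dynamic abnormal point, 1 = static abnormal point


def collect_errors(records, e1, e2, e3):
    """Build the five-tuple set from candidate index sets E_1, E_2 and E_3.

    `records` is a list of (Account, lon, lat, cptime) tuples.
    """
    dynamic = set(e1) & set(e2)  # step 301: intersection of E_1 and E_2
    static = set(e3)             # step 302: every element of E_3
    errors = [Error(*records[i], 0) for i in sorted(dynamic)]
    errors += [Error(*records[i], 1) for i in sorted(static)]
    return errors


records = [("a", 1.0, 2.0, "t0"), ("a", 1.1, 2.0, "t1"),
           ("a", 9.0, 9.0, "t2"), ("a", 1.0, 2.1, "t3")]
for err in collect_errors(records, e1=[1, 2], e2=[2, 3], e3=[0]):
    print(err)
```

Only index 2 appears in both $E_1$ and $E_2$, so it is the single dynamic abnormal point; index 0 from $E_3$ is reported as a static abnormal point.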
2. The method for detecting abnormal base station data based on interval division as claimed in claim 1, wherein the preprocessing rule in the step (1) is as follows: data that does not contain the preset fields is cleaned out; the cleaned data is de-duplicated and sorted by time.
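The preprocessing rule of claim 2 can be sketched as follows. The claim does not list the preset fields, so the sketch assumes they are the four data fields of the five-tuple (`Account`, `lon`, `lat`, `cptime`); the function name `preprocess` and the dictionary record format are likewise hypothetical.

```python
# ASSUMPTION: the preset fields are the data fields of the five-tuple.
REQUIRED = ("Account", "lon", "lat", "cptime")


def preprocess(rows, required=REQUIRED):
    """Drop rows missing a preset field, de-duplicate, and sort by time."""
    seen = set()
    kept = []
    for row in rows:
        # Clean out rows in which any preset field is absent or empty.
        if any(row.get(field) in (None, "") for field in required):
            continue
        key = tuple(row[field] for field in required)
        if key in seen:  # exact duplicate of an earlier row
            continue
        seen.add(key)
        kept.append(row)
    return sorted(kept, key=lambda r: r["cptime"])


raw = [
    {"Account": "a", "lon": 118.7, "lat": 32.0, "cptime": "2019-12-20 10:05"},
    {"Account": "a", "lon": 118.7, "lat": 32.0, "cptime": "2019-12-20 10:00"},
    {"Account": "a", "lon": 118.7, "lat": 32.0, "cptime": "2019-12-20 10:05"},
    {"Account": "a", "lon": None, "lat": 32.0, "cptime": "2019-12-20 10:01"},
]
print([r["cptime"] for r in preprocess(raw)])
```

The duplicate row and the row with a missing longitude are removed, and the two remaining rows are returned in time order.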
CN201911329988.5A 2019-12-20 2019-12-20 Base station data anomaly detection method based on interval division Active CN111079089B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911329988.5A CN111079089B (en) 2019-12-20 2019-12-20 Base station data anomaly detection method based on interval division

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911329988.5A CN111079089B (en) 2019-12-20 2019-12-20 Base station data anomaly detection method based on interval division

Publications (2)

Publication Number Publication Date
CN111079089A CN111079089A (en) 2020-04-28
CN111079089B true CN111079089B (en) 2023-08-11

Family

ID=70316435

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911329988.5A Active CN111079089B (en) 2019-12-20 2019-12-20 Base station data anomaly detection method based on interval division

Country Status (1)

Country Link
CN (1) CN111079089B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113987578B (en) * 2021-10-28 2022-06-21 南京邮电大学 Space point set privacy protection matching method based on vector mapping and sliding window scanning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105764162A (en) * 2016-05-10 2016-07-13 江苏大学 Wireless sensor network abnormal event detecting method based on multi-attribute correlation
CN107277765A (en) * 2017-05-12 2017-10-20 西南交通大学 A kind of mobile phone signaling track preprocess method based on cluster Outlier Analysis
CN110046665A (en) * 2019-04-17 2019-07-23 成都信息工程大学 Based on isolated two abnormal classification point detecting method of forest, information data processing terminal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
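Zhang Shichao et al. Discovery of abnormal patterns in data streams based on kernel estimation and interval clustering. Journal of Frontiers of Computer Science and Technology (计算机科学与探索). 2007, Vol. 1 (No. 1), pp. 108-114. *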

Also Published As

Publication number Publication date
CN111079089A (en) 2020-04-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230627

Address after: 210019 26F, building a, Fenghuo science and technology building, 88 yunlongshan Road, Jianye District, Nanjing City, Jiangsu Province

Applicant after: NANJING FIBERHOME TELECOMMUNICATION TECHNOLOGIES Co.,Ltd.

Address before: 211161 Sheng'an Avenue 739, Binjiang Economic Development Zone, Jiangning District, Nanjing City, Jiangsu Province

Applicant before: NANJING FENGHUO TIANDI COMMUNICATION TECHNOLOGY CO.,LTD.

GR01 Patent grant