CN107423758A - Method for identifying fraudulent driver-training hours by extracting important sequence points from trend points - Google Patents


Info

Publication number
CN107423758A
Authority
CN
China
Prior art keywords
point
trend
sequence
field sequence
vital
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710577583.8A
Other languages
Chinese (zh)
Inventor
孔宪光
刘燕龙
常建涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN201710577583.8A
Publication of CN107423758A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/285 Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system

Abstract

The invention discloses a method for identifying fraudulent driver-training hours ("cheating class hours") by extracting important sequence points from trend points, addressing the low efficiency and precision of existing identification of such hours. The implementation steps are: select, from the driver-training information data, the field series that can reflect abnormal segments; normalize the field series; extract the trend points of the field series and then extract the important points from them; represent the field series piecewise using the important points, then quantitatively assess the degree of abnormality of each segment with the reverse k-nearest-neighbor method from anomaly detection, and treat the segments with a higher "outlier factor" as fraudulent training hours. The overall scheme is rigorous and complete, is capable of analyzing massive driver-training information data, identifies fraudulent training hours with high efficiency and accuracy, and can be used to detect fraudulent training hours in driver-training information data.

Description

Method for identifying fraudulent driver-training hours by extracting important sequence points from trend points
Technical field
The invention belongs to the intersection of the Internet of Vehicles and time series analysis, and mainly concerns the detection of abnormal segments in driver-training information data. Specifically, it is a method for identifying fraudulent driver-training hours by extracting important sequence points from trend points, used to identify false hours in driver-training information data and deduct them to obtain the trainee's real training hours.
Background art
Fraudulent training hours ("cheating class hours") are false driver-training hours that some driving schools or instructors produce under the driver-training hour recording system by means of cheating methods they have devised, for example logging the hours of several cars on one car, embedding pre-posed photos into the system, not starting the training car, or using a "racehorse machine" to run up hours. Identifying fraudulent driver-training hours means analyzing the data collected by the driver-training hour system, finding the hour segments that are in an abnormal state and deducting them to obtain the real training hours; it can even reveal the vehicles or driving schools where cheating is concentrated.
At present, cheating in driver training by driving schools or instructors is judged mainly by the professional experience of industry personnel: whether a trainee's single training session involves cheating is judged from the upper and lower bounds of field values. This cannot determine more precisely the approximate duration of the cheating within the session; that is, it cannot effectively identify the fraudulent hours and deduct them to obtain the real training hours.
In the current big data era, the influence of data extends far beyond the enterprise domain: data brings not only commercial value but also social value. With the development of social informatization and digitization, the transportation field has moved from data scarcity to data abundance, and continuously produces large amounts of data of various types.
Driver-training information data are time series data. A time series is an ordered set of observation records arranged in chronological order, where each record is a numeric value; time series are widespread in business, economics, scientific observation and other fields. As time passes, a time series generally accumulates a large amount of data. The question is how to perform statistics and analysis on these time series data and discover valuable information and knowledge in them, so as to safeguard the training quality of driving schools.
With the improvement of road traffic conditions, the number of driving schools keeps growing, and the volume of driver-training information data stored by their training timekeeping systems is growing accordingly. Because time series data are massive and complex, analyzing the raw time series directly is costly in storage and computation and may also harm the accuracy and reliability of the algorithm. A pattern representation of a time series depicts its main shape and ignores small details. The existing piecewise linear representation based on trend points is one such pattern representation, but its segmentation is too fine: it considers only the change on the two sides adjacent to each data point and easily loses the short-term trend of the time series, which lowers the precision with which an anomaly detection algorithm identifies abnormal segments.
In summary, cheating in driver training is currently judged mainly by the professional experience of industry personnel, which cannot accurately identify fraudulent hours. The segmentation produced by the traditional piecewise linear representation based on trend points is too fine and costly in storage and computation, which lowers the efficiency of identifying abnormal segments; it also considers only the change on the two sides adjacent to each time series point and easily loses the short-term trend, so the accuracy of identifying abnormal segments is not high.
Summary of the invention
The object of the invention is to address the above problems of the prior art by proposing a method for identifying fraudulent driver-training hours by extracting important sequence points from trend points, so as to improve the efficiency and accuracy of identifying fraudulent driver-training hours.
The invention proposes a method for identifying fraudulent driver-training hours by extracting important sequence points from trend points, characterized by comprising the following steps:
(1) Select the field series that can reflect cheating: from the driver-training information data collected by the on-board GPS of the driving-school vehicle, select the field series that better reflects cheating as the data for anomaly identification;
(2) Preprocess the field series: normalize the selected field series to the interval [0, 1];
(3) Extract the important points of the field series: obtain the important points of the field series using the piecewise linear representation based on extracting important sequence points from trend points;
(4) Represent the field series piecewise and identify abnormal segments: connect the important points of the field series in order with straight lines, so that the field series is represented by straight-line segments; then define the pattern density and abnormal level of each segment using the reverse k-nearest-neighbor method from density-based anomaly detection, so as to quantitatively assess the degree of abnormality of each segment of the field series, and treat the segments with a higher "outlier factor" as fraudulent driver-training hours.
Compared with the prior art, the invention has the following advantages:
1) With the method of the invention, industry personnel no longer need to rely on professional experience to judge cheating in driver training;
2) The segmentation of the existing piecewise linear representation based on trend points is too fine and costly in storage and computation, which lowers the efficiency of identifying abnormal segments. The piecewise linear representation used by the invention, which extracts important sequence points from trend points, is simple and intuitive and performs well computationally; it compresses the original driver-training time series, reducing storage and computation cost and improving the efficiency of identifying fraudulent driver-training hours;
3) The existing piecewise linear representation based on trend points considers only the change on the two sides adjacent to each time series point and easily loses the short-term trend, so its accuracy in identifying abnormal segments is not high. The piecewise linear representation used by the invention retains the main shape of the original driver-training time series, removes the interference of details, better reflects the characteristics of the original series and pays more attention to changes in the short-term trend, which can considerably improve the accuracy with which the anomaly detection algorithm identifies abnormal segments (fraudulent driver-training hours).
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention for identifying fraudulent driver-training hours.
Fig. 2 is the course curve of one training session of a driving-school vehicle.
Fig. 3 is the course curve after segmentation with the traditional piecewise linear representation based on trend points.
Fig. 4 is the course curve after segmentation with the piecewise linear representation of the invention based on extracting important sequence points from trend points.
Fig. 5 shows the outlier factor of each segment of the course field after segmentation with the traditional piecewise linear representation based on trend points.
Fig. 6 shows the outlier factor of each segment of the course field after segmentation with the piecewise linear representation of the invention based on extracting important sequence points from trend points.
Detailed description
To make the purpose, technical scheme and advantages of the invention clearer, the invention is described in detail below with reference to the embodiments.
Embodiment 1
With the rapid development of cities, traffic congestion and traffic pollution are becoming more serious and traffic accidents occur frequently; these are problems every large city urgently needs to solve. Among them, traffic accidents constantly threaten people's lives, so reducing such accidents has become urgent. The invention uses big data methods to analyze the various data collected by driving schools and accurately identify abnormal patterns in time series data, aiming to solve the currently widespread problem of training-hour cheating in driving schools, reduce the emergence of "road killers" at the source, and thereby reduce traffic accidents and promote the development of intelligent transportation.
At present, cheating in driver training by driving schools or instructors is judged mainly by the professional experience of industry personnel, which cannot effectively identify fraudulent hours. Even identifying abnormal segments with trend points still suffers from low efficiency and low accuracy. In the big data era, however, making full use of data to identify fraudulent hours still has broad practical value.
To address the low efficiency and accuracy with which the traditional piecewise linear representation based on trend points identifies abnormal segments, the invention proposes a method for identifying fraudulent driver-training hours by extracting important sequence points from trend points. Referring to Fig. 1, the specific steps are:
(1) Select the field series that can reflect cheating: from the driver-training information data collected by the on-board GPS of the driving-school vehicle, select the field series that better reflects cheating as the data for anomaly identification.
The trainee driver-training information data collected by the on-board GPS of the driving-school vehicle include several field attributes such as speed, course and mileage step. A field series that can reflect abnormal segments is selected: according to the experience of industry personnel, changes in the "course" field series of the driver-training information data better reflect cheating by driving schools or instructors, so the "course" field series is selected as the data for anomaly identification.
(2) Preprocess the field series: normalize the selected field series to the interval [0, 1].
To facilitate subsequent processing and speed up convergence when the program runs, the field series is normalized to the interval [0, 1].
(3) Extract the important points of the field series: determine the important points of the field series using the piecewise linear representation based on extracting important sequence points from trend points.
First, the trend points of the field series are selected from its original points; the important points of the field series are then filtered out of the trend points and used to re-represent the field series, which indirectly improves the accuracy of the subsequent identification of abnormal segments.
(4) Represent the field series piecewise and identify abnormal segments: connect the important points of the field series in order with straight lines, so that the field series is represented by straight-line segments; then define the pattern density and abnormal level of each segment using the density-based anomaly detection method (reverse k-nearest neighbors), so as to quantitatively assess the degree of abnormality of each segment of the field series, and treat the segments with a higher "outlier factor" as fraudulent training hours.
After the "course" field series has been re-represented piecewise with the piecewise linear representation of the invention, each segment is examined with the reverse k-nearest-neighbor method from anomaly detection, which improves the efficiency and accuracy of identifying abnormal segments.
Embodiment 2
The method for identifying fraudulent driver-training hours by extracting important sequence points from trend points is the same as in Embodiment 1. In step (2), the selected field series is normalized to the interval [0, 1] as follows:
y = (x - xmin) / (xmax - xmin)
where x is an actual value of the selected field series, xmax and xmin are respectively the maximum and minimum of the actual values, and y is the value of the selected field series after normalization.
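The normalization of step (2) can be sketched in a few lines of Python. This is a minimal illustration that assumes the usual min-max formula y = (x - xmin) / (xmax - xmin); the function name and the handling of a constant series are assumptions made for the example and are not taken from the patent.

```python
def min_max_normalize(values):
    """Scale a numeric series linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant series is mapped to all zeros
        # to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical "course" readings in degrees.
print(min_max_normalize([90.0, 180.0, 270.0]))
```

The smallest reading maps to 0 and the largest to 1, so series recorded on different scales become comparable before segmentation.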
Embodiment 3
The method for identifying fraudulent driver-training hours by extracting important sequence points from trend points is the same as in Embodiments 1-2. The piecewise linear representation based on extracting important sequence points from trend points in step (3) specifically comprises the following steps:
(3.1) Determine the trend points of the selected field series: for the selected field series X = <x1, x2, …, xn>, every point other than the first and last points of the series is tested in turn to determine whether it is a trend point, as follows:
(3.1.1) For the point x(i) and its left and right neighbors x(i-1) and x(i+1), compute the slopes tg1 and tg2 of the lines joining x(i-1) to x(i) and x(i) to x(i+1).
(3.1.2) If |tg1 - tg2| is greater than the parameter g, the point is a trend point; otherwise it is not. g is the threshold that the change in slope between the two sides adjacent to a trend point must exceed, with 0.005 ≤ g ≤ 0.025.
(3.1.3) Continue by computing the slopes of the lines joining x(i+1) to x(i) and to x(i+2), compare the absolute value of the slope difference with g to determine whether x(i+1) is a trend point, and so on until every point of the original series has been tested.
Once all points of the "course" field series have been tested, all trend points of the field series are determined. In the invention a field series generally has at least 3 trend points.
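Steps (3.1.1) to (3.1.3) can be sketched as a single pass over the normalized series. The sketch assumes equally spaced samples with a sampling interval `dt` (a hypothetical parameter) and returns only the interior indices that pass the slope test; whether the first and last points are also kept is left to the caller.

```python
def trend_point_indices(x, g=0.01, dt=1.0):
    """Return indices i whose left and right slopes differ by more than g."""
    indices = []
    for i in range(1, len(x) - 1):          # first and last points are skipped
        tg1 = (x[i] - x[i - 1]) / dt        # slope of the line x(i-1) -> x(i)
        tg2 = (x[i + 1] - x[i]) / dt        # slope of the line x(i) -> x(i+1)
        if abs(tg1 - tg2) > g:
            indices.append(i)
    return indices


series = [0.0, 0.25, 0.5, 0.5, 0.25]
print(trend_point_indices(series, g=0.01))
```

In this example the point at index 1 lies on a straight run and is skipped, while the points at indices 2 and 3, where the slope changes, are kept. The default `g=0.01` is only a value inside the stated range 0.005 ≤ g ≤ 0.025.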
(3.2) Determine the important points of the field series:
After the trend points of the "course" field series are determined, every trend point other than the first and last trend points is tested by computing the slopes of the lines joining it to its adjacent trend points on the left and on the right, to judge whether it is an important point of the field series.
In the invention the important points of the field series are chosen from the determined trend points. Re-representing the field series piecewise with the important points retains its main trends and is simple and intuitive; examining each segment with the reverse k-nearest-neighbor method from anomaly detection then considerably improves both the running efficiency of the algorithm and the accuracy of its results.
Embodiment 4
The method for identifying fraudulent driver-training hours by extracting important sequence points from trend points is the same as in Embodiments 1-3. Determining the important points of the field series in step (3.2), that is, judging in turn whether each trend point other than the first and last trend points is an important point of the field series, specifically comprises the following steps:
(3.2.1) In the "course" field series, compute the slope tg α of the line joining trend point y(i) to its left adjacent trend point y(i-1), the slope tg β of the line joining y(i) to its right adjacent trend point y(i+1), and the slope tg γ of the line joining y(i+1) to its right adjacent trend point y(i+2).
(3.2.2) If tg α * tg β > 0, then y(i) is an important point of the field series.
(3.2.3) If tg α * tg β < 0 and tg γ * tg β > 0, then y(i) is also an important point of the field series.
(3.2.4) In all remaining cases, y(i) is not an important point of the field series.
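Rules (3.2.1) to (3.2.4) can be sketched as follows, applied literally as stated. Trend points are given as hypothetical `(time, value)` pairs with distinct times. One case is not covered by the text: for the last interior trend point there is no y(i+2), so tg γ cannot be computed; the sketch assumes that rule (3.2.3) is then simply not satisfied.

```python
def slope(a, b):
    """Slope of the straight line joining two (time, value) points."""
    (t0, v0), (t1, v1) = a, b
    return (v1 - v0) / (t1 - t0)


def important_point_indices(trend):
    """Indices of trend points kept as important points by rules (3.2.2)-(3.2.4)."""
    keep = []
    for i in range(1, len(trend) - 1):       # first and last trend points are skipped
        tg_a = slope(trend[i - 1], trend[i])
        tg_b = slope(trend[i], trend[i + 1])
        if tg_a * tg_b > 0:                  # rule (3.2.2)
            keep.append(i)
        elif tg_a * tg_b < 0 and i + 2 < len(trend):
            tg_c = slope(trend[i + 1], trend[i + 2])
            if tg_c * tg_b > 0:              # rule (3.2.3)
                keep.append(i)
        # rule (3.2.4): every other case is dropped
    return keep


rise_and_fall = [(0, 0.0), (1, 1.0), (2, 2.0), (3, 1.0), (4, 0.0), (5, 1.0)]
zigzag = [(0, 0.0), (1, 1.0), (2, 0.0), (3, 1.0), (4, 0.0)]
print(important_point_indices(rise_and_fall), important_point_indices(zigzag))
```

With these inputs the sustained rise and fall keeps indices 1, 2 and 3, while the one-step zigzag keeps nothing, which matches the stated aim of preserving the main shape and discarding small details.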
Existing anomaly detection methods tend to overlook abnormal states in time series signals, cannot fully mine the useful information in them and cannot assess the trend state of each period; moreover, the complexity of time series signals makes analysis directly on the raw data inefficient or inaccurate.
Embodiment 5
The method for identifying fraudulent driver-training hours by extracting important sequence points from trend points is the same as in Embodiments 1-4. In step (4), the field series is represented piecewise using the extracted important points, and the pattern density and abnormal level of each segment are then defined using the reverse k-nearest-neighbor method from density-based anomaly detection; a segment is also called a segment pattern. The steps are as follows:
(4.1) Represent the field series piecewise: because the invention selects the "course" field series from the driver-training information data as the data for anomaly identification, the important points of the "course" field series must be determined. Once they are determined, the important points are connected in order with straight lines and each straight-line segment is represented by a 2-tuple (li, mi): the abscissa li is the length of the i-th straight-line segment of the field series and represents the length of the trend change; the ordinate mi is the slope of the i-th straight-line segment and represents the trend. The field series X is therefore represented by a series of 2-tuples:
X=<(l1,m1),(l2,m2),…,(lc,mc)>;
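The 2-tuple representation of step (4.1) can be sketched as below. Important points are given as hypothetical `(time, value)` pairs; the sketch assumes that the segment length l is the time span between consecutive important points, since the text does not say whether the time span or the geometric length of the line is meant.

```python
def segment_tuples(points):
    """Turn consecutive important points into (length, slope) pairs."""
    segments = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        length = t1 - t0                      # assumption: length = time span
        segments.append((length, (v1 - v0) / length))
    return segments


important = [(0, 0.0), (4, 1.0), (6, 0.0)]
print(segment_tuples(important))
```

Three important points yield two segments here: a slow rise over 4 time steps followed by a faster fall over 2 time steps.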
(4.2) Define the pattern distance: define the pattern distance d(p, q) between any two segment patterns p(l1, m1) and q(l2, m2) of the field series:
(4.3) Determine the number of reverse k-nearest neighbors of each segment pattern, as follows:
(4.3.1) Define the k-distance. For any natural number k, the k-distance of segment pattern p is the distance d(p, o) between p and some segment pattern o, where o satisfies the following conditions:
there are at least k objects o' ∈ D \ {p} such that d(p, o') ≤ d(p, o);
there are at most k-1 objects o' ∈ D \ {p} such that d(p, o') < d(p, o);
(4.3.2) Define the k-distance neighborhood Nk(p) of segment pattern p:
Given the k-distance k-distance(p) of segment pattern p, the k-distance neighborhood of p contains all objects whose distance from p does not exceed k-distance(p).
(4.3.3) Define the reverse k-nearest neighbors RNNk(p) of segment pattern p as the set of all objects that have p among their k-nearest neighbors;
|RNNk(p)| denotes the size of this set, i.e. the number of reverse k-nearest neighbors, and RNNk(p) = {q | q ∈ D, p ∈ Nk(q)}. If |RNNk(p)| is small, p lies in the neighborhoods of few other objects and is in an isolated position; conversely, the larger |RNNk(p)| is, the more object neighborhoods p lies in, and the more p is positioned inside a cluster.
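The k-distance neighborhood and the reverse k-nearest-neighbor count of step (4.3) can be sketched as follows. The pattern distance of step (4.2) is not available in this text, so the sketch uses plain Euclidean distance on the (l, m) pairs purely as a stand-in; that choice, and all function names, are assumptions. The data set must contain more than k segments.

```python
import math


def euclid(p, q):
    # Stand-in only: NOT the pattern distance defined in step (4.2).
    return math.hypot(p[0] - q[0], p[1] - q[1])


def k_neighborhoods(items, k, dist=euclid):
    """For each item, the indices of all items within its k-distance."""
    hoods = []
    for i, p in enumerate(items):
        others = sorted((dist(p, q), j) for j, q in enumerate(items) if j != i)
        k_dist = others[k - 1][0]             # distance to the k-th nearest object
        hoods.append({j for d, j in others if d <= k_dist})
    return hoods


def reverse_knn_counts(hoods):
    """|RNNk(p)|: how many items q have p inside their neighborhood Nk(q)."""
    counts = [0] * len(hoods)
    for hood in hoods:
        for p in hood:
            counts[p] += 1
    return counts


segments = [(0, 0), (1, 0), (2, 0), (10, 0)]
hoods = k_neighborhoods(segments, k=1)
print(hoods, reverse_knn_counts(hoods))
```

In this toy data the far-away item at index 3 appears in no other item's neighborhood, so its reverse k-nearest-neighbor count is 0, which is the "isolated position" described above. Ties at the k-distance are all included, so a neighborhood can hold more than k items.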
(4.4) Define the density and abnormal level of each segment pattern, as follows:
(4.4.1) Define the density of segment pattern p;
Given a positive integer k and a data set D, for any p ∈ D the density RD(p) of object p is defined as
where RD(p) is, within the k-distance neighborhood Nk(p) of p, the mean ratio of the number of reverse k-nearest neighbors of p to that of each neighbor in the neighborhood, and reflects the local density of p. From the density of p its abnormal level can be defined; the abnormal level describes the degree of abnormality of p in the data set D.
(4.4.2) Define the abnormal level of each segment of the field series;
Given a positive integer k and a data set D, for any p ∈ D the abnormal level of p is defined as:
AOSk(p)=max { 1-RD (p), 0 };
When p lies at the center of a cluster, its number of reverse k-nearest neighbors is relatively large and its density is high, so its abnormal level is low; conversely, when p deviates from the cluster, its number of reverse k-nearest neighbors is relatively small and its density is low, so its abnormal level is high. When the abnormal level AOSk(p) exceeds a given threshold, the object is judged abnormal; the threshold is set on the outlier factor.
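The density and abnormal level of step (4.4) can be sketched as below. The exact expression for RD(p) is not available in this text, so the sketch assumes one plausible reading of the verbal description: RD(p) is the reverse k-nearest-neighbor count of p divided by the mean count over the members of Nk(p). That reading, the neutral value used when the mean is zero, and the function names are assumptions; only AOSk(p) = max{1 - RD(p), 0} is taken directly from the text.

```python
def relative_density(hoods, rnn_counts):
    """Assumed RD(p): |RNNk(p)| divided by the mean |RNNk(o)| over o in Nk(p)."""
    rd = []
    for p, hood in enumerate(hoods):
        mean_nb = sum(rnn_counts[o] for o in hood) / len(hood)
        if mean_nb == 0:
            rd.append(1.0)                    # assumption: neutral density
        else:
            rd.append(rnn_counts[p] / mean_nb)
    return rd


def abnormal_scores(rd):
    """AOSk(p) = max{1 - RD(p), 0} for every segment."""
    return [max(1.0 - r, 0.0) for r in rd]


# Neighborhoods and reverse k-NN counts for four hypothetical segments;
# the last one lies far from the other three.
hoods = [{1}, {0, 2}, {1}, {2}]
rnn_counts = [1, 2, 2, 0]
print(abnormal_scores(relative_density(hoods, rnn_counts)))
```

Under this reading the isolated segment receives the maximum score of 1.0 and the segments inside the cluster receive 0.0, so thresholding the score separates them.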
With the method of the invention, industry personnel no longer need to rely on professional experience to judge cheating in driver training. Representing the original driver-training time series piecewise with the piecewise linear representation based on extracting important sequence points from trend points retains the main shape of the original series, removes the interference of details, better reflects the characteristics of the original series and pays more attention to changes in the short-term trend, which can considerably improve the accuracy with which the anomaly detection algorithm identifies abnormal segments (fraudulent driver-training hours).
A more complete and detailed example is given below to further describe the invention:
Embodiment 5
The method for identifying fraudulent driver-training hours by extracting important sequence points from trend points is the same as in Embodiments 1-4 and specifically comprises the following steps:
(1) Select the field series that can reflect cheating:
The driver-training information data collected by the on-board GPS of the driving-school vehicle include several field attributes such as speed, course and mileage step. According to the experience of industry personnel, changes in the "course" field series of the driver-training information data better reflect cheating by driving schools or instructors, so the "course" field series is selected as the data for anomaly identification;
(2) Preprocess the field series: normalize the selected field series to the interval [0, 1];
(3) Extract the important points of the field series: obtain the important points of the field series using the piecewise linear representation based on extracting important sequence points from trend points, as follows:
(3.1) Determine the trend points of the selected field series: for the selected "course" field series X = <x1, x2, …, xn>, every point other than the first and last points of the series is tested in turn to determine whether it is a trend point, as follows:
(3.1.1) For the point x(i) and its left and right neighbors x(i-1) and x(i+1), compute the slopes tg1 and tg2 of the lines joining x(i-1) to x(i) and x(i) to x(i+1);
(3.1.2) If |tg1 - tg2| is greater than the parameter g, the point is a trend point; otherwise it is not. g is the threshold that the change in slope between the two sides adjacent to a trend point must exceed, with 0.005 ≤ g ≤ 0.025;
(3.1.3) Continue by computing the slopes of the lines joining x(i+1) to x(i) and to x(i+2), and so on;
After all points have been tested, the field series has at least 3 trend points;
(3.2) Determine the important points of the selected field series:
After the trend points of the field series are determined, every trend point other than the first and last trend points is tested by computing the slopes of the lines joining it to its adjacent trend points on the left and on the right, to judge whether it is an important point of the field series, as follows:
(3.2.1) For each trend point y(i) of the field series other than the first and last trend points, compute in turn the slope tg α of the line joining y(i) to its left adjacent trend point y(i-1), the slope tg β of the line joining y(i) to its right adjacent trend point y(i+1), and the slope tg γ of the line joining y(i+1) to its right adjacent trend point y(i+2);
(3.2.2) If tg α * tg β > 0, then y(i) is an important point of the time series;
(3.2.3) If tg α * tg β < 0 and tg γ * tg β > 0, then y(i) is also an important point of the time series;
(3.2.4) In all remaining cases, y(i) is not an important point of the time series.
(4) Represent the field series piecewise and identify abnormal segments: represent the field series piecewise using the extracted important points, then define the pattern density and abnormal level of each segment using the density-based anomaly detection method (reverse k-nearest neighbors), so as to quantitatively assess the degree of abnormality of each segment of the field series, and treat the segments with a higher "outlier factor" as fraudulent training hours, as follows:
(4.1) Represent the field series piecewise: once the important points of the field series are determined, they are connected in order with straight lines and each straight-line segment is represented by a 2-tuple (li, mi): the abscissa li is the length of the i-th straight-line segment of the field series and represents the length of the trend change; the ordinate mi is the slope of the i-th straight-line segment and represents the trend. The field series X is therefore represented by a series of 2-tuples:
X=<(l1,m1),(l2,m2),…,(lc,mc)>;
(4.2) Define the pattern distance: define the pattern distance between any two segment patterns p(l1, m1) and q(l2, m2) of the field series:
(4.3) Determine the number of reverse k-nearest neighbors of each segment pattern, as follows:
(4.3.1) Define the k-distance. For any natural number k, the k-distance of segment pattern p is the distance d(p, o) between p and some segment object o, where o satisfies the following conditions:
there are at least k objects o' ∈ D \ {p} such that d(p, o') ≤ d(p, o);
there are at most k-1 objects o' ∈ D \ {p} such that d(p, o') < d(p, o);
(4.3.2) Define the k-distance neighborhood Nk(p) of segment pattern p:
Given the k-distance k-distance(p) of segment pattern p, the k-distance neighborhood of p contains all objects whose distance from p does not exceed k-distance(p);
(4.3.3) Define the reverse k-nearest neighbors RNNk(p) of segment pattern p as the set of all objects that have p among their k-nearest neighbors;
|RNNk(p)| denotes the size of this set, i.e. the number of reverse k-nearest neighbors, and RNNk(p) = {q | q ∈ D, p ∈ Nk(q)}. If |RNNk(p)| is small, p lies in the neighborhoods of few other objects and is in an isolated position; conversely, the larger |RNNk(p)| is, the more object neighborhoods p lies in, and the more p is positioned inside a cluster.
(4.4) Define the density and abnormal level of each segment pattern, specifically including the following steps:
(4.4.1) Define the density of segment pattern p:
Given a positive integer k and a data set D, for any object p ∈ D, the density RD(p) of p is defined as
Here RD(p) is the mean ratio, taken over the k-distance neighborhood N_k(p), of the number of reverse k-nearest neighbors of p to that of each neighbor in the neighborhood; it reflects the local density of p. From the density of p its abnormal level can be defined, which describes the degree of anomaly of p within data set D.
(4.4.2) Define the abnormal level of each fragment of the field sequence:
Given a positive integer k and a data set D, for any object p ∈ D, the abnormal level of p is defined as:
AOS_k(p) = max{1 - RD(p), 0};
When p lies at the center of a cluster, its number of reverse k-nearest neighbors is relatively large and its density is high, so its abnormal level is low; conversely, when p deviates from the cluster, its number of reverse k-nearest neighbors is relatively small and its density is low, so its abnormal level is high. When the abnormal level AOS_k(p) exceeds some threshold, the object is judged to be abnormal.
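A minimal sketch of the abnormal level computation is given below. The formula for RD(p) is not reproduced in this text, so the code assumes one reading of the verbal description: RD(p) is the mean, over o in N_k(p), of |RNN_k(p)| / |RNN_k(o)|. That form, and the function name `abnormal_levels`, are assumptions rather than the patent's stated formula; only AOS_k(p) = max{1 - RD(p), 0} is taken directly from the text.

```python
def abnormal_levels(neighborhoods, rnn_counts):
    """Return AOS_k(p) = max(1 - RD(p), 0) for every object p.

    neighborhoods[p] is N_k(p) as a set of indices;
    rnn_counts[p] is |RNN_k(p)|.
    """
    levels = []
    for p, nk in enumerate(neighborhoods):
        # ASSUMED density: mean ratio of |RNN_k(p)| to |RNN_k(o)| over o in N_k(p).
        # Each o in N_k(p) has p as a reverse neighbor, so rnn_counts[o] >= 1.
        rd = sum(rnn_counts[p] / rnn_counts[o] for o in nk) / len(nk)
        levels.append(max(1.0 - rd, 0.0))
    return levels
```

Under this assumed form, an object with no reverse k-nearest neighbors has RD(p) = 0 and therefore the maximum abnormal level of 1, while an object with at least as many reverse neighbors as its neighbors has level 0.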
The effect of the present invention is explained in detail below with reference to a simulation experiment.
Embodiment 6
Below, the traditional segment representation based on trend points and the segment representation based on extracting sequence vital points from trend points are compared on the identification of abnormal fragments in a time series. A student's driver training information data may contain both genuine training class hours and cheating class hours. The driver training information data collected by the driving school's vehicle-mounted GPS includes multiple field attributes such as speed, course and mileage step. The test data set is the real driver training information data recorded by the GPS system of one driving school training car during one Subject Two training session; every sequence has length 8014, i.e. the vehicle GPS recorded driver training information at 8014 moment points. During the moment intervals [2100, 2300] and [4700, 4800] the training car was not started and a "Racehorse machine" was used to run up class hours; all remaining moments are normal training practice. The data are shown in Table 1:
Table 1: Driver training information data from one training session of the driving school training car
First, the "course" field sequence data is chosen as the sample data for analysis, i.e. the "course" field sequence is used to identify abnormal fragments. Before segmentation, the "course" field sequence data collected by the training car's GPS is normalized to the interval [0, 1].
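The normalization step can be sketched as below. The text only states that the data is normalized to [0, 1]; min-max scaling is assumed here as the most common way to do that, and the function name `min_max_normalize` is hypothetical.

```python
import numpy as np

def min_max_normalize(values):
    """Scale a field sequence to [0, 1] by min-max scaling (assumed method)."""
    x = np.asarray(values, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # A constant sequence carries no trend; map it to all zeros.
        return np.zeros_like(x)
    return (x - lo) / (hi - lo)
```

The smallest value in the sequence maps to 0 and the largest to 1, so slopes computed afterwards are comparable across fields with different units.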
Referring to Fig. 2, the change curve of the original course field sequence is shown; the abscissa represents the moment point and the ordinate represents the normalized value of the "course" field sequence data.
Referring to Fig. 3, the change curve of the course field sequence after segmentation by the traditional trend-point-based segment representation is shown;
Referring to Fig. 4, the change curve of the course field sequence after segmentation by the linear segment representation based on extracting sequence vital points from trend points is shown, with the slope change rate threshold g = 0.005.
Segment representation of the "course" field sequence by the present invention: taking Fig. 2 as the baseline, Fig. 3 shows that after processing by the traditional trend-point-based segment representation, the original course field sequence largely retains the trend of the original course data, but the segmentation is too fine and is costly in storage and computation, which lowers the efficiency of abnormal fragment identification. Compared with Fig. 2 and Fig. 3, Fig. 4 removes the interference of detail during segmentation, better extracts the key variation features of the time series, reflects the characteristics of the original time series, and places more emphasis on changes in short-term trend; its storage and computation costs are lower, which helps improve the efficiency and accuracy of data computation.
Anomaly detection on each segment of the "course" field sequence: take k = 10; the outlier factor ranges over [0, 1], and fragments whose outlier factor is greater than 0.6 are regarded as abnormal fragments. Under the same condition of using the reverse k-nearest-neighbor method from density-based anomaly detection, the course field time series is segmented with the traditional trend-point-based linear segment representation and with the linear segment representation of the present invention based on extracting sequence vital points from trend points, and the abnormal fragment identification results are compared:
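The thresholding rule above can be sketched as follows. The threshold 0.6 comes from the text; the segment layout as (start, end) moment pairs, the merging of adjacent flagged segments into one interval, and the function name `abnormal_intervals` are assumptions made for illustration.

```python
def abnormal_intervals(segments, factors, threshold=0.6):
    """Return merged moment intervals whose outlier factor exceeds the threshold.

    segments: list of (start, end) moment indices, one per straight-line segment.
    factors:  outlier factor of each segment, in [0, 1].
    """
    flagged = [seg for seg, f in zip(segments, factors) if f > threshold]
    merged = []
    for start, end in sorted(flagged):
        # Consecutive segments share an endpoint, so touching intervals are merged.
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Merging is a design choice: several short abnormal segments in a row are then reported as one suspected cheating interval rather than many fragments.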
Referring to Fig. 5, the outlier factor of each fragment is shown after the course field sequence is segmented by the traditional trend-point-based segment representation. As Fig. 5 shows, after the degree of anomaly of each segment is quantitatively assessed with the reverse k-nearest-neighbor method, the outlier factor on the moment interval [2100, 2300] is very close to 1.0, while the outlier factors on the other intervals are essentially all below 0.6. Only the fragment on the moment interval [2100, 2300] is identified as cheating class hours; the false class hours on the moment interval [4700, 4800] are not identified.
Referring to Fig. 6, the outlier factor of each fragment is shown after the course field sequence is segmented by the segment representation of the present invention based on extracting sequence vital points from trend points. As Fig. 6 shows, the anomaly detection algorithm (reverse k-nearest neighbors) likewise detects the abnormal fragment on the moment interval [2100, 2300], and additionally finds the abnormal fragment on the moment interval [4700, 4800], whose outlier factor is more pronounced than that of the corresponding fragment in Fig. 5.
Comparing Fig. 5 and Fig. 6, with an outlier factor of 0.6 as the critical value for abnormal fragments: the traditional trend-point-based linear segment representation combined with the reverse k-nearest-neighbor method detects only part of the driver training cheating class hours, whereas the linear segment representation of the present invention based on extracting sequence vital points from trend points, combined with the same reverse k-nearest-neighbor method, identifies all of the driver training cheating class hours, improving the accuracy of cheating class hour identification.
The present invention can more accurately identify the abnormal fragments in driver training information data, i.e. the cheating class hours, so that once the cheating class hours are deducted from the driver training information data, the student's true practice class hours are obtained.
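The deduction step can be illustrated with a short sketch. It counts moment points rather than hours, and it assumes that the cheating intervals are closed, do not overlap, and lie within the recorded range; the function name `true_point_count` is hypothetical.

```python
def true_point_count(total_points, cheating_intervals):
    """Return the number of moment points left after removing cheating intervals.

    cheating_intervals: non-overlapping closed intervals (start, end), an assumption.
    """
    cheated = sum(end - start + 1 for start, end in cheating_intervals)
    return total_points - cheated
```

With the experiment's figures (8014 moment points, cheating on [2100, 2300] and [4700, 4800]), this convention removes 201 + 101 = 302 points and leaves 7712. Converting points to class hours would require the GPS sampling interval, which is not given here.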
In brief, the present invention discloses a method for identifying driver training cheating class hours by extracting sequence vital points from trend points, which solves the problem of low efficiency and precision in identifying driver training cheating class hours. The implementation steps are: select from the driver training information data the field sequence data that can reflect cheating; normalize the field sequence data; extract the trend points of the field sequence, and then extract the vital points from them; represent the field sequence in segments using the vital points; quantitatively assess the degree of anomaly of each segment of the field sequence with the reverse k-nearest-neighbor method of anomaly detection; and treat the fragments with a higher "outlier factor" as cheating class hours. The overall scheme of the invention is rigorous and complete, is capable of analyzing massive driver training information data, identifies driver training cheating class hours with high efficiency and accuracy, and can be used to detect cheating class hours in driver training class hour systems.

Claims (3)

1. A method for identifying driver training cheating class hours by extracting sequence vital points from trend points, characterized by comprising the following steps:
(1) Select the field sequence data that can reflect abnormal fragments: from the driver training information data collected by the driving school's vehicle-mounted GPS, select the field sequence data that better reflects cheating as the data for anomaly identification;
(2) Preprocess the field sequence data: normalize the selected field sequence data to the interval [0, 1];
(3) Extract the sequence vital points of the field: obtain the sequence vital points of the field using the linear segment representation based on extracting sequence vital points from trend points;
(4) Represent the field sequence in segments and identify abnormal fragments: connect the sequence vital points of the field in order with straight lines, so that the field sequence is represented by straight-line segments; then, using the reverse k-nearest-neighbor method from density-based anomaly detection, define the density and abnormal level of each segment pattern so as to quantitatively assess the degree of anomaly of each segment of the field sequence, and take the segments with a higher "outlier factor" as driver training cheating class hours.
2. The method for identifying driver training cheating class hours by extracting sequence vital points from trend points according to claim 1, characterized in that the linear segment representation based on extracting sequence vital points from trend points described in step (3) specifically comprises the following steps:
(3.1) Determine the trend points of the selected field sequence:
Except for the first and last points of the field sequence, for each other point in the field sequence, determine in turn whether the point is a trend point by calculating the slopes between the point and its left and right adjacent points;
(3.2) Determine the vital points of the field sequence:
After the trend points of the field sequence are determined, except for the first and last trend points of the field sequence, for any remaining trend point of the field sequence, judge whether the trend point is a vital point of the field sequence by calculating the slopes between the trend point and its left and right adjacent trend points.
3. The method for identifying driver training cheating class hours by extracting sequence vital points from trend points according to claim 2, characterized in that determining the vital points of the field sequence in step (3.2), i.e. judging in turn whether each trend point other than the first and last trend points of the field sequence is a vital point of the field sequence, specifically comprises the following steps:
(3.2.1) In the field sequence, calculate the slope tg α of the line connecting trend point y(i) and its left adjacent trend point y(i-1), the slope tg β of the line connecting y(i) and its right adjacent trend point y(i+1), and the slope tg γ of the line connecting y(i+1) and its right adjacent point y(i+2);
(3.2.2) If tg α * tg β > 0, y(i) is a vital point of the field sequence;
(3.2.3) If tg α * tg β < 0 and tg γ * tg β > 0, then y(i) is a vital point of the field sequence;
(3.2.4) In all remaining cases, y(i) is not a vital point of the field sequence.
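The decision rule of steps (3.2.1) to (3.2.4) can be sketched as below. This is an illustrative reading, not the claimed implementation: keeping the first and last trend points as segment endpoints, and treating y(i) as not vital when y(i+2) does not exist, are assumptions, as are the names `slope` and `vital_points`.

```python
def slope(points, i, j):
    """Slope of the line through points[i] and points[j], each a (moment, value) pair."""
    (x1, y1), (x2, y2) = points[i], points[j]
    return (y2 - y1) / (x2 - x1)

def vital_points(trend_points):
    """Return indices of trend points judged to be vital points.

    trend_points: (moment, value) pairs with strictly increasing moments.
    """
    n = len(trend_points)
    if n < 3:
        return list(range(n))
    keep = [0]                                   # assumed: first trend point kept as endpoint
    for i in range(1, n - 1):
        tg_a = slope(trend_points, i - 1, i)     # tg alpha
        tg_b = slope(trend_points, i, i + 1)     # tg beta
        if tg_a * tg_b > 0:                      # step (3.2.2)
            keep.append(i)
        elif tg_a * tg_b < 0 and i + 2 < n:      # step (3.2.3); needs y(i+2) to exist
            tg_g = slope(trend_points, i + 1, i + 2)   # tg gamma
            if tg_g * tg_b > 0:
                keep.append(i)
        # step (3.2.4): every other case is not a vital point
    keep.append(n - 1)                           # assumed: last trend point kept as endpoint
    return keep
```

Under this reading, a reversal that is immediately reversed again fails the tg γ test and is dropped, so a short zigzag collapses to its two endpoints, while a reversal that persists for the next segment is kept.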
CN201710577583.8A 2017-07-15 2017-07-15 Abstraction sequence vital point drives training cheating class hour recognition methods from trend point Pending CN107423758A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710577583.8A CN107423758A (en) 2017-07-15 2017-07-15 Abstraction sequence vital point drives training cheating class hour recognition methods from trend point

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710577583.8A CN107423758A (en) 2017-07-15 2017-07-15 Abstraction sequence vital point drives training cheating class hour recognition methods from trend point

Publications (1)

Publication Number Publication Date
CN107423758A true CN107423758A (en) 2017-12-01

Family

ID=60426547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710577583.8A Pending CN107423758A (en) 2017-07-15 2017-07-15 Abstraction sequence vital point drives training cheating class hour recognition methods from trend point

Country Status (1)

Country Link
CN (1) CN107423758A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462819A (en) * 2014-12-09 2015-03-25 国网四川省电力公司信息通信公司 Local outlier detection method based on density clustering
CN104915568A (en) * 2015-06-24 2015-09-16 哈尔滨工业大学 Satellite telemetry data abnormity detection method based on DTW
US20150356421A1 (en) * 2014-06-05 2015-12-10 Mitsubishi Electric Research Laboratories, Inc. Method for Learning Exemplars for Anomaly Detection

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
KEOGH E,ET AL.: "Finding unusual medical time-series subsequences:algorithms and applications", 《IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE》 *
周大镯等: "时间序列异常检测", 《计算机工程与应用》 *
周庆兰: "多元时间序列异常检测的研究", 《万方数据库》 *
廖俊等: "基于趋势转折点的时间序列分段线性表示", 《计算机工程与应用》 *
张忠平等: "基于反 k 近邻的流数据离群点挖掘算法", 《计算机工程》 *
翟晓东: "基于车辆监控的驾驶员培训管理系统的设计与实现", 《万方数据库》 *
詹艳艳等: "基于斜率提取边缘点的时间序列分段线", 《计算机科学》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110059904A (en) * 2017-12-13 2019-07-26 罗伯特·博世有限公司 The automatic method for working out the rule of rule-based anomalous identification in a stream
CN112101468A (en) * 2020-09-18 2020-12-18 刘吉耘 Method for judging abnormal sequence in sequence combination
CN112101468B (en) * 2020-09-18 2024-04-16 刘吉耘 Method for judging abnormal sequence in sequence combination

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20171201

WD01 Invention patent application deemed withdrawn after publication