CN107688658A - Method and device for locating abnormal data - Google Patents

Method and device for locating abnormal data

Info

Publication number
CN107688658A
Authority
CN
China
Prior art keywords: data, dimension, history, item, subdivision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710792861.1A
Other languages
Chinese (zh)
Inventor
周双志
周葳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201710792861.1A
Publication of CN107688658A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a method and device for locating abnormal data. All dimensions corresponding to a data indicator to be located are obtained, and the dimension data and history dimension data corresponding to each dimension are obtained according to those dimensions, wherein the dimension data includes at least one subdivision item data and the history dimension data includes at least one history subdivision item data. For each dimension, a primary vector is built from the subdivision item data included in the dimension data of that dimension, a secondary vector is built from the history subdivision item data included in the history dimension data of that dimension, and the similarity between the primary vector and the secondary vector is calculated. The minimum similarity across all dimensions is then determined. The smaller the similarity, the more likely it is that a data anomaly has occurred in that dimension. By locating the dimension corresponding to the minimum similarity, the dimension most likely to contain a data anomaly is located automatically. The data under each dimension no longer has to be checked manually one by one, which improves the efficiency of locating abnormal data.

Description

Method and device for locating abnormal data
Technical field
The invention belongs to the technical field of data locating, and more particularly relates to a method and device for locating abnormal data.
Background
In the current big-data context, a single data indicator may correspond to multiple dimensions, and each dimension in turn includes multiple subdivision items. Because each data indicator includes a very large amount of data, locating abnormal data within a data indicator becomes very difficult. Taking advertising income as an example, the dimensions corresponding to this data indicator include playing platform, advertiser and player. The playing platform dimension includes multiple different platforms, that is, multiple platform subdivision items, and each platform subdivision item corresponds to the advertising income data under that platform. The advertiser dimension includes multiple different advertisers, that is, multiple advertiser subdivision items, and each advertiser subdivision item corresponds to the advertising income data of that advertiser. The player dimension includes multiple different players, that is, multiple player subdivision items, and each player subdivision item corresponds to the advertising income data of that player. Abnormal data in advertising income is currently located as follows: the data of each dimension is looked up separately, an analyst then judges from experience which dimension has a problem and thereby determines the dimension in which abnormal data exists, and then analyzes which subdivision item under that dimension is abnormal.
In the prior art, locating abnormal data therefore relies on manually checking the data under each dimension one by one, which is inefficient.
Summary of the invention
In view of this, an object of the present invention is to provide a method and device for locating abnormal data, so as to improve the efficiency of locating abnormal data.
The technical solution is as follows:
The present invention provides a method for locating abnormal data, the method including:
Obtaining all dimensions corresponding to a data indicator to be located;
Obtaining, according to all the dimensions, the dimension data and history dimension data corresponding to each dimension; wherein the dimension data includes at least one subdivision item data, and the history dimension data includes at least one history subdivision item data;
Building a primary vector from the subdivision item data included in the dimension data of each dimension, and building a secondary vector from the history subdivision item data included in the history dimension data;
Calculating the similarity between the primary vector and the secondary vector of each dimension, to obtain the similarity of each dimension;
Comparing the similarities of all dimensions to determine the minimum similarity;
Locating the dimension corresponding to the minimum similarity.
Preferably, calculating the similarity between the primary vector and the secondary vector of each dimension to obtain the similarity of each dimension includes:
Calculating the cosine of the angle between the primary vector and the secondary vector of each dimension;
Determining, according to the cosine of the angle, the similarity between the primary vector and the secondary vector of each dimension.
Preferably, building the primary vector from the subdivision item data included in the dimension data of each dimension and building the secondary vector from the history subdivision item data included in the history dimension data includes:
Judging whether the number of subdivision item data included in the dimension data of each dimension is the same as the number of history subdivision item data included in the history dimension data;
If the number of subdivision item data is the same as the number of history subdivision item data, building the primary vector from the subdivision item data included in the dimension data of each dimension, and building the secondary vector from the history subdivision item data included in the history dimension data.
Preferably, the method further includes:
If the number of subdivision item data differs from the number of history subdivision item data, looking up, according to the subdivision item data included in the dimension data, the subdivision item corresponding to each subdivision item data, to obtain a subdivision item set composed of those subdivision items;
Looking up, according to the history subdivision item data included in the history dimension data, the history subdivision item corresponding to each history subdivision item data, to obtain a history subdivision item set composed of those history subdivision items;
Comparing the subdivision item set with the history subdivision item set;
Adding to the subdivision item set the subdivision items in the history subdivision item set that differ from the subdivision item set, wherein the subdivision item data corresponding to each subdivision item added to the subdivision item set is 0;
Building a new primary vector from the subdivision item data corresponding to the subdivision item set after the subdivision items have been added;
Adding to the history subdivision item set the subdivision items in the subdivision item set that differ from the history subdivision item set, wherein the subdivision item data corresponding to each subdivision item added to the history subdivision item set is 0;
Building a new secondary vector from the subdivision item data corresponding to the history subdivision item set after the subdivision items have been added.
The present invention also provides a device for locating abnormal data, the device including:
A first acquisition unit, configured to obtain all dimensions corresponding to a data indicator to be located;
A second acquisition unit, configured to obtain, according to all the dimensions obtained by the first acquisition unit, the dimension data and history dimension data corresponding to each dimension; wherein the dimension data includes at least one subdivision item data, and the history dimension data includes at least one history subdivision item data;
A construction unit, configured to build a primary vector from the subdivision item data included in the dimension data of each dimension obtained by the second acquisition unit, and to build a secondary vector from the history subdivision item data included in the history dimension data;
A computing unit, configured to calculate the similarity between the primary vector and the secondary vector of each dimension, to obtain the similarity of each dimension;
A comparing unit, configured to compare the similarities of all dimensions and determine the minimum similarity;
A positioning unit, configured to locate the dimension corresponding to the minimum similarity.
Preferably, the computing unit includes:
A first computation subunit, configured to calculate the cosine of the angle between the primary vector and the secondary vector of each dimension;
A similarity determination subunit, configured to determine, according to the cosine of the angle, the similarity between the primary vector and the secondary vector of each dimension.
Preferably, the construction unit includes:
A judgment subunit, configured to judge whether the number of subdivision item data included in the dimension data of each dimension is the same as the number of history subdivision item data included in the history dimension data;
A building subunit, configured to, if the number of subdivision item data is the same as the number of history subdivision item data, build the primary vector from the subdivision item data included in the dimension data of each dimension and build the secondary vector from the history subdivision item data included in the history dimension data.
Preferably, the construction unit further includes:
A first lookup subunit, configured to, if the number of subdivision item data differs from the number of history subdivision item data, look up, according to the subdivision item data included in the dimension data, the subdivision item corresponding to each subdivision item data, to obtain a subdivision item set composed of those subdivision items;
A second lookup subunit, configured to look up, according to the history subdivision item data included in the history dimension data, the history subdivision item corresponding to each history subdivision item data, to obtain a history subdivision item set composed of those history subdivision items;
A comparing subunit, configured to compare the subdivision item set with the history subdivision item set;
A first subdivision item composing unit, configured to add to the subdivision item set the subdivision items in the history subdivision item set that differ from the subdivision item set, wherein the subdivision item data corresponding to each subdivision item added to the subdivision item set is 0;
A primary vector construction unit, configured to build a new primary vector from the subdivision item data corresponding to the subdivision item set after the subdivision items have been added;
A second subdivision item composing unit, configured to add to the history subdivision item set the subdivision items in the subdivision item set that differ from the history subdivision item set, wherein the subdivision item data corresponding to each subdivision item added to the history subdivision item set is 0;
A secondary vector construction unit, configured to build a new secondary vector from the subdivision item data corresponding to the history subdivision item set after the subdivision items have been added.
Compared with the prior art, the technical solution provided by the present invention has the following advantages:
In the present application, all dimensions corresponding to the data indicator to be located are obtained, and the dimension data and history dimension data corresponding to each dimension are obtained according to those dimensions, wherein the dimension data includes at least one subdivision item data and the history dimension data includes at least one history subdivision item data. For each dimension, a primary vector is built from the subdivision item data included in its dimension data, a secondary vector is built from the history subdivision item data included in its history dimension data, and the similarity between the two vectors is calculated, from which the minimum similarity is obtained. The smaller the similarity, the more likely it is that a data anomaly has occurred in that dimension. Locating the dimension corresponding to the minimum similarity therefore automatically locates the dimension most likely to contain a data anomaly. The data under each dimension no longer has to be checked manually one by one, which improves the efficiency of locating abnormal data.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below obviously show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of a method for locating abnormal data provided by an embodiment of the present invention;
Fig. 2 is a flow chart of another method for locating abnormal data provided by an embodiment of the present invention;
Fig. 3 is a structural diagram of a device for locating abnormal data provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, which shows a flow chart of a method for locating abnormal data provided by an embodiment of the present invention, the method includes:
S101, obtaining all dimensions corresponding to a data indicator to be located;
The data indicator to be located is selected according to current needs. It should be noted that the data corresponding to each data indicator is based on a time series. For example, if the data indicator to be located is advertising income, its time-series data includes the daily advertising income, such as yesterday's advertising income data and today's advertising income data.
Each data indicator to be located includes multiple dimensions. Taking advertising income as the data indicator to be located, it includes three dimensions: playing platform, advertiser and player.
Because different data indicators may correspond to different dimensions, all dimensions included in the data indicator to be located need to be determined. In this embodiment, all dimensions corresponding to the data indicator are obtained by looking up, according to the data indicator, a mapping table between data indicators and dimensions. Of course, all dimensions corresponding to the data indicator to be located can also be obtained in other ways.
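The mapping-table lookup described for S101 can be sketched as follows. This is a minimal illustration only: the table name `INDICATOR_DIMENSIONS`, the function `get_dimensions` and the identifier spellings are assumptions made for the example, and the patent does not specify how the mapping table is stored.

```python
# Hypothetical mapping table between data indicators and their dimensions.
# The identifiers are illustrative; only the three dimensions come from the text.
INDICATOR_DIMENSIONS = {
    "advertising_income": ["playing_platform", "advertiser", "player"],
}


def get_dimensions(indicator):
    """Return all dimensions registered for the given data indicator."""
    if indicator not in INDICATOR_DIMENSIONS:
        raise KeyError(f"no dimensions registered for indicator {indicator!r}")
    # Return a copy so callers cannot mutate the mapping table.
    return list(INDICATOR_DIMENSIONS[indicator])


print(get_dimensions("advertising_income"))
```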
S102, obtaining, according to the dimensions, the dimension data and history dimension data corresponding to each dimension; wherein the dimension data includes at least one subdivision item data, and the history dimension data includes at least one history subdivision item data;
Taking advertising income as an example, the dimensions obtained for advertising income are playing platform, advertiser and player. The playing platform dimension in turn includes multiple different playing platforms, such as a TV platform and a network platform. Each playing platform is one subdivision item of the playing platform dimension, and the advertising income data corresponding to each subdivision item is its subdivision item data. Together these make up the advertising income data under the playing platform dimension, that is, its dimension data. Similarly, the dimension data of the advertiser dimension is made up of multiple subdivision item data, and so is the dimension data of the player dimension.
Because advertising income data is based on a time series, it includes today's advertising income data and yesterday's advertising income data. Today's advertising income data is made up of three dimension data, each of which includes multiple subdivision item data, and the same is true of yesterday's advertising income data.
The three dimension data corresponding to today's advertising income data are taken as the dimension data, which includes at least one subdivision item data; the three dimension data corresponding to yesterday's advertising income data are taken as the history dimension data, which includes at least one history subdivision item data.
After all dimensions corresponding to the data indicator to be located have been obtained, the dimension data and history dimension data corresponding to each dimension are obtained.
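One possible way to hold the data described for S102 is a nested mapping from day to dimension to subdivision item. The sketch below assumes that layout; the store `INCOME`, the function `get_dimension_data`, the item names and all figures are invented for illustration and do not come from the patent.

```python
# Hypothetical store: day -> dimension -> subdivision item -> advertising income.
# All names and figures are invented for illustration.
INCOME = {
    "yesterday": {
        "playing_platform": {"tv": 100.0, "network": 200.0},
        "advertiser": {"advertiser_a": 150.0, "advertiser_b": 150.0},
    },
    "today": {
        "playing_platform": {"tv": 120.0, "network": 240.0},
        "advertiser": {"advertiser_a": 300.0, "advertiser_b": 60.0},
    },
}


def get_dimension_data(store, dimension, current_day="today", history_day="yesterday"):
    """Return (dimension data, history dimension data) for one dimension."""
    return dict(store[current_day][dimension]), dict(store[history_day][dimension])


current, history = get_dimension_data(INCOME, "playing_platform")
print(current)
print(history)
```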
S103, building a primary vector from the subdivision item data included in the dimension data of each dimension, and building a secondary vector from the history subdivision item data included in the history dimension data;
Taking the playing platform dimension as an example: the playing platform dimension includes p subdivision items formed by p different playing platforms, so the dimension data of the playing platform includes p subdivision item data, $X_{11}, X_{12}, X_{13}, \ldots, X_{1p}$, and the history dimension data of the playing platform includes p history subdivision item data, $X_{21}, X_{22}, X_{23}, \ldots, X_{2p}$. The primary vector built from the p subdivision item data is $\vec{X_1} = (X_{11}, X_{12}, X_{13}, \ldots, X_{1p})$, and the secondary vector built from the p history subdivision item data is $\vec{X_2} = (X_{21}, X_{22}, X_{23}, \ldots, X_{2p})$.
Taking the advertiser dimension as an example: the advertiser dimension includes q subdivision items formed by q different advertisers, so the dimension data of the advertiser includes q subdivision item data, $Y_{11}, Y_{12}, Y_{13}, \ldots, Y_{1q}$, and the history dimension data of the advertiser includes q history subdivision item data, $Y_{21}, Y_{22}, Y_{23}, \ldots, Y_{2q}$. The primary vector built from the q subdivision item data is $\vec{Y_1} = (Y_{11}, Y_{12}, Y_{13}, \ldots, Y_{1q})$, and the secondary vector built from the q history subdivision item data is $\vec{Y_2} = (Y_{21}, Y_{22}, Y_{23}, \ldots, Y_{2q})$.
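The vector construction in S103 can be sketched as below, assuming each dimension's data is held as a mapping from subdivision item to value and that both mappings cover the same subdivision items. The function name `build_vectors` and the choice of sorted item names as the common component order are assumptions for the example; the text only requires that the two vectors list corresponding items in the same positions.

```python
def build_vectors(current, history):
    """Build the primary and secondary vectors for one dimension.

    Assumes both mappings contain exactly the same subdivision items.
    """
    if current.keys() != history.keys():
        raise ValueError("subdivision items differ between current and history data")
    # Use one fixed ordering so component i refers to the same item in both vectors.
    items = sorted(current)
    primary = [current[item] for item in items]
    secondary = [history[item] for item in items]
    return primary, secondary


# Invented figures; the two mappings list their items in different orders on purpose.
primary, secondary = build_vectors(
    {"tv": 120.0, "network": 240.0},
    {"network": 200.0, "tv": 100.0},
)
print(primary, secondary)
```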
S104, calculating the similarity between the primary vector and the secondary vector of each dimension, to obtain the similarity of each dimension;
In this embodiment, the cosine of the angle between the primary vector and the secondary vector of each dimension is calculated, and the similarity between the primary vector and the secondary vector of each dimension is determined according to that cosine. For each dimension, the similarity between its primary vector and its secondary vector is taken as the similarity of that dimension.
The similarity $\cos\alpha$ between the primary vector $\vec{X_1}$ and the secondary vector $\vec{X_2}$ of the playing platform dimension is calculated using the formula
$$\cos\alpha = \frac{\sum_{i=1}^{p} X_{1i} X_{2i}}{\sqrt{\sum_{i=1}^{p} X_{1i}^{2}}\,\sqrt{\sum_{i=1}^{p} X_{2i}^{2}}}$$
Similarly, the similarity $\cos\beta$ between the primary vector $\vec{Y_1}$ and the secondary vector $\vec{Y_2}$ of the advertiser dimension is calculated.
The similarities of all dimensions corresponding to the data indicator to be located are calculated in turn.
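The cosine computation in S104 can be written directly from the standard definition of the cosine of the angle between two vectors. The sketch below uses only the Python standard library; the function name `cosine_similarity` and the handling of zero vectors (raising an error, since the cosine is undefined there) are choices made for the example rather than details given in the text.

```python
import math


def cosine_similarity(primary, secondary):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(primary) != len(secondary):
        raise ValueError("vectors must have the same number of components")
    dot = sum(a * b for a, b in zip(primary, secondary))
    norm_primary = math.sqrt(sum(a * a for a in primary))
    norm_secondary = math.sqrt(sum(b * b for b in secondary))
    if norm_primary == 0 or norm_secondary == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_primary * norm_secondary)


print(cosine_similarity([3.0, 4.0], [3.0, 4.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```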
S105, comparing the similarities of all dimensions to determine the minimum similarity;
The similarity between two vectors is judged by the cosine of the angle between them: the larger the cosine, the smaller the angle and the higher the similarity.
Suppose that, for the advertising income data indicator, it is normal for today's advertising income data to have grown by 20% compared with yesterday's. Then, if no abnormal data exists in the playing platform dimension, each subdivision item data included in the dimension data of the playing platform will be 1.2 times the corresponding history subdivision item data included in the history dimension data of the playing platform. That is, today's advertising income under each playing platform is 1.2 times yesterday's: $X_{11} = 1.2X_{21}$, $X_{12} = 1.2X_{22}$, $X_{13} = 1.2X_{23}$, ..., $X_{1p} = 1.2X_{2p}$.
Using the formula in S104, the similarity between the primary vector $\vec{X_1}$ and the secondary vector $\vec{X_2}$ of the playing platform dimension is then $\cos\alpha = 1$.
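The result $\cos\alpha = 1$ follows from the formula in S104 whenever the components of the two vectors stand in one constant positive ratio. As a short worked derivation, write that ratio as $k > 0$ with $X_{1i} = kX_{2i}$ for every $i$ (the symbol $k$ is introduced here for illustration; the 20% growth example is one such constant ratio), and assume the vectors are non-zero:

$$\cos\alpha = \frac{\sum_{i=1}^{p} X_{1i} X_{2i}}{\sqrt{\sum_{i=1}^{p} X_{1i}^{2}}\,\sqrt{\sum_{i=1}^{p} X_{2i}^{2}}} = \frac{k\sum_{i=1}^{p} X_{2i}^{2}}{k\sqrt{\sum_{i=1}^{p} X_{2i}^{2}}\,\sqrt{\sum_{i=1}^{p} X_{2i}^{2}}} = 1$$

The factor $k$ cancels, so uniform growth of any size leaves the similarity at 1, and only a change in the proportions between subdivision items lowers it.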
If abnormal data exists in the advertiser dimension, then the subdivision item data included in its dimension data necessarily do not all grow by 20% relative to the corresponding history subdivision item data included in its history dimension data, and the similarity between the primary vector $\vec{Y_1}$ and the secondary vector $\vec{Y_2}$ of the advertiser dimension, calculated using the formula in S104, satisfies $\cos\beta < 1$.
Therefore, the smaller the similarity of a dimension, the more likely it is that its dimension data includes abnormal data.
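A small numerical check of this reasoning, using invented figures: one dimension whose items all grow by 20%, and one whose items change by different proportions. The helper `cosine_similarity` and the variable names are assumptions for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Invented figures for illustration only.
yesterday = [100.0, 200.0]
uniform_today = [120.0, 240.0]  # every item grew by 20%
skewed_today = [120.0, 100.0]   # one item grew by 20%, the other halved

uniform = cosine_similarity(uniform_today, yesterday)
skewed = cosine_similarity(skewed_today, yesterday)
print(round(uniform, 6), round(skewed, 6))
```

The uniform case stays at 1 up to floating-point rounding, while the skewed case falls below it, which is the signal the method relies on.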
In this embodiment, the similarities of all dimensions are compared one by one to obtain the minimum similarity. The dimension corresponding to the minimum similarity is the one most likely to contain abnormal data.
Of course, the similarities of all dimensions can also be sorted in ascending order, and the dimensions then checked for abnormal data in that order.
S106, locating the dimension corresponding to the minimum similarity.
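Steps S105 and S106, together with the ascending-order variant, reduce to a minimum search and a sort over the per-dimension similarities. A minimal sketch, with hypothetical function names and illustrative similarity values:

```python
def rank_dimensions(similarities):
    """Return (dimension, similarity) pairs in ascending order of similarity."""
    return sorted(similarities.items(), key=lambda pair: pair[1])


def locate_dimension(similarities):
    """Return the dimension with the smallest similarity."""
    return min(similarities, key=similarities.get)


# Illustrative similarity values, not measured data.
similarities = {"playing_platform": 1.0, "advertiser": 0.83, "player": 0.97}
print(locate_dimension(similarities))
print(rank_dimensions(similarities))
```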
As can be seen from the above technical solution, the present application obtains all dimensions corresponding to the data indicator to be located and, according to those dimensions, obtains the dimension data and history dimension data corresponding to each dimension, wherein the dimension data includes at least one subdivision item data and the history dimension data includes at least one history subdivision item data. For each dimension, a primary vector is built from the subdivision item data included in its dimension data, a secondary vector is built from the history subdivision item data included in its history dimension data, and the similarity between the two vectors is calculated, from which the minimum similarity is obtained. The smaller the similarity, the more likely it is that a data anomaly has occurred in that dimension. Locating the dimension corresponding to the minimum similarity therefore automatically locates the dimension most likely to include abnormal data. The data under each dimension no longer has to be checked manually one by one, which improves the efficiency of locating abnormal data.
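Putting steps S103 to S106 together, the first embodiment can be sketched end to end as below. This is one possible arrangement under stated assumptions, not the patent's implementation: the data layout (dimension to subdivision item to value), the function names and every figure are invented, and each dimension is assumed to have the same subdivision items on both days.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def locate_abnormal_dimension(current, history):
    """Return (located dimension, similarity per dimension).

    current and history map dimension -> {subdivision item: value}.
    """
    similarities = {}
    for dimension, items in current.items():
        names = sorted(items)  # same component order for both vectors
        primary = [items[name] for name in names]
        secondary = [history[dimension][name] for name in names]
        similarities[dimension] = cosine(primary, secondary)
    located = min(similarities, key=similarities.get)
    return located, similarities


# Invented figures: platforms grow uniformly, advertisers shift in proportion.
today = {
    "playing_platform": {"tv": 120.0, "network": 240.0},
    "advertiser": {"advertiser_a": 300.0, "advertiser_b": 60.0},
}
yesterday = {
    "playing_platform": {"tv": 100.0, "network": 200.0},
    "advertiser": {"advertiser_a": 150.0, "advertiser_b": 150.0},
}
located, similarities = locate_abnormal_dimension(today, yesterday)
print(located)
```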
Referring to Fig. 2, which shows a flow chart of another method for locating abnormal data provided by an embodiment of the present invention, the method includes:
S201, obtaining all dimensions corresponding to a data indicator to be located;
S202, obtaining, according to the dimensions, the dimension data and history dimension data corresponding to each dimension; wherein the dimension data includes at least one subdivision item data, and the history dimension data includes at least one history subdivision item data;
Steps S201-S202 are the same as steps S101-S102 in the previous embodiment and are not repeated here.
S203, judging whether the number of subdivision item data included in the dimension data of each dimension is the same as the number of history subdivision item data included in the history dimension data;
Taking the playing platform dimension as an example: the dimension data of the playing platform includes p subdivision item data, but for some reason the number of history subdivision item data included in the history dimension data of the playing platform may be greater or less than p. In other words, the number of subdivision item data and the number of history subdivision item data may differ. For example, if a new playing platform is added for playing advertisements, today has one more playing platform than yesterday, and correspondingly the dimension data of the playing platform includes one more subdivision item data than the history dimension data includes history subdivision item data.
For each dimension, after the dimension data and history dimension data have been obtained, it is judged whether the number of subdivision item data included in the dimension data is the same as the number of history subdivision item data included in the history dimension data.
If the number of subdivision item data is the same as the number of history subdivision item data, S204 is performed;
If the number of subdivision item data differs from the number of history subdivision item data, S205 is performed;
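The branch in S203 can be sketched as a simple count comparison. The function name `choose_step` and the sample figures are hypothetical. The sketch follows the text as written and compares counts only, so it would not by itself detect the case where one subdivision item is added and another removed on the same day.

```python
def choose_step(current, history):
    """Return the step to run next: "S204" if the counts match, otherwise "S205"."""
    return "S204" if len(current) == len(history) else "S205"


# Invented figures: today has one more playing platform (B1) than yesterday.
today = {"A1": 10.0, "A2": 20.0, "A3": 30.0, "B1": 5.0}
yesterday = {"A1": 9.0, "A2": 21.0, "A3": 28.0}
print(choose_step(today, yesterday))
```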
S204, building a primary vector from the subdivision item data included in the dimension data of each dimension, and building a secondary vector from the history subdivision item data included in the history dimension data; then performing S208;
Step S204 is the same as S103 in the previous embodiment and is not repeated here.
S205, looking up, according to the subdivision item data included in the dimension data, the subdivision item corresponding to each subdivision item data, to obtain a subdivision item set composed of those subdivision items; and looking up, according to the history subdivision item data included in the history dimension data, the history subdivision item corresponding to each history subdivision item data, to obtain a history subdivision item set composed of those history subdivision items;
Taking the playing platform dimension as an example: once the dimension data has been obtained, all subdivision item data can be obtained, because the dimension data includes at least one subdivision item data. Each subdivision item data corresponds to one subdivision item, that is, one playing platform corresponds to one subdivision item data; for example, the TV broadcast platform corresponds to one subdivision item data and the network broadcast platform corresponds to another. The subdivision item corresponding to each subdivision item data can therefore be looked up, giving the subdivision item set composed of the subdivision items corresponding to all the subdivision item data.
Similarly, according to the history subdivision item data included in the history dimension data of the playing platform, the history subdivision item corresponding to each history subdivision item data is looked up, giving the history subdivision item set composed of those history subdivision items.
S206, comparing the subdivision item set with the history subdivision item set;
Taking the playing platform dimension as an example: the subdivision item set is a set composed of multiple playing platforms, and the history subdivision item set is also a set composed of multiple playing platforms;
Compared with yesterday, today may have added playing platforms for playing advertisements or removed some, so the playing platforms of today and yesterday may differ. The subdivision item set is composed of today's playing platforms, and the history subdivision item set is composed of yesterday's playing platforms.
For example, the subdivision item set includes A1, A2, A3, B1 and the history subdivision item set includes A1, A2, A3; that is, today has added the playing platform B1 compared with yesterday.
The subdivision item set is compared with the history subdivision item set. Because the subdivision item set includes all the subdivision items included in the history subdivision item set, namely A1, A2, A3, there is no subdivision item in the history subdivision item set that differs from the subdivision item set.
The subdivision items A1, A2, A3 in the subdivision item set are the same as A1, A2, A3 in the history subdivision item set, but B1, which the subdivision item set includes, is not included in the history subdivision item set. The subdivision item in the subdivision item set that differs from the history subdivision item set is therefore B1.
The above example describes the case where the subdivision item set includes more subdivision items than the history subdivision item set. Of course, the subdivision item set may also include fewer subdivision items than the history subdivision item set; in that case the subdivision items that differ between the two sets are obtained by the same method.
It should be noted that when the subdivision items in the subdivision item set are completely different from those in the history subdivision item set, for example when the subdivision item set is A1, A2, A3 and the history subdivision item set is B1, B2, B3, all playing platforms have effectively been replaced. In that case today's advertising income cannot be compared with yesterday's, and data that changes abnormally cannot be treated as abnormal data. Abnormal data means data whose change over the time series is abnormal against the same baseline, for example, under the same playing platform, today's advertising income changing abnormally relative to yesterday's advertising income under that platform.
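The set comparison in S206, including the completely disjoint case, maps naturally onto set operations. In the sketch below the function names `compare_item_sets` and `is_comparable` are hypothetical, and treating "no shared subdivision item" as "not comparable" is one reading of the note above rather than a rule stated in the text.

```python
def compare_item_sets(items, history_items):
    """Return (items missing from history, history items missing from current)."""
    items, history_items = set(items), set(history_items)
    return items - history_items, history_items - items


def is_comparable(items, history_items):
    """Return True if the two sets share at least one subdivision item."""
    return bool(set(items) & set(history_items))


only_today, only_yesterday = compare_item_sets(["A1", "A2", "A3", "B1"], ["A1", "A2", "A3"])
print(sorted(only_today), sorted(only_yesterday))
print(is_comparable(["A1", "A2", "A3"], ["B1", "B2", "B3"]))
```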
S207: add the subdivision items that are in the history subdivision item set but not in the subdivision item set to the subdivision item set, where the subdivision item data corresponding to each subdivision item added to the subdivision item set is 0;
Build a new primary vector from the subdivision item data corresponding to the subdivision item set after the subdivision items have been added;
Add the subdivision items that are in the subdivision item set but not in the history subdivision item set to the history subdivision item set, where the subdivision item data corresponding to each subdivision item added to the history subdivision item set is 0;
Build a new secondary vector from the subdivision item data corresponding to the history subdivision item set after the subdivision items have been added;
Because the history subdivision item set contains no subdivision item that is absent from the subdivision item set, the subdivision items in the subdivision item set are unchanged: they are still A1, A2, A3, B1. A new primary vector is built from the subdivision item data corresponding to A1, A2, A3, B1 in the subdivision item set. The subdivision item data corresponding to each subdivision item can be found by table lookup.
Because the subdivision item in the subdivision item set that is absent from the history subdivision item set is B1, B1 is added to the history subdivision item set A1, A2, A3. After the addition, the history subdivision item set contains A1, A2, A3, B1;
A new secondary vector is built from the subdivision item data corresponding to A1, A2, A3, B1 in the history subdivision item set after the addition. The subdivision item data corresponding to the added subdivision item B1 is 0, and the subdivision item data corresponding to A1, A2, A3 can be obtained by table lookup.
It should be noted here that, to ensure the accuracy of the positioning result, all subdivision items in the subdivision item set after the addition must be arranged in the same order as all subdivision items in the history subdivision item set after the addition. That is, the subdivision items in both sets are ordered A1, A2, A3, B1.
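The alignment in step S207 can be sketched as follows. This is a minimal illustration under stated assumptions: the lookup tables are modeled as Python dictionaries mapping a subdivision item to its data, the function name `align_item_data` is hypothetical, and the numeric values are invented purely for the example. Items missing on either side receive 0, and both vectors share one item order.

```python
def align_item_data(item_data, history_item_data):
    """Zero-fill missing subdivision items and build both vectors in one shared order.

    item_data / history_item_data: hypothetical lookup tables {item: data}.
    """
    # Shared order: the current items first, then items that appear only in history.
    order = list(item_data)
    order += [item for item in history_item_data if item not in item_data]
    # An item missing from one side contributes 0 to that side's vector.
    primary = [item_data.get(item, 0) for item in order]
    secondary = [history_item_data.get(item, 0) for item in order]
    return order, primary, secondary


# Invented values; only the item names A1, A2, A3, B1 come from the text.
today = {"A1": 120, "A2": 80, "A3": 45, "B1": 30}
yesterday = {"A1": 110, "A2": 85, "A3": 50}
order, primary, secondary = align_item_data(today, yesterday)
print(order)      # ['A1', 'A2', 'A3', 'B1']
print(primary)    # [120, 80, 45, 30]
print(secondary)  # [110, 85, 50, 0]
```

Because both vectors are read out in the same `order`, the requirement that the two sets be arranged identically is satisfied by construction.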
S208: calculate the similarity between the primary vector and the secondary vector of each dimension to obtain the similarity of each dimension;
S209: compare the similarities of all dimensions and determine the minimum similarity;
S2010: locate the dimension corresponding to the minimum similarity.
Steps S208-S2010 are the same as S104-S106 in the previous embodiment and are not repeated here.
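Steps S208 to S2010 can be sketched as below, assuming the similarity is the cosine of the angle between the two vectors, as the device description and claim 2 state. The function names, the "region" dimension, and all numeric values are hypothetical. Returning 0.0 when either vector is all zeros is an assumption of this sketch; the source does not say how that case is handled.

```python
import math


def cosine_similarity(primary, secondary):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(primary, secondary))
    norm = math.sqrt(sum(a * a for a in primary)) * math.sqrt(sum(b * b for b in secondary))
    if norm == 0:
        return 0.0  # assumption: treat an all-zero vector as completely dissimilar
    return dot / norm


def locate_dimension(vectors_by_dimension):
    """Return the dimension with the minimum similarity, plus all similarities."""
    similarities = {
        dimension: cosine_similarity(primary, secondary)
        for dimension, (primary, secondary) in vectors_by_dimension.items()
    }
    located = min(similarities, key=similarities.get)
    return located, similarities


# Invented data: one dimension changed shape since yesterday, the other did not.
vectors = {
    "playing platform": ([120, 80, 45, 30], [110, 85, 50, 0]),
    "region": ([100, 100], [100, 100]),
}
located, similarities = locate_dimension(vectors)
print(located)
```

Cosine similarity compares the distribution across subdivision items rather than the totals, so a dimension whose items all scale by the same factor still scores close to 1.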
As can be seen from the above technical solution, in this embodiment all dimensions corresponding to the data target to be positioned are obtained, and the dimension data and history dimension data corresponding to each dimension are obtained according to the dimensions, where the dimension data includes at least one item of subdivision item data and the history dimension data includes at least one item of history subdivision item data. A primary vector is built from the subdivision item data included in the dimension data of each dimension, and a secondary vector is built from the history subdivision item data included in the history dimension data of that dimension. The similarity between the primary vector and the secondary vector of each dimension is calculated, and the minimum similarity is obtained. The smaller the similarity, the more likely it is that a data anomaly has occurred in that dimension. The dimension corresponding to the minimum similarity is located; this dimension contains the abnormal data, so the dimension most likely to contain abnormal data is located automatically. There is no need to manually check the data under each dimension one by one, which improves the efficiency of locating abnormal data. The accuracy of the similarity calculation is also improved.
Corresponding to the localization method of abnormal data described above, the present invention also provides a positioning device for abnormal data, whose structure is shown schematically in Fig. 3. The positioning device for abnormal data provided by this embodiment includes:
A first acquisition unit 11, a second acquisition unit 12, a construction unit 13, a computing unit 14, a comparing unit 15 and a positioning unit 16.
The first acquisition unit 11 is configured to obtain all dimensions corresponding to the data target to be positioned;
The second acquisition unit 12 is configured to obtain, according to the dimensions obtained by the first acquisition unit, the dimension data and history dimension data corresponding to each dimension; wherein the dimension data includes at least one item of subdivision item data, and the history dimension data includes at least one item of history subdivision item data;
The construction unit 13 is configured to build a primary vector from the subdivision item data included in the dimension data of each dimension obtained by the second acquisition unit, and to build a secondary vector from the history subdivision item data included in the history dimension data;
Preferably, the construction unit 13 includes:
A judging subunit 13A and a construction subunit 13B;
The judging subunit 13A is configured to judge whether the number of items of subdivision item data included in the dimension data of each dimension is the same as the number of items of history subdivision item data included in the history dimension data;
The construction subunit 13B is configured to, if the number of items of subdivision item data is the same as the number of items of history subdivision item data, build a primary vector from the subdivision item data included in the dimension data of each dimension, and build a secondary vector from the history subdivision item data included in the history dimension data.
Preferably, the construction unit 13 further includes:
A first lookup subunit 31, configured to, if the number of items of subdivision item data differs from the number of items of history subdivision item data, look up the subdivision item corresponding to each item of subdivision item data according to the subdivision item data included in the dimension data, and obtain the subdivision item set composed of those subdivision items;
A second lookup subunit 32, configured to look up the history subdivision item corresponding to each item of history subdivision item data according to the history subdivision item data included in the history dimension data, and obtain the history subdivision item set composed of those history subdivision items;
A comparing subunit 33, configured to compare the subdivision item set with the history subdivision item set;
A first subdivision item composing unit 34, configured to add the subdivision items that are in the history subdivision item set but not in the subdivision item set to the subdivision item set, where the subdivision item data corresponding to each subdivision item added to the subdivision item set is 0;
A primary vector construction unit 35, configured to build a new primary vector from the subdivision item data corresponding to the subdivision item set after the subdivision items have been added;
A second subdivision item composing unit 36, configured to add the subdivision items that are in the subdivision item set but not in the history subdivision item set to the history subdivision item set, where the subdivision item data corresponding to each subdivision item added to the history subdivision item set is 0;
A secondary vector construction unit 37, configured to build a new secondary vector from the subdivision item data corresponding to the history subdivision item set after the subdivision items have been added.
The computing unit 14 is configured to calculate the similarity between the primary vector and the secondary vector of each dimension to obtain the similarity of each dimension;
Preferably, the computing unit 14 includes:
A first computing subunit 14A and a similarity determining subunit 14B;
The first computing subunit 14A is configured to calculate the cosine of the angle between the primary vector and the secondary vector of each dimension;
The similarity determining subunit 14B is configured to determine, according to the cosine of the angle, the similarity between the primary vector and the secondary vector of each dimension.
The comparing unit 15 is configured to compare the similarities of all dimensions and determine the minimum similarity;
The positioning unit 16 is configured to locate the dimension corresponding to the minimum similarity.
As can be seen from the above technical solution, in this embodiment the first acquisition unit obtains all dimensions corresponding to the data target to be positioned, and the second acquisition unit obtains, according to the dimensions, the dimension data and history dimension data corresponding to each dimension, where the dimension data includes at least one item of subdivision item data and the history dimension data includes at least one item of history subdivision item data. The construction unit builds a primary vector from the subdivision item data included in the dimension data of each dimension and builds a secondary vector from the history subdivision item data included in the history dimension data of that dimension. The computing unit calculates the similarity between the primary vector and the secondary vector of each dimension, and the comparing unit obtains the minimum similarity. The smaller the similarity, the more likely it is that a data anomaly has occurred in that dimension. The positioning unit locates the dimension corresponding to the minimum similarity; this dimension contains the abnormal data, so the dimension most likely to contain abnormal data is located automatically. There is no need to manually check the data under each dimension one by one, which improves the efficiency of locating abnormal data.
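One way to picture how units 11 to 16 cooperate is the small class below. It is a hedged sketch, not the device itself: the class name `AbnormalDataLocator`, the two callables passed to its constructor, the "region" dimension and all numeric values are hypothetical, and dimension data is modeled as a dictionary from subdivision item to data. The construction step always takes the union of items, which yields the same vectors as direct construction when both sides already contain the same items. Returning 0.0 for an all-zero vector is an assumption.

```python
import math


class AbnormalDataLocator:
    """Hypothetical wiring of units 11-16; all names here are illustrative."""

    def __init__(self, get_dimensions, get_dimension_data):
        self.get_dimensions = get_dimensions          # stands in for first acquisition unit 11
        self.get_dimension_data = get_dimension_data  # stands in for second acquisition unit 12

    @staticmethod
    def build_vectors(current, history):
        # Construction unit 13: one shared item order, 0 for items missing on either side.
        order = list(current) + [item for item in history if item not in current]
        return ([current.get(item, 0) for item in order],
                [history.get(item, 0) for item in order])

    @staticmethod
    def similarity(primary, secondary):
        # Computing unit 14: cosine of the angle between the two vectors.
        dot = sum(a * b for a, b in zip(primary, secondary))
        norm = math.sqrt(sum(a * a for a in primary)) * math.sqrt(sum(b * b for b in secondary))
        return dot / norm if norm else 0.0  # 0.0 for a zero vector is an assumption

    def locate(self, target):
        similarities = {}
        for dimension in self.get_dimensions(target):
            current, history = self.get_dimension_data(target, dimension)
            similarities[dimension] = self.similarity(*self.build_vectors(current, history))
        # Comparing unit 15 and positioning unit 16: pick the least similar dimension.
        return min(similarities, key=similarities.get), similarities


# Invented in-memory data: {dimension: (today's item data, yesterday's item data)}.
DATA = {
    "playing platform": ({"A1": 120, "A2": 80, "A3": 45, "B1": 30},
                         {"A1": 110, "A2": 85, "A3": 50}),
    "region": ({"north": 100, "south": 100}, {"north": 100, "south": 100}),
}
locator = AbnormalDataLocator(lambda target: list(DATA),
                              lambda target, dimension: DATA[dimension])
located, similarities = locator.locate("advertising income")
print(located)
```

Passing the two acquisition steps in as callables keeps the sketch independent of any particular data store.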
For brevity, each of the foregoing method embodiments is described as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
It should be noted that the embodiments in this specification are described progressively. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be understood by reference to one another. The method embodiments are described relatively briefly because they are substantially similar to the device embodiments; for related details, see the corresponding description of the device embodiments.
Finally, it should also be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relation or order between those entities or operations. Moreover, the terms "include", "comprise" and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The above is only the preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (8)

1. A localization method for abnormal data, characterized in that the localization method comprises:
Obtaining all dimensions corresponding to a data target to be positioned;
Obtaining, according to all the dimensions, dimension data and history dimension data corresponding to each dimension; wherein the dimension data includes at least one item of subdivision item data, and the history dimension data includes at least one item of history subdivision item data;
Building a primary vector from the subdivision item data included in the dimension data of each dimension, and building a secondary vector from the history subdivision item data included in the history dimension data;
Calculating the similarity between the primary vector and the secondary vector of each dimension to obtain the similarity of each dimension;
Comparing the similarities of all dimensions to determine the minimum similarity;
Locating the dimension corresponding to the minimum similarity.
2. The localization method according to claim 1, characterized in that calculating the similarity between the primary vector and the secondary vector of each dimension to obtain the similarity of each dimension comprises:
Calculating the cosine of the angle between the primary vector and the secondary vector of each dimension;
Determining, according to the cosine of the angle, the similarity between the primary vector and the secondary vector of each dimension.
3. The localization method according to claim 1 or 2, characterized in that building a primary vector from the subdivision item data included in the dimension data of each dimension, and building a secondary vector from the history subdivision item data included in the history dimension data, comprises:
Judging whether the number of items of subdivision item data included in the dimension data of each dimension is the same as the number of items of history subdivision item data included in the history dimension data;
If the number of items of subdivision item data is the same as the number of items of history subdivision item data, building a primary vector from the subdivision item data included in the dimension data of each dimension, and building a secondary vector from the history subdivision item data included in the history dimension data.
4. The localization method according to claim 3, characterized by further comprising:
If the number of items of subdivision item data differs from the number of items of history subdivision item data, looking up the subdivision item corresponding to each item of subdivision item data according to the subdivision item data included in the dimension data, to obtain a subdivision item set composed of the subdivision items;
Looking up the history subdivision item corresponding to each item of history subdivision item data according to the history subdivision item data included in the history dimension data, to obtain a history subdivision item set composed of the history subdivision items;
Comparing the subdivision item set with the history subdivision item set;
Adding the subdivision items that are in the history subdivision item set but not in the subdivision item set to the subdivision item set, wherein the subdivision item data corresponding to each subdivision item added to the subdivision item set is 0;
Building a new primary vector from the subdivision item data corresponding to the subdivision item set after the subdivision items have been added;
Adding the subdivision items that are in the subdivision item set but not in the history subdivision item set to the history subdivision item set, wherein the subdivision item data corresponding to each subdivision item added to the history subdivision item set is 0;
Building a new secondary vector from the subdivision item data corresponding to the history subdivision item set after the subdivision items have been added.
5. A positioning device for abnormal data, characterized in that the positioning device comprises:
A first acquisition unit, configured to obtain all dimensions corresponding to a data target to be positioned;
A second acquisition unit, configured to obtain, according to all the dimensions obtained by the first acquisition unit, dimension data and history dimension data corresponding to each dimension; wherein the dimension data includes at least one item of subdivision item data, and the history dimension data includes at least one item of history subdivision item data;
A construction unit, configured to build a primary vector from the subdivision item data included in the dimension data of each dimension obtained by the second acquisition unit, and to build a secondary vector from the history subdivision item data included in the history dimension data;
A computing unit, configured to calculate the similarity between the primary vector and the secondary vector of each dimension to obtain the similarity of each dimension;
A comparing unit, configured to compare the similarities of all dimensions and determine the minimum similarity;
A positioning unit, configured to locate the dimension corresponding to the minimum similarity.
6. The positioning device according to claim 5, characterized in that the computing unit comprises:
A first computing subunit, configured to calculate the cosine of the angle between the primary vector and the secondary vector of each dimension;
A similarity determining subunit, configured to determine, according to the cosine of the angle, the similarity between the primary vector and the secondary vector of each dimension.
7. The positioning device according to claim 5 or 6, characterized in that the construction unit comprises:
A judging subunit, configured to judge whether the number of items of subdivision item data included in the dimension data of each dimension is the same as the number of items of history subdivision item data included in the history dimension data;
A construction subunit, configured to, if the number of items of subdivision item data is the same as the number of items of history subdivision item data, build a primary vector from the subdivision item data included in the dimension data of each dimension, and build a secondary vector from the history subdivision item data included in the history dimension data.
8. The positioning device according to claim 7, characterized in that the construction unit further comprises:
A first lookup subunit, configured to, if the number of items of subdivision item data differs from the number of items of history subdivision item data, look up the subdivision item corresponding to each item of subdivision item data according to the subdivision item data included in the dimension data, and obtain a subdivision item set composed of the subdivision items;
A second lookup subunit, configured to look up the history subdivision item corresponding to each item of history subdivision item data according to the history subdivision item data included in the history dimension data, and obtain a history subdivision item set composed of the history subdivision items;
A comparing subunit, configured to compare the subdivision item set with the history subdivision item set;
A first subdivision item composing unit, configured to add the subdivision items that are in the history subdivision item set but not in the subdivision item set to the subdivision item set, wherein the subdivision item data corresponding to each subdivision item added to the subdivision item set is 0;
A primary vector construction unit, configured to build a new primary vector from the subdivision item data corresponding to the subdivision item set after the subdivision items have been added;
A second subdivision item composing unit, configured to add the subdivision items that are in the subdivision item set but not in the history subdivision item set to the history subdivision item set, wherein the subdivision item data corresponding to each subdivision item added to the history subdivision item set is 0;
A secondary vector construction unit, configured to build a new secondary vector from the subdivision item data corresponding to the history subdivision item set after the subdivision items have been added.
CN201710792861.1A 2017-09-05 2017-09-05 The localization method and device of a kind of abnormal data Pending CN107688658A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710792861.1A CN107688658A (en) 2017-09-05 2017-09-05 The localization method and device of a kind of abnormal data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710792861.1A CN107688658A (en) 2017-09-05 2017-09-05 The localization method and device of a kind of abnormal data

Publications (1)

Publication Number Publication Date
CN107688658A true CN107688658A (en) 2018-02-13

Family

ID=61156016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710792861.1A Pending CN107688658A (en) 2017-09-05 2017-09-05 The localization method and device of a kind of abnormal data

Country Status (1)

Country Link
CN (1) CN107688658A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102929928A (en) * 2012-09-21 2013-02-13 北京格致璞科技有限公司 Multidimensional-similarity-based personalized news recommendation method
CN103138963A (en) * 2011-11-25 2013-06-05 华为技术有限公司 Method and device for positioning network problems based on user perception
CN103886068A (en) * 2014-03-20 2014-06-25 北京国双科技有限公司 Data processing method and device for Internet user behavior analysis
CN105119734A (en) * 2015-07-15 2015-12-02 中国人民解放军防空兵学院 Full network anomaly detection positioning method based on robust multivariate probability calibration model
CN106030565A (en) * 2014-01-23 2016-10-12 微软技术许可有限责任公司 Computer performance prediction using search technologies

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108959415A (en) * 2018-06-07 2018-12-07 北京奇艺世纪科技有限公司 A kind of exception dimension localization method, device and electronic equipment
CN108959415B (en) * 2018-06-07 2022-03-04 北京奇艺世纪科技有限公司 Abnormal dimension positioning method and device and electronic equipment
CN111158977A (en) * 2019-12-12 2020-05-15 深圳前海微众银行股份有限公司 Abnormal event root cause positioning method and device
WO2021114977A1 (en) * 2019-12-12 2021-06-17 深圳前海微众银行股份有限公司 Method and device for positioning fundamental cause of abnormal event
CN116756136A (en) * 2023-08-16 2023-09-15 深圳市明心数智科技有限公司 Automatic data processing method, device, equipment and medium for fishpond monitoring equipment
CN116756136B (en) * 2023-08-16 2023-10-31 深圳市明心数智科技有限公司 Automatic data processing method, device, equipment and medium for fishpond monitoring equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180213
