CN109522300A - Valid data screening device - Google Patents
- Publication number
- CN109522300A (application CN201811247432.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- fluctuations
- current
- fluctuation
- sequence number
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
A valid data screening device, comprising: an acquiring unit, adapted to obtain a data set to be analyzed; and a screening unit, adapted to traverse the data in the acquired data set to obtain the current data item reached by the traversal; judge whether the current data item has changed excessively compared with the previous data item; when it is determined that the current data item has changed excessively compared with the previous data item, determine and record, based on the data within a period of time before and after the current data item, the data fluctuation position of the corresponding whole segment of fluctuation data; and obtain the next data item, until the data set to be analyzed has been completely traversed. The above scheme can improve the efficiency and accuracy of valid data screening.
Description
Technical field
The invention belongs to the field of data analysis technology, and in particular relates to a valid data screening device.
Background art
Since 2012, the term "big data" has frequently entered public view and has been widely accepted and studied. Behind these ever-growing data lies huge potential value, which determines the future direction of development and the achievements of numerous enterprises and fields. More and more enterprises have become aware of the hidden risks brought by this explosive growth of data and have gradually begun to pay attention to the importance of massive data to the enterprise. Although big data brings a continuous stream of business information and social value, its problem is also obvious: the data volume in the current era is excessively large.
The excessively large data volume in a big data environment means that extracting effective information from it consumes a large amount of resources and time, while daily mean-value data and marginal data account for a large proportion of the data. To reduce the resources and time consumed by these computations, besides designing better data analysis algorithms, one can also start from reducing the scale of the data.
Summary of the invention
The technical problem to be solved by the present invention is how to improve the efficiency and accuracy of valid data screening.
To achieve the above object, an embodiment of the present invention provides a valid data screening device, the device comprising:
an acquiring unit, adapted to obtain a data set to be analyzed;
a screening unit, adapted to traverse the data in the acquired data set to obtain the current data item reached by the traversal; judge whether the current data item has changed excessively compared with the previous data item; when it is determined that the current data item has changed excessively compared with the previous data item, determine and record, based on the data within a period of time before and after the current data item, the data fluctuation position of the corresponding whole segment of fluctuation data; and obtain the next data item, until the data set to be analyzed has been completely traversed.
Optionally, the screening unit is adapted to calculate the absolute difference between the current data item and the previous data item, and to compare the calculated absolute difference with a preset difference threshold, so as to judge whether the current data item has changed excessively compared with the previous data item.
Optionally, the screening unit is adapted to: when it is determined that the current data item has changed excessively compared with the previous data item, increase the count value of a preset n-bit logger by a preset value; judge whether the current count value of the logger is greater than a preset count threshold; when it is determined that the current count value of the logger is greater than the preset count threshold, obtain the information of the last data fluctuation position stored in a preset dynamic array; when it is determined that the dynamic array is empty, or that the data fluctuation position of the last valid data item stored in the dynamic array is a tail node, determine that the ((n-2)-1)-th data item before the current data item is the first node of the data fluctuation; when it is determined that the value of the data item whose sequence number equals the sequence number of the current data item minus (n-2) plus (minimum consecutive number - 1) is close to the daily mean value of the data, determine that this data item is the tail node of the data fluctuation; when the condition that the dynamic array is empty or that the data fluctuation position of the last valid data item stored in the dynamic array is a tail node is not met, and the value of the data item whose sequence number equals the sequence number of the current data item minus (n-2) plus (minimum consecutive number - 1) is not close to the daily mean value of the data, determine that the (n-2)-th data item before the current data item is an interim node of the data fluctuation; and shift the logger left by X bits and clear the sign bit of the logger, where X is an integer greater than or equal to 1 and less than n.
Optionally, the screening unit is adapted to, when it is determined that the ((n-2)-1)-th data item before the current data item is the first node of the data fluctuation, record the data fluctuation position of the corresponding whole segment of fluctuation data using the following array:
Wherein, Ni denotes the sequence number of the data fluctuation first node of the whole segment of fluctuation data corresponding to the current data item, Nc denotes the sequence number of the current data item, and Mi denotes the judgment-value flag of the data fluctuation tail node of the corresponding whole segment of fluctuation data.
Optionally, the screening unit is adapted to, when it is determined that the (n-2)-th data item before the current data item is an interim node of the data fluctuation, record the data fluctuation position of the corresponding whole segment of fluctuation data using the following array:
Wherein, Ni denotes the sequence number of the data fluctuation interim node of the whole segment of fluctuation data corresponding to the current data item, Nc denotes the sequence number of the current data item, and Mi denotes the judgment-value flag of the data fluctuation tail node of the corresponding whole segment of fluctuation data.
Optionally, the screening unit is adapted to, when it is determined that the data item whose sequence number equals the sequence number of the current data item minus (n-2) plus (minimum consecutive number - 1) is the tail node of the data fluctuation, record the data fluctuation position of the corresponding whole segment of fluctuation data using the following array:
Wherein, Ni denotes the sequence number of the data fluctuation tail node of the whole segment of fluctuation data corresponding to the current data item, Nc denotes the sequence number of the current data item, and Mi denotes the judgment-value flag of the data fluctuation tail node of the corresponding whole segment of fluctuation data.
Optionally, X=1.
Optionally, n=32.
Compared with the prior art, the beneficial effects of the invention are as follows:
In the above scheme, the data in the acquired data set are traversed to obtain the current data item; when it is determined that the current data item has changed excessively compared with the previous data item, the data fluctuation position of the corresponding whole segment of fluctuation data is determined and recorded based on the data within a period of time before and after the current data item; and the next data item is obtained, until the data set to be analyzed has been completely traversed. The data fluctuation within a specific time period before and after a data item can thus be determined, and the whole process of judging whether that data item is marginal can be tracked dynamically. While data precision and the precision of the final data analysis result are improved as far as possible, the extra overhead in computing resources and additional storage space introduced by the data screening method itself is reduced as far as possible, so the accuracy and efficiency of valid data screening can be improved.
Description of the drawings
To explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without any creative effort.
Fig. 1 is a schematic flowchart of a valid data screening method in an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another valid data screening method in an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a valid data screening device in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the scope of protection of the present application. Directional indications in the embodiments of the present invention (such as up, down, left, right, front and back) are only used to explain the relative positional relationship, movement and the like of the components in a particular posture (as shown in the drawings); if the particular posture changes, the directional indication changes accordingly.
As stated in the background art, one valid data screening method in the prior art marks data whose absolute difference from the daily mean value is greater than a threshold as valid data. This method has the following problems:
(1) If the data produce a small fluctuation within a very short time for whatever reason, but there is no medium or large fluctuation, and no qualifying small fluctuation worth analyzing, within a certain period before and after it, then this small fluctuation has little research significance and little influence on the final data analysis result, and is regarded as marginal data. At the massive data scale of a big data environment, the amount of marginal data is also very large and consumes a large share of computing resources and time.
(2) If the influence of marginal data is taken into account and screening of marginal data is added while traversing the data, the consumption of computing resources and the cost of additional storage in the computer increase to some extent.
Therefore, valid data screening methods in the prior art suffer from low accuracy and low efficiency.
To solve the above problems, in the technical solution of the embodiments of the present invention, when it is determined that the current data item has changed excessively compared with the previous data item, the data fluctuation position of the corresponding whole segment of fluctuation data is determined and recorded based on the data within a period of time before and after the current data item. The data fluctuation within a specific time period before and after a data item can thus be determined, and the whole process of judging whether that data item is marginal can be tracked dynamically. While data precision and the precision of the final data analysis result are improved as far as possible, the extra overhead in computing resources and additional storage space introduced by the data screening method itself is reduced as far as possible, so the accuracy and efficiency of valid data screening can be improved.
To make the above objects, features and advantages of the present invention more apparent and understandable, specific embodiments of the present invention are described in detail below with reference to the drawings.
The idea of the invention is, while traversing each data item in the data set to be analyzed, to judge the nature of that data item according to how the data change within a period of time before and after it. If it is judged to be valid data, its position within the whole segment of data fluctuation to which it belongs is determined. Depending on that position, a different type of valid data marking operation is performed. Finally, all valid data in the data set are extracted according to their marks and processed in segments.
Fig. 1 shows a schematic flowchart of a valid data screening method in an embodiment of the present invention. Referring to Fig. 1, the valid data screening method in the embodiment of the present invention may specifically include the following steps:
Step S101: obtain a data set to be analyzed.
Step S102: traverse the data in the acquired data set to obtain the current data item reached by the traversal.
In a specific implementation, the order in which the data in the acquired data set are traversed can follow the needs of the actual analysis and is not limited here.
Step S103: judge whether the current data item has changed excessively compared with the previous data item; if so, step S104 may be executed; otherwise, step S106 may be executed.
In a specific implementation, to judge whether the current data item has changed excessively compared with the previous data item, the absolute difference between the current data item and the previous data item can first be calculated; the calculated absolute difference is then compared with a preset difference threshold, and the comparison result determines whether the current data item has changed excessively compared with the previous data item.
Step S104: based on the data within a period of time before and after the current data item, determine and record the data fluctuation position of the corresponding whole segment of fluctuation data.
In a specific implementation, when it is determined that the current data item has changed excessively compared with the previous data item, the data fluctuation position of the corresponding whole segment of fluctuation data can be determined and recorded based on the data within a period of time before and after the current data item; for details, refer to the description of the corresponding part of Fig. 2.
Step S106: judge whether the data set to be analyzed has been completely traversed; if not, step S107 may be executed; otherwise, the operation may end.
Step S107: obtain the next data item.
In a specific implementation, when it is determined that the data set to be analyzed has not been completely traversed, the data item following the current data item can be obtained as the new current data item, and execution continues from step S103 until all data in the data set to be analyzed have been traversed.
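The traversal of steps S101 to S107 can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation: the names `changed_too_much`, `traverse` and `on_change` are hypothetical, and the strict `>` comparison is an assumption, since the text only says that the absolute difference is compared with a preset difference threshold.

```python
from typing import Callable, Sequence


def changed_too_much(current: float, previous: float, diff_threshold: float) -> bool:
    # Step S103: compare |current - previous| with the preset difference threshold.
    # The strict ">" is an assumption; the text does not say whether equality counts.
    return abs(current - previous) > diff_threshold


def traverse(data: Sequence[float], diff_threshold: float,
             on_change: Callable[[int], None]) -> None:
    # Steps S102, S106 and S107: visit every item in order, starting from the
    # second one because the first item has no previous item to compare with.
    for index in range(1, len(data)):
        if changed_too_much(data[index], data[index - 1], diff_threshold):
            # Step S104 would locate the fluctuation position here; this sketch
            # only reports the index of the item that changed too much.
            on_change(index)
```

For example, with `hits = []`, calling `traverse([1, 1, 9, 9, 1], 5, hits.append)` leaves `hits` equal to `[2, 4]`, the two positions where the value jumps by more than 5.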
The valid data screening method in the embodiments of the present invention is described in further detail below with reference to Fig. 2.
Fig. 2 shows a schematic flowchart of another valid data screening method in an embodiment of the present invention. Referring to Fig. 2, the valid data screening method in the embodiment of the present invention may specifically include the following steps:
Step S201: obtain a data set to be analyzed.
Step S202: traverse the data in the acquired data set to obtain the current data item reached by the traversal.
Step S203: judge whether the current data item has changed excessively compared with the previous data item; if so, step S204 may be executed; otherwise, step S205 may be executed.
Step S204: increase the count value of the preset n-bit logger by 1.
In a specific implementation, the value of n can be set according to actual needs.
In an embodiment of the present invention, binary digits are chosen as the logger. Binary digits are chosen for the following reasons:
(1) In a general algorithm, one number can record only one piece of effective information. If the binary digits of a number are used as the logger, however, a very large capacity is obtained at very low cost. Taking an integer as an example, an integer has 32 bits; excluding the leading flag bit, the change status of 31 data items can be recorded in such a small storage space, which is highly cost-effective.
(2) Considering computer hardware operations, shifting left an n-bit binary number that records (n-1) data items amounts to eliminating the influence of the data item (n-2) positions before the current data item, while the change information of the remaining (n-2) data items is given different priorities again according to time and the user's evaluation.
(3) Binary digits are the storage form of computer hardware and also have a decimal meaning. By comparing the binary-digit logger with a special evaluation value obtained from the user's evaluation, the judgment result reveals whether the data item (n-2) positions away from the current data item is marginal data with no definite data fluctuation within a period of time before and after it.
Of course, those skilled in the art can also use a logger that is not based on binary digits, which is not limited here.
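The bit-level bookkeeping described in reasons (1) and (2) can be sketched as follows. This is only an illustration under assumptions: the names `N_BITS`, `SIGN_BIT`, `VALUE_MASK`, `mark_change` and `advance` are hypothetical, and because Python integers are unbounded, the fixed n-bit width is emulated with a mask.

```python
N_BITS = 32                      # logger width n (assumed; the text also mentions n = 32)
SIGN_BIT = 1 << (N_BITS - 1)     # highest bit, treated as the sign (flag) bit
VALUE_MASK = SIGN_BIT - 1        # the remaining n - 1 bits that hold the change history


def mark_change(count: int) -> int:
    # Add 1 to the count value, which sets the lowest bit for the current item.
    # This assumes the lowest bit is 0, which holds right after advance().
    return count + 1


def advance(count: int, shift: int = 1) -> int:
    # Shift the history left and clear the sign bit (and anything above it,
    # since Python integers are unbounded), so the oldest entries drop out.
    return (count << shift) & VALUE_MASK
```

With a shift of 1, each call to `advance` ages every recorded change by one position, and a change recorded 31 items ago falls off the top of the 32-bit logger.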
Step S205: judge whether the current count value of the logger is greater than a preset count threshold; if so, step S206 may be executed; otherwise, step S208 may be executed.
In a specific implementation, the preset count threshold can be set according to actual needs. For example, when the number of bits of the logger is n=32, the count threshold is set to 28672, i.e. 7000 in hexadecimal.
Step S206: obtain the information of the data fluctuation positions stored in the preset dynamic array, and judge whether the dynamic array is empty or whether the data fluctuation position of the last valid data item stored in the dynamic array is a tail node; if so, step S207 may be executed; otherwise, step S208 may be executed.
In a specific implementation, the dynamic array is used to record, for the data in the data set to be analyzed whose analysis has been completed, the position sequence numbers of the valid data and the judgment-value flag indicating whether each is the tail node of a whole segment of fluctuation data.
Step S207: determine that the ((n-2)-1)-th data item before the current data item is the first node of the data fluctuation.
In a specific implementation, when it is determined that the dynamic array is empty, or that the data fluctuation position of the last valid data item stored in the dynamic array is a tail node, the ((n-2)-1)-th data item before the current data item can be determined to be the first node of the data fluctuation. At this point, the information of the first node can be recorded using the following valid data model and stored in the dynamic array:
Wherein, Ni denotes the sequence number of the data fluctuation first node of the whole segment of fluctuation data corresponding to the current data item, Nc denotes the sequence number of the current data item, and Mi denotes the judgment-value flag of the data fluctuation tail node of the corresponding whole segment of fluctuation data.
Step S208: judge whether the value of the data item whose sequence number equals the sequence number of the current data item minus (n-2) plus (minimum consecutive number - 1) is close to the daily mean value of the data; if so, step S209 may be executed; otherwise, step S210 may be executed.
In a specific implementation, the minimum consecutive number is the number of consecutive "1" bits in the count threshold of step S205, counted from the highest bit excluding the sign bit towards the lower bits; the daily mean value of the data is the normal value of the system's data in everyday operation without fluctuation.
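The definition of the minimum consecutive number can be sketched as a small bit-counting routine. The function name `min_consecutive_number` and the explicit `n_bits` parameter are assumptions made for illustration. The width is left as a parameter on purpose: reading the example threshold 0x7000 in a 16-bit layout gives three leading "1" bits below the sign bit, and the text does not spell out how that example is laid out when n=32.

```python
def min_consecutive_number(count_threshold: int, n_bits: int) -> int:
    # Count consecutive 1 bits starting at the highest bit below the sign bit
    # (bit n_bits - 2) and moving toward the lower bits.
    count = 0
    bit = n_bits - 2
    while bit >= 0 and (count_threshold >> bit) & 1:
        count += 1
        bit -= 1
    return count
```

Under the 16-bit reading, `min_consecutive_number(0x7000, 16)` evaluates to 3.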
Step S209: determine that the data item whose sequence number equals the sequence number of the current data item minus (n-2) plus (minimum consecutive number - 1) is the tail node of the data fluctuation.
In a specific implementation, when the value of the data item whose sequence number equals the sequence number of the current data item minus (n-2) plus (minimum consecutive number - 1) is close to the daily mean value of the data, that data item can be determined to be the tail node of the data fluctuation. At this point, the information of the tail node can be recorded using the following valid data model and stored in the dynamic array:
Wherein, Ni denotes the sequence number of the data fluctuation tail node of the whole segment of fluctuation data corresponding to the current data item, Nc denotes the sequence number of the current data item, Mi denotes the judgment-value flag of the data fluctuation tail node of the corresponding whole segment of fluctuation data, and minCon denotes the minimum consecutive number.
Step S210: determine that the (n-2)-th data item before the current data item is an interim node of the data fluctuation and record it.
In a specific implementation, when the condition that the dynamic array is empty or that the data fluctuation position of the last valid data item stored in the dynamic array is a tail node is not met, and the value of the data item whose sequence number equals the sequence number of the current data item minus (n-2) plus (minimum consecutive number - 1) is not close to the daily mean value of the data, the (n-2)-th data item before the current data item is determined to be an interim node of the data fluctuation. At this point, the information of the interim node can be recorded using the following valid data model and stored in the dynamic array:
Wherein, Ni denotes the sequence number of the data fluctuation interim node of the whole segment of fluctuation data corresponding to the current data item, Nc denotes the sequence number of the current data item, and Mi denotes the judgment-value flag of the data fluctuation tail node of the corresponding whole segment of fluctuation data.
It should be noted here that, once data are judged to be valid data, it is necessary to segment these valid data when recording them, to facilitate later data analysis. If the valid data are segmented, there is no need to record the sequence numbers of all valid data during screening; only three kinds of data fluctuation nodes need to be recorded, namely the first node, the interim node and the tail node, where:
--- First node (First Node): the first data item of a whole segment of data fluctuation, indicating that the data fluctuation starts.
--- Interim node (Mid-term Node): a data item in the whole segment of data fluctuation that meets a specific condition, indicating data whose amplitude changes sharply within the fluctuation.
--- Tail node (Last Node): the last data item of a whole segment of data fluctuation, indicating that the data fluctuation ends.
Besides segmenting the valid data, there is another important reason for defining these three nodes. One characteristic of the valid data screening method in the embodiments of the present invention is its ability to "predict" whether the current data item is valid data according to how the data fluctuate over a following period of time. In actual implementation, this is achieved by first "skipping over" the current data item and, after a further (n-1) data items have been traversed, going back and judging the data item (n-2) positions earlier in light of those data. The problem with this approach is that the determination of data positions has a certain "delay".
According to research and experimental testing, this "delay" can be offset. In the valid data model of the embodiments of the present invention, if the current valid data item is judged to be a first node, then, starting from the sequence number of the (n-2)-th preceding data item (obtained by subtracting (n-2) from the sequence number of the current data item), 1 is further subtracted to obtain the data item just before the data start to fluctuate; this data item is the true first node, i.e. the first data item of the whole segment of data fluctuation. If the current valid data item is judged to be a tail node, then, starting from the sequence number of the (n-2)-th preceding data item, the previously defined minimum consecutive number is added so as to push the sequence number of the valid data back by one stretch, offsetting the "delay" caused by the judgment data expiring early when no further fluctuation follows the end of a segment of valid data fluctuation; 1 is then subtracted, giving the data item (minimum consecutive number - 1) positions later. This data item is the true tail node after the "delay" has been offset, i.e. the last data item of the whole segment of data fluctuation, and it is marked as a tail node, laying the groundwork for later segmentation. If the current data item is judged to be neither a first node nor a tail node but is still valid data, then it is a data item with a large amplitude of change within the whole segment of data fluctuation; there is no "delay" in this case, so only the sequence number of the (n-2)-th preceding data item, obtained by subtracting (n-2) from the sequence number of the current data item, is needed.
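The three offset rules above reduce to simple arithmetic on the sequence number of the current data item. The following Python sketch only restates that arithmetic; the function name `node_sequence_number` and the string labels `"first"`, `"interim"` and `"tail"` are hypothetical and do not appear in the original text.

```python
def node_sequence_number(kind: str, current_seq: int, n_bits: int,
                         min_consecutive: int) -> int:
    # Sequence number of the data item (n - 2) positions before the current one.
    base = current_seq - (n_bits - 2)
    if kind == "first":
        # First node: one further item back, just before the fluctuation starts.
        return base - 1
    if kind == "interim":
        # Interim node: no "delay" to offset.
        return base
    if kind == "tail":
        # Tail node: pushed later by (minimum consecutive number - 1).
        return base + (min_consecutive - 1)
    raise ValueError(f"unknown node kind: {kind!r}")
```

For instance, with n=32, a current sequence number of 100 and a minimum consecutive number of 3, the first node is item 69, an interim node is item 70 and the tail node is item 72.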
Step S211: shift the count value of the logger left by X bits and clear the sign bit.
In a specific implementation, the logger can be shifted left by a chosen X bits and its sign bit cleared according to actual needs, for example according to how the data very close to the current data item are to be weighted. Here X = 2^M, where M is an integer. In one embodiment of the present invention, X is 1.
Step S212: judge whether the data set to be analyzed has been completely traversed; if not, step S213 may be executed; otherwise, the operation may end.
In a specific implementation, when it is determined that the data set to be analyzed has been completely traversed, the operation can end directly, yielding the information of the valid data in the data set to be analyzed.
Step S213: obtain the next data item.
In a specific implementation, when it is determined that the data set to be analyzed has not been completely traversed, the next data item can be obtained in order as the new current data item, and execution continues from step S203 until the data set to be analyzed has been completely traversed.
The above scheme in the embodiments of the present invention has the following advantages:
(1) The data screening result is accurate and reliable. To identify the associative meaning of massive data, the invention proposes a complete judgment system based on a binary-digit logger. To find the marginal data hidden in massive data and improve the accuracy and precision of the analysis results obtained from the data set, binary digits are introduced as the logger, which tallies the data fluctuation within a specific time period before and after the data item to be judged and dynamically tracks the whole process of judging whether that data item is marginal.
(2) High precision is combined with low consumption. The binary-digit logger used in the present invention can record a wide range of data information while occupying very little hardware storage space; therefore, marginal data that have minimal influence on data precision are rejected while the consumption of computing resources and additional storage space is also minimal.
(3) It can be embedded in other data analysis algorithms. All computation of this algorithm lies within the per-item processing of a single traversal of the data, which means the algorithm can be applied directly inside other data screening or data analysis algorithms and combined with any other algorithm so that their strengths and weaknesses complement each other; that is, it is positioned as a complementary algorithm.
An embodiment of the present invention also provides a computer-readable storage medium on which computer instructions are stored, the steps of the valid data screening method being executed when the computer instructions are run. For the valid data screening method, refer to the description in the preceding sections; details are not repeated here.
An embodiment of the present invention also provides a terminal, including a memory and a processor, the memory storing computer instructions that can be run on the processor, the processor executing the steps of the valid data screening method when running the computer instructions.
The valid data screening method in the embodiments of the present invention has been described in detail above; the device corresponding to the method is described below.
Fig. 3 shows the structural schematic diagram of one of embodiment of the present invention valid data screening plant.Referring to Fig. 3, one
Kind valid data screening plant 30, may include acquiring unit 301 and screening unit 302, in which:
Acquiring unit 301, suitable for obtaining data set to be analyzed.
Screening unit 302, suitable for being traversed to the data in acquired data set, obtain traversing extremely when preceding article number
According to;It is excessive to judge whether current data changes compared with previous data;When the current data of determination is compared to previous item number
When according to changing excessive, then the data in front and back a period of time based on current data, determine and record corresponding whole section of fluctuation
The data fluctuations position of data;Next data is obtained, until the data set to be analyzed all complete by traversal.
In an embodiment of the present invention, the screening unit 302 is suitable for calculating the current data and previous data
Between absolute difference, and by the way that the absolute difference being calculated to be compared with preset difference threshold, currently with judgement
It is excessive whether data changes compared with previous data.
Optionally, the screening unit 302 is adapted to: when it is determined that the current data item changes excessively compared with the previous data item, increase the count value of a preset n-bit logger by a preset value; judge whether the current count value of the logger is greater than a preset count threshold; when it is determined that the current count value of the logger is greater than the preset count threshold, obtain the information of the last data fluctuation position stored in a preset dynamic array; when it is determined that the dynamic array is empty, or that the data fluctuation position of the last valid data item stored in the dynamic array is a tail node, determine that the ((n-2)-1)-th data item before the current data item is the first node of a data fluctuation; when it is determined that the value of the data item whose sequence number equals the sequence number of the current data item minus (n-2) plus (minimum consecutive number - 1) is close to the everyday mean value of the data, determine that this data item is the tail node of the data fluctuation; when it is determined that the dynamic array is not empty or that the data fluctuation position of the last valid data item stored in the dynamic array is not a tail node, and that the value of the data item whose sequence number equals the sequence number of the current data item minus (n-2) plus (minimum consecutive number - 1) is not close to the everyday mean value of the data, determine that the (n-2)-th data item before the current data item is an interim node of the data fluctuation; and shift the logger left by X bits and reset the sign bit of the logger, where X is an integer greater than or equal to 1 and less than n.
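The register operations and the sequence-number arithmetic described above may, for illustration, be sketched as follows. This is a sketch under stated assumptions rather than the implementation of the embodiment: the n-bit logger is modelled as an unsigned integer whose highest bit is treated as the sign bit, the names `BitRegister` and `node_sequence_numbers` are hypothetical, and "the k-th data item before the current data item" is read as the sequence number of the current data item minus k.

```python
class BitRegister:
    """Hypothetical model of the n-bit logger (top bit = sign bit)."""

    def __init__(self, n: int = 32) -> None:
        if n < 2:
            raise ValueError("n must be at least 2")
        self.n = n
        self.value = 0

    def _mask(self) -> int:
        # Keeps only the lowest n bits.
        return (1 << self.n) - 1

    def increase(self, amount: int = 1) -> None:
        # Increase the count value by a preset amount, wrapping at n bits.
        self.value = (self.value + amount) & self._mask()

    def shift_left(self, x: int = 1) -> None:
        # Shift left by X bits (1 <= X < n), then reset the sign bit.
        if not 1 <= x < self.n:
            raise ValueError("x must satisfy 1 <= x < n")
        shifted = (self.value << x) & self._mask()
        sign_bit = 1 << (self.n - 1)
        self.value = shifted & ~sign_bit


def node_sequence_numbers(current: int, n: int, min_consecutive: int) -> dict:
    """Sequence numbers of the candidate first, interim and tail nodes,
    computed from the sequence number of the current data item."""
    return {
        "first": current - ((n - 2) - 1),
        "interim": current - (n - 2),
        "tail": current - (n - 2) + (min_consecutive - 1),
    }
```

With n = 32, a current sequence number of 100 and a minimum consecutive number of 3, the candidate first, interim and tail nodes under this reading are the data items with sequence numbers 71, 70 and 72 respectively.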
In an embodiment of the present invention, the screening unit 302 is adapted to, when it is determined that the ((n-2)-1)-th data item before the current data item is the first node of a data fluctuation, record the data fluctuation position of the corresponding whole segment of fluctuation data using the following array:
where N_i denotes the sequence number of the first node of the data fluctuation of the whole segment of fluctuation data corresponding to the current data item, N_c denotes the sequence number of the current data item, and M_i denotes the judgment-value flag of the tail node of the data fluctuation of the corresponding whole segment of fluctuation data.
In an embodiment of the present invention, the screening unit 302 is adapted to, when it is determined that the (n-2)-th data item before the current data item is an interim node of a data fluctuation, record the data fluctuation position of the corresponding whole segment of fluctuation data using the following array:
where N_i denotes the sequence number of the interim node of the data fluctuation of the whole segment of fluctuation data corresponding to the current data item, N_c denotes the sequence number of the current data item, and M_i denotes the judgment-value flag of the tail node of the data fluctuation of the corresponding whole segment of fluctuation data.
In an embodiment of the present invention, the screening unit 302 is adapted to, when it is determined that the data item whose sequence number equals the sequence number of the current data item minus (n-2) plus (minimum consecutive number - 1) is the tail node of a data fluctuation, record the data fluctuation position of the corresponding whole segment of fluctuation data using the following array:
where N_i denotes the sequence number of the tail node of the data fluctuation of the whole segment of fluctuation data corresponding to the current data item, N_c denotes the sequence number of the current data item, and M_i denotes the judgment-value flag of the tail node of the data fluctuation of the corresponding whole segment of fluctuation data.
In the above scheme of the embodiments of the present invention, the data in the acquired data set is traversed to obtain the current data item reached by the traversal; when it is determined that the current data item changes excessively compared with the previous data item, the current data item is determined to be valid data; when the current data item is determined to be valid data, the data fluctuation position of the corresponding whole segment of fluctuation data is determined and recorded; and the next data item is obtained, until the data set to be analyzed has been completely traversed. In this way, the data fluctuation within a specific period of time before and after a data item can be determined, and the decision boundaries of the data can be tracked dynamically throughout the process. While improving the data precision and the precision of the final data analysis result as far as possible, the scheme minimizes the overhead in computing resources and additional storage space introduced by the data screening method itself, and can therefore improve the accuracy and efficiency of valid data screening.
The basic principles, main features and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited to the above embodiments; the above embodiments and the description merely illustrate the principle of the invention. Various changes and improvements may be made to the invention without departing from its spirit and scope. The scope of protection claimed by the present invention is defined by the appended claims and their equivalents.
Claims (8)
1. A valid data screening device, characterized by comprising:
an acquiring unit, adapted to obtain a data set to be analyzed;
a screening unit, adapted to: traverse the data in the acquired data set and obtain the current data item reached by the traversal; judge whether the current data item changes excessively compared with the previous data item; when it is determined that the current data item changes excessively compared with the previous data item, determine and record, based on the data within a period of time before and after the current data item, the data fluctuation position of the corresponding whole segment of fluctuation data; and obtain the next data item, until the data set to be analyzed has been completely traversed.
2. The valid data screening device according to claim 1, characterized in that the screening unit is adapted to calculate the absolute difference between the current data item and the previous data item, and to compare the calculated absolute difference with a preset difference threshold, so as to judge whether the current data item changes excessively compared with the previous data item.
3. The valid data screening device according to claim 1, characterized in that the screening unit is adapted to: when it is determined that the current data item changes excessively compared with the previous data item, increase the count value of a preset n-bit logger by a preset value; judge whether the current count value of the logger is greater than a preset count threshold; when it is determined that the current count value of the logger is greater than the preset count threshold, obtain the information of the last data fluctuation position stored in a preset dynamic array; when it is determined that the dynamic array is empty, or that the data fluctuation position of the last valid data item stored in the dynamic array is a tail node, determine that the ((n-2)-1)-th data item before the current data item is the first node of a data fluctuation; when it is determined that the value of the data item whose sequence number equals the sequence number of the current data item minus (n-2) plus (minimum consecutive number - 1) is close to the everyday mean value of the data, determine that this data item is the tail node of the data fluctuation; when it is determined that the dynamic array is not empty or that the data fluctuation position of the last valid data item stored in the dynamic array is not a tail node, and that the value of the data item whose sequence number equals the sequence number of the current data item minus (n-2) plus (minimum consecutive number - 1) is not close to the everyday mean value of the data, determine that the (n-2)-th data item before the current data item is an interim node of the data fluctuation; and shift the logger left by X bits and reset the sign bit of the logger, where X is an integer greater than or equal to 1 and less than n.
4. The valid data screening device according to claim 3, characterized in that the screening unit is adapted to, when it is determined that the ((n-2)-1)-th data item before the current data item is the first node of a data fluctuation, record the data fluctuation position of the corresponding whole segment of fluctuation data using the following array:
where N_i denotes the sequence number of the first node of the data fluctuation of the whole segment of fluctuation data corresponding to the current data item, N_c denotes the sequence number of the current data item, and M_i denotes the judgment-value flag of the tail node of the data fluctuation of the corresponding whole segment of fluctuation data.
5. The valid data screening device according to claim 3, characterized in that the screening unit is adapted to, when it is determined that the (n-2)-th data item before the current data item is an interim node of a data fluctuation, record the data fluctuation position of the corresponding whole segment of fluctuation data using the following array:
where N_i denotes the sequence number of the interim node of the data fluctuation of the whole segment of fluctuation data corresponding to the current data item, N_c denotes the sequence number of the current data item, and M_i denotes the judgment-value flag of the tail node of the data fluctuation of the corresponding whole segment of fluctuation data.
6. The valid data screening device according to claim 3, characterized in that the screening unit is adapted to, when it is determined that the data item whose sequence number equals the sequence number of the current data item minus (n-2) plus (minimum consecutive number - 1) is the tail node of a data fluctuation, record the data fluctuation position of the corresponding whole segment of fluctuation data using the following array:
where N_i denotes the sequence number of the tail node of the data fluctuation of the whole segment of fluctuation data corresponding to the current data item, N_c denotes the sequence number of the current data item, and M_i denotes the judgment-value flag of the tail node of the data fluctuation of the corresponding whole segment of fluctuation data.
7. The valid data screening device according to claim 3, characterized in that X=1.
8. The valid data screening device according to claim 3, characterized in that n=32.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811247432.7A CN109522300B (en) | 2018-10-24 | 2018-10-24 | Effective data screening device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109522300A true CN109522300A (en) | 2019-03-26 |
CN109522300B CN109522300B (en) | 2021-09-28 |
Family
ID=65773748
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811247432.7A Active CN109522300B (en) | 2018-10-24 | 2018-10-24 | Effective data screening device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109522300B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104050268A (en) * | 2014-06-23 | 2014-09-17 | 西北工业大学 | Continuous data protection and recovery method with log space adjustable online |
CN105354208A (en) * | 2015-09-21 | 2016-02-24 | 江苏讯狐信息科技有限公司 | Big data information mining method |
CN106162698A (en) * | 2015-04-15 | 2016-11-23 | 中国电信股份有限公司 | Network Abnormal problem analysis method and device |
US20180075006A1 (en) * | 2016-09-12 | 2018-03-15 | DataRails LTD. | System and method for logical identification of differences between spreadsheets |
Non-Patent Citations (3)
Title |
---|
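Erzhuo Che, Michael J. Olsen: "Fast ground filtering for TLS data via Scanline Density Analysis", ISPRS Journal of Photogrammetry and Remote Sensing * |
余浩: "Research on data storage and data screening problems based on big data", China Master's Theses Full-text Database, Information Science and Technology * |
毕朝国, 徐小龙: "A data deduplication mechanism in cloud storage systems", Application Research of Computers * |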
Also Published As
Publication number | Publication date |
---|---|
CN109522300B (en) | 2021-09-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||