CN109522300A - Valid data screening device

Info

Publication number
CN109522300A
CN109522300A CN201811247432.7A CN201811247432A CN109522300A CN 109522300 A CN109522300 A CN 109522300A CN 201811247432 A CN201811247432 A CN 201811247432A CN 109522300 A CN109522300 A CN 109522300A
Authority
CN
China
Prior art keywords
data
fluctuations
current
fluctuation
sequence number
Legal status
Granted
Application number
CN201811247432.7A
Other languages
Chinese (zh)
Other versions
CN109522300B (en)
Inventor
徐小龙
林皓伟
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN201811247432.7A
Publication of CN109522300A
Application granted
Publication of CN109522300B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A valid data screening device, comprising: an acquiring unit adapted to obtain a data set to be analyzed; and a screening unit adapted to traverse the data in the acquired data set to obtain the current data item; judge whether the current data item has changed excessively compared with the previous data item; when the change is determined to be excessive, determine and record, based on the data in a period of time before and after the current data item, the data fluctuation position within the corresponding whole fluctuation segment; and obtain the next data item, until the whole data set to be analyzed has been traversed. This scheme can improve the efficiency and accuracy of valid data screening.

Description

Valid data screening device
Technical field
The invention belongs to the field of data analysis technology and relates in particular to a valid data screening device.
Background
Since 2012, the term "big data" has entered public view with increasing frequency and has been widely accepted and studied. These ever-growing data sets conceal enormous potential value and shape the future direction and results of many enterprises and fields. More and more enterprises have become aware of the hidden risks that this explosive growth of data brings and have begun to pay attention to the importance of massive data to the enterprise. Although big data brings a continuous stream of business information and social value, its problem is also obvious: the volume of data in the current era is too large.
Because the data volume in a big data environment is so large, extracting useful information from it consumes a large amount of resources and time, and data at the daily mean value and marginal data both account for a large proportion of it. To reduce the resources and time these computations consume, one can, besides designing better data analysis algorithms, also start from reducing the scale of the data itself.
Summary of the invention
The technical problem to be solved by the invention is how to improve the efficiency and accuracy of valid data screening.
To achieve the above object, an embodiment of the invention provides a valid data screening device, the device comprising:
an acquiring unit, adapted to obtain a data set to be analyzed;
a screening unit, adapted to traverse the data in the acquired data set to obtain the current data item; judge whether the current data item has changed excessively compared with the previous data item; when the change is determined to be excessive, determine and record, based on the data in a period of time before and after the current data item, the data fluctuation position within the corresponding whole fluctuation segment; and obtain the next data item, until the whole data set to be analyzed has been traversed.
Optionally, the screening unit is adapted to calculate the absolute difference between the current data item and the previous data item, and to compare the calculated absolute difference with a preset difference threshold in order to judge whether the current data item has changed excessively compared with the previous data item.
Optionally, the screening unit is adapted to: when the current data item is determined to have changed excessively compared with the previous data item, increase the count value of a preset n-bit recorder by a preset value; judge whether the current count value of the recorder is greater than a preset count threshold; when the current count value of the recorder is determined to be greater than the preset count threshold, obtain the information of the last data fluctuation position stored in a preset dynamic array; when the dynamic array is determined to be empty, or the data fluctuation position of the last valid data item stored in the dynamic array is a tail node, determine that the data item one position before the (n-2)-th data item preceding the current data item is the first node of a data fluctuation; when the value of the data item whose sequence number equals the sequence number of the current data item minus (n-2) plus (minimum consecutive number - 1) is determined to be close to the daily mean value of the data, determine that this data item is the tail node of the data fluctuation; when the dynamic array is determined to be neither empty nor to have a tail node as the data fluctuation position of its last stored valid data item, and the value of the data item whose sequence number equals the sequence number of the current data item minus (n-2) plus (minimum consecutive number - 1) is not close to the daily mean value of the data, determine that the (n-2)-th data item preceding the current data item is an interim node of the data fluctuation; and shift the recorder left by X bits and clear the sign bit of the recorder, X being an integer greater than or equal to 1 and less than n.
Optionally, the screening unit is adapted, when the data item one position before the (n-2)-th data item preceding the current data item is determined to be the first node of a data fluctuation, to record the data fluctuation position of the corresponding whole fluctuation segment in an array holding $N_i$ and $M_i$, with $N_i = N_c - (n-2) - 1$,
where $N_i$ is the sequence number of the first node of the whole fluctuation segment corresponding to the current data item, $N_c$ is the sequence number of the current data item, and $M_i$ is the flag indicating whether the recorded node is the tail node of that fluctuation segment.
Optionally, the screening unit is adapted, when the (n-2)-th data item preceding the current data item is determined to be an interim node of a data fluctuation, to record the data fluctuation position of the corresponding whole fluctuation segment in an array holding $N_i$ and $M_i$, with $N_i = N_c - (n-2)$,
where $N_i$ is the sequence number of the interim node of the whole fluctuation segment corresponding to the current data item, $N_c$ is the sequence number of the current data item, and $M_i$ is the flag indicating whether the recorded node is the tail node of that fluctuation segment.
Optionally, the screening unit is adapted, when the data item whose sequence number equals the sequence number of the current data item minus (n-2) plus (minimum consecutive number - 1) is determined to be the tail node of a data fluctuation, to record the data fluctuation position of the corresponding whole fluctuation segment in an array holding $N_i$ and $M_i$, with $N_i = N_c - (n-2) + (\mathit{minCon} - 1)$,
where $N_i$ is the sequence number of the tail node of the whole fluctuation segment corresponding to the current data item, $N_c$ is the sequence number of the current data item, $M_i$ is the flag indicating whether the recorded node is the tail node of that fluctuation segment, and $\mathit{minCon}$ is the minimum consecutive number.
Optionally, X=1.
Optionally, n=32.
Compared with the prior art, the invention has the following beneficial effects:
In the above scheme, the data in the acquired data set is traversed to obtain the current data item; when the current data item is determined to have changed excessively compared with the previous data item, the data fluctuation position within the corresponding whole fluctuation segment is determined and recorded based on the data in a period of time before and after the current data item; and the next data item is obtained, until the whole data set to be analyzed has been traversed. The data fluctuation within a specific period before and after each data item can thus be judged, and the whole process of deciding whether the data item is marginal can be tracked dynamically. While data precision and the precision of the final analysis result are improved as far as possible, the overhead that the screening method itself adds in computing resources and extra storage space is kept as low as possible, so the accuracy and efficiency of valid data screening can be improved.
Brief description of the drawings
To explain the technical solutions in the embodiments of the application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a valid data screening method in an embodiment of the invention;
Fig. 2 is a flowchart of another valid data screening method in an embodiment of the invention;
Fig. 3 is a structural diagram of a valid data screening device in an embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the application. Directional indications in the embodiments of the invention (such as up, down, left, right, front and back) serve only to explain the relative positions and movements of components in a particular pose (as shown in the drawings); if that pose changes, the directional indication changes accordingly.
As stated in the background, one valid data screening method in the prior art marks as valid data any data whose absolute difference from the daily mean data exceeds a threshold. This method has the following problems:
(1) If data shows a small fluctuation over a very short time for whatever reason, but no qualifying medium or large fluctuation, or small fluctuation worth analyzing, occurs within a certain time before and after it, then the small fluctuation has little research significance and little influence on the final analysis result, and it is regarded as marginal data. At the massive data scale of a big data environment, the amount of marginal data is also very large and consumes a large share of computing resources and time.
(2) If, in view of the influence of marginal data, screening of marginal data is added while traversing the data, the consumption of computing resources and the cost of extra storage in the computer increase to some extent.
Therefore, valid data screening methods in the prior art suffer from low accuracy and low efficiency.
To solve these problems, the technical solution in the embodiments of the invention, when the current data item is determined to have changed excessively compared with the previous data item, determines and records the data fluctuation position within the corresponding whole fluctuation segment based on the data in a period of time before and after the current data item. The data fluctuation within a specific period before and after each data item can thus be judged, and the whole process of deciding whether the data item is marginal can be tracked dynamically. While data precision and the precision of the final analysis result are improved as far as possible, the overhead that the screening method itself adds in computing resources and extra storage space is kept as low as possible, so the accuracy and efficiency of valid data screening can be improved.
To make the above objects, features and advantages of the invention clearer, specific embodiments of the invention are described in detail below with reference to the drawings.
The idea of the invention is, while traversing each data item in the data set to be analyzed, to judge the nature of the data item from how the data changes in a period of time before and after it. If the data item is judged to be valid data, its position within the whole data fluctuation segment is then determined. Depending on that position, a different kind of valid data marking is performed. Finally, all valid data in the data set is extracted according to these marks and segmented.
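The final extraction and segmentation step can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the label strings `FIRST`, `MID` and `TAIL` and the function `split_into_segments` are hypothetical names, and the sketch assumes the node records are already sorted by sequence number.

```python
from typing import List, Optional, Tuple

# Hypothetical labels for the three kinds of marked valid data.
FIRST, MID, TAIL = "first", "mid", "tail"

def split_into_segments(nodes: List[Tuple[int, str]]) -> List[Tuple[int, int]]:
    """Turn labeled node records into (start, end) sequence-number ranges."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for seq, kind in nodes:
        if kind == FIRST:
            start = seq                      # a new fluctuation segment opens
        elif kind == TAIL and start is not None:
            segments.append((start, seq))    # the open segment closes here
            start = None
        # MID nodes lie inside an open segment and do not move its bounds
    return segments

nodes = [(10, FIRST), (14, MID), (19, TAIL), (40, FIRST), (47, TAIL)]
print(split_into_segments(nodes))  # [(10, 19), (40, 47)]
```

Only the segment boundaries are needed to slice the original data set afterwards, which is why the sketch ignores interim nodes when building ranges.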
Fig. 1 shows a flowchart of a valid data screening method in an embodiment of the invention. Referring to Fig. 1, the method may specifically include the following steps:
Step S101: obtain the data set to be analyzed.
Step S102: traverse the data in the acquired data set to obtain the current data item.
In a specific implementation, the order in which the data in the acquired data set is traversed can follow the needs of the actual analysis; no limitation is imposed here.
Step S103: judge whether the current data item has changed excessively compared with the previous data item; if so, step S104 can be performed; otherwise, step S106 can be performed.
In a specific implementation, to judge whether the current data item has changed excessively compared with the previous data item, the absolute difference between the current data item and the previous data item can first be calculated; the calculated absolute difference is then compared with a preset difference threshold, and the comparison result determines whether the change is excessive.
Step S104: based on the data in a period of time before and after the current data item, determine and record the data fluctuation position within the corresponding whole fluctuation segment.
In a specific implementation, when the current data item is determined to have changed excessively compared with the previous data item, the data fluctuation position within the corresponding whole fluctuation segment can be determined and recorded based on the data in a period of time before and after the current data item; for details, see the description of the corresponding part of Fig. 2.
Step S106: judge whether the data set to be analyzed has been fully traversed; if not, step S107 can be performed; otherwise, the operation can end.
Step S107: obtain the next data item.
In a specific implementation, when the data set to be analyzed is determined not to have been fully traversed, the data item following the current one can be taken as the new current data item, and execution continues from step S103 until all data in the data set to be analyzed has been traversed.
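The traversal of Fig. 1 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: `changed_too_much`, `traverse` and the `on_change` callback are hypothetical names, the data items are assumed to be plain numbers, and the callback merely stands in for the position recording of step S104.

```python
from typing import Callable, List, Sequence

def changed_too_much(current: float, previous: float, threshold: float) -> bool:
    """Step S103: compare the absolute difference with the preset threshold."""
    return abs(current - previous) > threshold

def traverse(data: Sequence[float], threshold: float,
             on_change: Callable[[int], None]) -> None:
    """Steps S102 to S107: visit each item and report excessive changes."""
    for i in range(1, len(data)):          # item 0 has no previous item
        if changed_too_much(data[i], data[i - 1], threshold):
            on_change(i)                   # placeholder for step S104

flagged: List[int] = []
traverse([5.0, 5.1, 9.0, 9.2, 5.0], threshold=1.0, on_change=flagged.append)
print(flagged)  # [2, 4]
```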
The valid data screening method in the embodiments of the invention is described in further detail below with reference to Fig. 2.
Fig. 2 shows a flowchart of another valid data screening method in an embodiment of the invention. Referring to Fig. 2, the method may specifically include the following steps:
Step S201: obtain the data set to be analyzed.
Step S202: traverse the data in the acquired data set to obtain the current data item.
Step S203: judge whether the current data item has changed excessively compared with the previous data item; if so, step S204 can be performed; otherwise, step S205 can be performed.
Step S204: increase the count value of the preset n-bit recorder by 1.
In a specific implementation, the value of n can be set according to actual needs.
In an embodiment of the invention, a binary number is chosen as the recorder, for the following reasons:
(1) In a general algorithm, one number can record only one piece of effective information. If a binary number is used as the recorder, however, a very large capacity is available at very low cost. Taking an integer as an example, an integer has 32 bits; leaving out the leading sign bit, the change status of 31 data items can be recorded in this small amount of storage, which is highly cost-effective.
(2) From the perspective of hardware operations, shifting left an n-bit binary number that has recorded (n-1) bits amounts to eliminating the influence of the (n-2)-th data item before the current data item, and gives the change information of the remaining (n-2) data items new priorities according to time and the user's evaluation.
(3) A binary number is the form in which computer hardware stores data, and it also has a decimal meaning. By comparing the binary recorder with a special evaluation value obtained from the user's evaluation, the result tells whether the (n-2)-th data item before the current data item is marginal data with no specific data fluctuation in the period before and after it.
Of course, a person skilled in the art may also use a non-binary recorder; no limitation is imposed here.
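The bit-packing idea in points (1) to (3) can be sketched as follows. This is a minimal illustration rather than the patented implementation: the names `N_BITS`, `push` and `history_bits` are hypothetical, and the sketch folds the increment of step S204 and the left shift of step S211 into a single update so that the low bit always describes the newest data item.

```python
N_BITS = 32                      # recorder width n, as in the 32-bit integer example
SIGN_MASK = 1 << (N_BITS - 1)    # top bit, reserved as the sign bit
VALUE_MASK = SIGN_MASK - 1       # the 31 bits that hold change history

def push(recorder: int, changed: bool) -> int:
    """Shift the history left by one bit and store the newest flag in bit 0."""
    return ((recorder << 1) | int(changed)) & VALUE_MASK

def history_bits(recorder: int, count: int) -> str:
    """Return the newest `count` flags as a bit string, oldest first."""
    return format(recorder & ((1 << count) - 1), f"0{count}b")

recorder = 0
for changed in [True, False, True, True]:
    recorder = push(recorder, changed)

print(recorder)                   # 11
print(history_bits(recorder, 4))  # 1011
```

Because the whole history lives in one integer, comparing it with a threshold is a single integer comparison, which is the low-cost property the text emphasizes.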
Step S205: judge whether the current count value of the recorder is greater than the preset count threshold; if so, step S206 can be performed; otherwise, step S208 can be performed.
In a specific implementation, the preset count threshold can be set according to actual needs. For example, when the recorder has n=32 bits, the count threshold is set to 28672, that is, 7000 in hexadecimal.
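The arithmetic of this example is easy to check in Python. The sketch below only confirms that 28672 equals hexadecimal 7000 and shows the comparison of step S205; the names `COUNT_THRESHOLD` and `exceeds_threshold` are hypothetical.

```python
COUNT_THRESHOLD = 0x7000   # hexadecimal 7000, the example threshold from the text

def exceeds_threshold(recorder: int, threshold: int = COUNT_THRESHOLD) -> bool:
    """Step S205: is the recorder's count value greater than the threshold?"""
    return recorder > threshold

print(COUNT_THRESHOLD)              # 28672
print(bin(COUNT_THRESHOLD))         # 0b111000000000000
print(exceeds_threshold(0x7000))    # False
print(exceeds_threshold(0x7001))    # True
print(exceeds_threshold(0x0FFF))    # False
```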
Step S206: obtain the information of the data fluctuation positions stored in the preset dynamic array, and judge whether the dynamic array is empty or the data fluctuation position of the last valid data item stored in the dynamic array is a tail node; if so, step S207 can be performed; otherwise, step S208 can be performed.
In a specific implementation, the dynamic array records, for the data already analyzed in the data set to be analyzed, the sequence numbers of the valid data positions and the flag indicating whether each is the tail node of a whole fluctuation segment.
Step S207: determine that the data item one position before the (n-2)-th data item preceding the current data item is the first node of a data fluctuation.
In a specific implementation, when the dynamic array is determined to be empty, or the data fluctuation position of the last valid data item stored in the dynamic array is a tail node, the data item one position before the (n-2)-th data item preceding the current data item can be determined to be the first node of a data fluctuation. The information of this first node can then be recorded with the following valid data model and stored in the dynamic array: an array holding $N_i$ and $M_i$, with $N_i = N_c - (n-2) - 1$,
where $N_i$ is the sequence number of the first node of the whole fluctuation segment corresponding to the current data item, $N_c$ is the sequence number of the current data item, and $M_i$ is the flag indicating whether the recorded node is the tail node of that fluctuation segment.
Step S208: judge whether the value of the data item whose sequence number equals the sequence number of the current data item minus (n-2) plus (minimum consecutive number - 1) is close to the daily mean value of the data; if so, step S209 can be performed; otherwise, step S210 can be performed.
In a specific implementation, the minimum consecutive number is the number of consecutive "1" bits in the count threshold of step S205, counted from the highest bit after the sign bit toward the low bits; the daily mean value of the data is the normal value of the data when the system is in its everyday state without fluctuation.
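A possible way to compute the minimum consecutive number is sketched below. One assumption should be stated loudly: the sketch starts counting at the most significant set bit of the threshold, not at a fixed bit position below the sign bit, because that reading makes the example threshold of hexadecimal 7000 yield a run of three "1" bits. The function name `min_consecutive` is hypothetical.

```python
def min_consecutive(threshold: int) -> int:
    """Length of the run of 1 bits that starts at the threshold's highest set bit.

    Assumption: counting begins at the most significant set bit, not at a
    fixed position below the sign bit.
    """
    bits = bin(threshold)[2:]    # binary digits, most significant first
    run = 0
    for b in bits:
        if b != "1":
            break                # the leading run of ones has ended
        run += 1
    return run

print(min_consecutive(0x7000))   # 3
print(min_consecutive(0b1101))   # 2
print(min_consecutive(0))        # 0
```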
Step S209: determine that the data item whose sequence number equals the sequence number of the current data item minus (n-2) plus (minimum consecutive number - 1) is the tail node of the data fluctuation.
In a specific implementation, when the value of the data item whose sequence number equals the sequence number of the current data item minus (n-2) plus (minimum consecutive number - 1) is close to the daily mean value of the data, that data item can be determined to be the tail node of the data fluctuation. The information of this tail node can then be recorded with the following valid data model and stored in the dynamic array: an array holding $N_i$ and $M_i$, with $N_i = N_c - (n-2) + (\mathit{minCon} - 1)$,
where $N_i$ is the sequence number of the tail node of the whole fluctuation segment corresponding to the current data item, $N_c$ is the sequence number of the current data item, $M_i$ is the flag indicating whether the recorded node is the tail node of that fluctuation segment, and $\mathit{minCon}$ is the minimum consecutive number.
Step S210: determine that the (n-2)-th data item preceding the current data item is an interim node of the data fluctuation, and record it.
In a specific implementation, when the dynamic array is determined to be neither empty nor to have a tail node as the data fluctuation position of its last stored valid data item, and the value of the data item whose sequence number equals the sequence number of the current data item minus (n-2) plus (minimum consecutive number - 1) is not close to the daily mean value of the data, the (n-2)-th data item preceding the current data item is determined to be an interim node of the data fluctuation. The information of this interim node can then be recorded with the following valid data model and stored in the dynamic array: an array holding $N_i$ and $M_i$, with $N_i = N_c - (n-2)$,
where $N_i$ is the sequence number of the interim node of the whole fluctuation segment corresponding to the current data item, $N_c$ is the sequence number of the current data item, and $M_i$ is the flag indicating whether the recorded node is the tail node of that fluctuation segment.
It should be noted that once data has been judged to be valid data, segmenting the valid data as it is recorded is necessary to ease later analysis. If the valid data is segmented, there is no need to record the sequence numbers of all valid data during screening; only three kinds of data fluctuation nodes need to be recorded, namely the first node, the interim node and the tail node, where:
--- first node (First Node): the first data item of a whole data fluctuation segment, indicating that the data fluctuation starts.
--- interim node (Mid-term Node): a data item within a whole data fluctuation segment that meets a specific condition, indicating data whose amplitude changes sharply within the fluctuation.
--- tail node (Last Node): the last data item of a whole data fluctuation segment, indicating that the data fluctuation ends.
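One way to hold these three node kinds in the dynamic array is sketched below. The layout is an assumption made for illustration: `NodeKind`, `NodeRecord` and `last_is_tail_or_empty` are hypothetical names, `seq` plays the role of the sequence number $N_i$, `is_tail` plays the role of the tail-node flag $M_i$, and the extra `kind` field is added purely for readability.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

class NodeKind(Enum):
    FIRST = "first"   # first data item of a fluctuation segment
    MID = "mid"       # sharply changing item inside a segment
    TAIL = "tail"     # last data item of a fluctuation segment

@dataclass(frozen=True)
class NodeRecord:
    seq: int          # sequence number of the node (N_i in the text)
    is_tail: bool     # tail-node flag (M_i in the text)
    kind: NodeKind    # hypothetical extra field, for readability only

def last_is_tail_or_empty(records: List[NodeRecord]) -> bool:
    """The check of step S206: may a new first node be opened?"""
    return not records or records[-1].is_tail

dynamic_array: List[NodeRecord] = []
print(last_is_tail_or_empty(dynamic_array))   # True, the array is empty
dynamic_array.append(NodeRecord(seq=69, is_tail=False, kind=NodeKind.FIRST))
print(last_is_tail_or_empty(dynamic_array))   # False, a segment is open
dynamic_array.append(NodeRecord(seq=72, is_tail=True, kind=NodeKind.TAIL))
print(last_is_tail_or_empty(dynamic_array))   # True, the segment is closed
```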
Besides valid data segmentation, there is another important reason for defining these three nodes. One feature of the valid data screening method in the embodiments of the invention is its ability to "predict", that is, to judge whether the current data item is valid data from how the data fluctuates in the following period of time. In actual implementation, this is done by first "skipping" the current data item and, after (n-1) further data items have been traversed, going back to judge the (n-2)-th earlier data item in light of those data items. The problem with this approach is that the determination of data positions carries a certain "delay".
According to research and experimental tests, this "delay" can be offset. In the valid data model of the embodiments of the invention, if the current valid data is judged to be a first node, 1 is subtracted from the sequence number of the (n-2)-th earlier data item, which is itself obtained by subtracting (n-2) from the sequence number of the current data item. This gives the data item just before the one at which the fluctuating change began, which is the true first node, i.e. the first data item of the whole fluctuation segment. If the current valid data is judged to be a tail node, the previously defined minimum consecutive number is added to the sequence number of the (n-2)-th earlier data item, pushing the valid data sequence number back by one stretch. This offsets the "delay" caused by the judgment data expiring early when a stretch of valid data fluctuation ends and no fluctuation follows. 1 is then subtracted, giving the data item that lies (minimum consecutive number - 1) positions after the start of the fluctuating change. This is the true tail node after the "delay" has been offset, i.e. the last data item of the whole fluctuation segment, and it is flagged as a tail node to prepare for later segmentation. If the current data item is still valid data but has been judged to be neither a first node nor a tail node, it is a data item with a large amplitude of change within the whole fluctuation segment. There is no "delay" in this case, so only (n-2) needs to be subtracted from the sequence number of the current data item to obtain the sequence number of the (n-2)-th earlier data item.
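The three offset rules above reduce to simple arithmetic on sequence numbers, sketched here in Python. The function names are hypothetical, and the example values (n=32, a minimum consecutive number of 3, a current sequence number of 100) are chosen only for illustration.

```python
def first_node_seq(n_c: int, n: int = 32) -> int:
    """First node: step back (n-2) items, then one more."""
    return n_c - (n - 2) - 1

def interim_node_seq(n_c: int, n: int = 32) -> int:
    """Interim node: no delay to offset, just step back (n-2) items."""
    return n_c - (n - 2)

def tail_node_seq(n_c: int, min_con: int, n: int = 32) -> int:
    """Tail node: step back (n-2) items, add min_con, then subtract 1."""
    return n_c - (n - 2) + (min_con - 1)

n_c = 100   # illustrative sequence number of the current data item
print(first_node_seq(n_c))              # 69
print(interim_node_seq(n_c))            # 70
print(tail_node_seq(n_c, min_con=3))    # 72
```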
Step S211: shift the count value of the recorder left by X bits and clear the sign bit.
In a specific implementation, X can be chosen according to actual needs, for example according to how much weight is given to the data items closest to the current data item; the recorder is then shifted left by X bits and its sign bit is cleared. Here $X = 2^M$, where M is an integer. In one embodiment of the invention, X is 1.
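Step S211 can be sketched in Python as below. This is an illustration under assumptions: `shift_and_clear` is a hypothetical name, and because Python integers are unbounded, the mask is what keeps the value within n bits and clears the sign bit in one operation.

```python
def shift_and_clear(recorder: int, x: int = 1, n_bits: int = 32) -> int:
    """Shift the recorder left by x bits, then clear the sign bit.

    The mask keeps only the bits below the sign bit, so anything shifted
    into or past the sign bit position is discarded.
    """
    if not 1 <= x < n_bits:
        raise ValueError("x must satisfy 1 <= x < n_bits")
    value_mask = (1 << (n_bits - 1)) - 1   # all bits below the sign bit
    return (recorder << x) & value_mask

# Bit 30 moves into the sign bit and is cleared; bit 0 moves to bit 1.
print(hex(shift_and_clear(0x40000001)))    # 0x2
print(hex(shift_and_clear(0x7000, x=2)))   # 0x1c000
```

A larger X ages the history faster, so fewer past data items influence the comparison with the count threshold.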
Step S212: judge whether the data set to be analyzed has been fully traversed; if not, step S213 can be performed; otherwise, the operation can end.
In a specific implementation, when the data set to be analyzed is determined to have been fully traversed, the operation can end directly, yielding the information of the valid data in the data set to be analyzed.
Step S213: obtain the next data item.
In a specific implementation, when the data set to be analyzed is determined not to have been fully traversed, the next data item can be obtained in order as the new current data item, and execution continues from step S203 until the whole data set to be analyzed has been traversed.
The above scheme in the embodiments of the invention has the following advantages:
(1) The data screening result is accurate and reliable. To identify the associative meaning of massive data, the invention proposes a complete judgment system based on a binary recorder. To find marginal data hidden in massive data and improve the accuracy and precision of the analysis results obtained from the data set, a binary number is introduced as the recorder; it tallies the data fluctuation within a specific period before and after the data item to be judged and dynamically tracks the whole process of deciding whether that data item is marginal.
(2) High precision is combined with low consumption. The binary recorder used in the invention can record a wide range of data information while occupying very little hardware storage, so marginal data that has minimal influence on data precision is rejected while the consumption of computing resources and extra storage space also stays very small.
(3) It can be embedded in other data analysis algorithms. All computation of this algorithm takes place within a single traversal of the data, which means the algorithm can be applied directly inside other data screening or data analysis algorithms and combined with any of them so that their strengths and weaknesses complement each other; in other words, it is positioned as a complementary algorithm.
An embodiment of the invention also provides a computer-readable storage medium on which computer instructions are stored; when the computer instructions are run, the steps of the valid data screening method are performed. For the valid data screening method, refer to the description in the preceding sections; details are not repeated here.
An embodiment of the invention also provides a terminal, including a memory and a processor, the memory storing computer instructions that can be run on the processor; when the processor runs the computer instructions, the steps of the valid data screening method are performed.
The valid data screening method in the embodiments of the invention has been described in detail above; the device corresponding to the method is described below.
Fig. 3 shows the structural schematic diagram of one of embodiment of the present invention valid data screening plant.Referring to Fig. 3, one Kind valid data screening plant 30, may include acquiring unit 301 and screening unit 302, in which:
Acquiring unit 301, suitable for obtaining data set to be analyzed.
Screening unit 302, suitable for being traversed to the data in acquired data set, obtain traversing extremely when preceding article number According to;It is excessive to judge whether current data changes compared with previous data;When the current data of determination is compared to previous item number When according to changing excessive, then the data in front and back a period of time based on current data, determine and record corresponding whole section of fluctuation The data fluctuations position of data;Next data is obtained, until the data set to be analyzed all complete by traversal.
In an embodiment of the present invention, the screening unit 302 is suitable for calculating the current data and previous data Between absolute difference, and by the way that the absolute difference being calculated to be compared with preset difference threshold, currently with judgement It is excessive whether data changes compared with previous data.
Optionally, the screening unit 302 is adapted to: when it is determined that the current record changes excessively relative to the previous record, increase the count value of a preset n-bit recorder by a preset value; judge whether the current count value of the recorder is greater than a preset count threshold; when it is determined that the current count value of the recorder is greater than the preset count threshold, obtain the information of the last data fluctuation position stored in a preset dynamic array; when it is determined that the dynamic array is empty, or that the data fluctuation position of the last valid record stored in the dynamic array is a tail node, determine that the ((n-2)-1)-th record before the current record is the head node of the data fluctuation; when it is determined that the value at the sequence number of the current record minus (n-2) plus (minimum consecutive count - 1) is close to the daily mean value of the data, determine that the record at the sequence number of the current record minus (n-2) plus (minimum consecutive count - 1) is the tail node of the data fluctuation; when it is determined that the dynamic array is not empty or that the data fluctuation position of the last valid record stored in the dynamic array is not a tail node, and that the value at the sequence number of the current record minus (n-2) plus (minimum consecutive count - 1) is not close to the daily mean value of the data, determine that the (n-2)-th record before the current record is an intermediate node of the data fluctuation; and shift the recorder left by X bits and clear the sign bit of the recorder, where X is an integer greater than or equal to 1 and less than n.
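The recorder operations named above (increase the count value, compare it with the count threshold, shift left by X bits, clear the sign bit) can be sketched as follows. This is only an illustration under stated assumptions: the class name `ShiftRecorder` and its method names are hypothetical, the recorder (the "logger" of the text) is modeled as an n-bit integer whose highest bit is treated as the sign bit, and a default increment of 1 is assumed for the "preset value".

```python
class ShiftRecorder:
    """Hypothetical n-bit recorder; the top bit is treated as the sign bit."""

    def __init__(self, n_bits: int = 32, shift: int = 1, increment: int = 1):
        if not 1 <= shift < n_bits:
            raise ValueError("shift X must satisfy 1 <= X < n")
        self.n_bits = n_bits
        self.shift = shift
        self.increment = increment
        self.value = 0
        self._mask = (1 << n_bits) - 1       # keeps the value within n bits
        self._sign_bit = 1 << (n_bits - 1)   # highest bit of the recorder

    def mark_large_change(self) -> None:
        """Increase the count value by the preset increment."""
        self.value = (self.value + self.increment) & self._mask

    def exceeds(self, count_threshold: int) -> bool:
        """Judge whether the count value is greater than the count threshold."""
        return self.value > count_threshold

    def advance(self) -> None:
        """Shift left by X bits, then clear the sign bit."""
        self.value = (self.value << self.shift) & self._mask
        self.value &= ~self._sign_bit
```

With X = 1, each call to `advance` moves the history of earlier large changes one bit toward the high end, so the count value reflects how many recent records changed excessively and how recently they did so.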
In an embodiment of the present invention, the screening unit 302 is adapted to, when it is determined that the ((n-2)-1)-th record before the current record is the head node of the data fluctuation, record the data fluctuation position of the corresponding whole segment of fluctuating data using the following array:
where N_i denotes the sequence number of the head node of the data fluctuation of the whole segment of fluctuating data corresponding to the current record, N_c denotes the sequence number of the current record, and M_i denotes the judgment-value flag of the tail node of the data fluctuation of the corresponding whole segment of fluctuating data.
In an embodiment of the present invention, the screening unit 302 is adapted to, when it is determined that the (n-2)-th record before the current record is an intermediate node of the data fluctuation, record the data fluctuation position of the corresponding whole segment of fluctuating data using the following array:
where N_i denotes the sequence number of the intermediate node of the data fluctuation of the whole segment of fluctuating data corresponding to the current record, N_c denotes the sequence number of the current record, and M_i denotes the judgment-value flag of the tail node of the data fluctuation of the corresponding whole segment of fluctuating data.
In an embodiment of the present invention, the screening unit 302 is adapted to, when it is determined that the record at the sequence number of the current record minus (n-2) plus (minimum consecutive count - 1) is the tail node of the data fluctuation, record the data fluctuation position of the corresponding whole segment of fluctuating data using the following array:
where N_i denotes the sequence number of the tail node of the data fluctuation of the whole segment of fluctuating data corresponding to the current record, N_c denotes the sequence number of the current record, and M_i denotes the judgment-value flag of the tail node of the data fluctuation of the corresponding whole segment of fluctuating data.
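The node classification and the three recorded quantities can be sketched in Python as below. The arrays themselves are not reproduced in this text, so the record layout is an assumption: `FluctuationRecord` simply holds the three quantities defined above (N_i, N_c, M_i), with M_i modeled as a boolean. The names `NodeKind`, `classify_node` and `node_sequence_number`, and the order in which the conditions are checked (head node first, then tail node, then intermediate node), are likewise hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class NodeKind(Enum):
    HEAD = "head"
    INTERMEDIATE = "intermediate"
    TAIL = "tail"


@dataclass
class FluctuationRecord:
    node_seq: int     # N_i: sequence number of the head/intermediate/tail node
    current_seq: int  # N_c: sequence number of the current record
    is_tail: bool     # M_i: tail-node flag (boolean encoding is an assumption)


def classify_node(records: List[FluctuationRecord], near_daily_mean: bool) -> NodeKind:
    """Classify the node to record, given the dynamic array and the mean check."""
    # Empty array, or the last stored position is a tail node: a new fluctuation starts.
    if not records or records[-1].is_tail:
        return NodeKind.HEAD
    # Value has returned close to the daily mean: the fluctuation ends.
    if near_daily_mean:
        return NodeKind.TAIL
    return NodeKind.INTERMEDIATE


def node_sequence_number(kind: NodeKind, current_seq: int, n_bits: int,
                         min_consecutive: int) -> int:
    """Compute N_i from the offsets stated in the text."""
    base = n_bits - 2
    if kind is NodeKind.HEAD:
        return current_seq - (base - 1)
    if kind is NodeKind.INTERMEDIATE:
        return current_seq - base
    return current_seq - base + (min_consecutive - 1)
```

A new record would then be appended to the dynamic array (here a plain Python list) as `FluctuationRecord(node_seq, current_seq, kind is NodeKind.TAIL)`.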
In the above scheme of the embodiments of the present invention, the data in the obtained data set are traversed to obtain the current record reached by the traversal; when it is determined that the current record changes excessively relative to the previous record, the current record is determined to be valid data; when the current record is determined to be valid data, the data fluctuation position of the corresponding whole segment of fluctuating data is determined and recorded; and the next record is obtained, until the data set to be analyzed has been fully traversed. In this way, the data fluctuation within a specific period of time before and after a record can be determined, and the boundary conditions of the data can be tracked dynamically throughout the process. While keeping the precision of the data and of the final data analysis result as high as possible, the overhead that the data screening method itself incurs in computing resources and additional storage space is kept as low as possible, so the accuracy and efficiency of valid data screening can be improved.
The basic principles, main features and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited to the above embodiments; the above embodiments and the description merely illustrate the principles of the invention. Various changes and improvements may be made to the invention without departing from its spirit and scope. The scope of protection claimed by the present invention is defined by the appended claims, the specification and their equivalents.

Claims (8)

1. A valid data screening device, characterized by comprising:
an acquiring unit, adapted to obtain a data set to be analyzed;
a screening unit, adapted to traverse the data in the obtained data set to obtain the current record reached by the traversal; judge whether the current record changes excessively relative to the previous record; when it is determined that the current record changes excessively relative to the previous record, determine and record, based on the data within a period of time before and after the current record, the data fluctuation position of the corresponding whole segment of fluctuating data; and obtain the next record, until the data set to be analyzed has been fully traversed.
2. The valid data screening device according to claim 1, characterized in that the screening unit is adapted to calculate the absolute difference between the current record and the previous record, and to compare the calculated absolute difference with a preset difference threshold, so as to judge whether the current record changes excessively relative to the previous record.
3. The valid data screening device according to claim 1, characterized in that the screening unit is adapted to: when it is determined that the current record changes excessively relative to the previous record, increase the count value of a preset n-bit recorder by a preset value; judge whether the current count value of the recorder is greater than a preset count threshold; when it is determined that the current count value of the recorder is greater than the preset count threshold, obtain the information of the last data fluctuation position stored in a preset dynamic array; when it is determined that the dynamic array is empty, or that the data fluctuation position of the last valid record stored in the dynamic array is a tail node, determine that the ((n-2)-1)-th record before the current record is the head node of the data fluctuation; when it is determined that the value at the sequence number of the current record minus (n-2) plus (minimum consecutive count - 1) is close to the daily mean value of the data, determine that the record at the sequence number of the current record minus (n-2) plus (minimum consecutive count - 1) is the tail node of the data fluctuation; when it is determined that the dynamic array is not empty or that the data fluctuation position of the last valid record stored in the dynamic array is not a tail node, and that the value at the sequence number of the current record minus (n-2) plus (minimum consecutive count - 1) is not close to the daily mean value of the data, determine that the (n-2)-th record before the current record is an intermediate node of the data fluctuation; and shift the recorder left by X bits and clear the sign bit of the recorder, where X is an integer greater than or equal to 1 and less than n.
4. The valid data screening device according to claim 3, characterized in that the screening unit is adapted to, when it is determined that the ((n-2)-1)-th record before the current record is the head node of the data fluctuation, record the data fluctuation position of the corresponding whole segment of fluctuating data using the following array:
where N_i denotes the sequence number of the head node of the data fluctuation of the whole segment of fluctuating data corresponding to the current record, N_c denotes the sequence number of the current record, and M_i denotes the judgment-value flag of the tail node of the data fluctuation of the corresponding whole segment of fluctuating data.
5. The valid data screening device according to claim 3, characterized in that the screening unit is adapted to, when it is determined that the (n-2)-th record before the current record is an intermediate node of the data fluctuation, record the data fluctuation position of the corresponding whole segment of fluctuating data using the following array:
where N_i denotes the sequence number of the intermediate node of the data fluctuation of the whole segment of fluctuating data corresponding to the current record, N_c denotes the sequence number of the current record, and M_i denotes the judgment-value flag of the tail node of the data fluctuation of the corresponding whole segment of fluctuating data.
6. The valid data screening device according to claim 3, characterized in that the screening unit is adapted to, when it is determined that the record at the sequence number of the current record minus (n-2) plus (minimum consecutive count - 1) is the tail node of the data fluctuation, record the data fluctuation position of the corresponding whole segment of fluctuating data using the following array:
where N_i denotes the sequence number of the tail node of the data fluctuation of the whole segment of fluctuating data corresponding to the current record, N_c denotes the sequence number of the current record, and M_i denotes the judgment-value flag of the tail node of the data fluctuation of the corresponding whole segment of fluctuating data.
7. The valid data screening device according to claim 3, characterized in that X=1.
8. The valid data screening device according to claim 3, characterized in that n=32.
CN201811247432.7A 2018-10-24 2018-10-24 Effective data screening device Active CN109522300B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811247432.7A CN109522300B (en) 2018-10-24 2018-10-24 Effective data screening device

Publications (2)

Publication Number Publication Date
CN109522300A true CN109522300A (en) 2019-03-26
CN109522300B CN109522300B (en) 2021-09-28

Family

ID=65773748

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811247432.7A Active CN109522300B (en) 2018-10-24 2018-10-24 Effective data screening device

Country Status (1)

Country Link
CN (1) CN109522300B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050268A (en) * 2014-06-23 2014-09-17 西北工业大学 Continuous data protection and recovery method with log space adjustable online
CN105354208A (en) * 2015-09-21 2016-02-24 江苏讯狐信息科技有限公司 Big data information mining method
CN106162698A (en) * 2015-04-15 2016-11-23 中国电信股份有限公司 Network Abnormal problem analysis method and device
US20180075006A1 (en) * 2016-09-12 2018-03-15 DataRails LTD. System and method for logical identification of differences between spreadsheets

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Erzhuo Che, Michael J. Olsen: "Fast ground filtering for TLS data via Scanline Density Analysis", 《ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING》 *
余浩: "基于大数据的数据存储及数据筛选问题研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
毕朝国,徐小龙: "一种云存储系统中重复数据删除机制", 《计算机应用研究》 *

Also Published As

Publication number Publication date
CN109522300B (en) 2021-09-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant