CN109522300B - Effective data screening device - Google Patents

Publication number
CN109522300B
Authority
CN
China
Prior art keywords
data
fluctuation
piece
current
screening
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811247432.7A
Other languages
Chinese (zh)
Other versions
CN109522300A (en)
Inventor
徐小龙
林皓伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN201811247432.7A
Publication of CN109522300A
Application granted
Publication of CN109522300B
Legal status: Active
Anticipated expiration

Abstract

An effective data screening apparatus, comprising: an acquisition unit adapted to acquire a data set to be analyzed; and a screening unit adapted to traverse the data in the acquired data set to obtain the current piece of data; judge whether the current piece of data has changed too much compared with the previous piece of data; when the current piece of data is determined to have changed too much, determine and record the data fluctuation position of the corresponding whole segment of fluctuation data based on the data in a period of time before and after the current piece of data; and acquire the next piece of data until the data set to be analyzed has been completely traversed. This scheme can improve the efficiency and accuracy of effective data screening.

Description

Effective data screening device
Technical Field
The invention belongs to the technical field of data analysis, and particularly relates to an effective data screening device.
Background
Since 2012, the term "big data" has frequently entered public view and has been widely accepted and studied. These data sets of ever-growing scale hide great potential value and will shape the direction and outcome of future development for many enterprises and fields. More and more enterprises are aware of the risks posed by the explosive growth of data and are paying increasing attention to the importance of mass data. Although big data continually brings business information and social value, it also has an obvious problem: the volume of data is now too large.
The huge amount of data in a big data environment means that a great deal of resources and time is consumed in extracting effective information, and daily mean data and marginal data make up a large share of that data. To reduce the resources and time these calculations consume, one can, besides designing better data analysis algorithms, also start by reducing the data size.
Disclosure of Invention
The invention aims to solve the technical problem of how to improve the efficiency and the accuracy of effective data screening.
In order to achieve the above object, an embodiment of the present invention provides an effective data screening apparatus, including:
an acquisition unit adapted to acquire a dataset to be analyzed;
the screening unit is suitable for traversing the data in the acquired data set to obtain the traversed current piece of data; judging whether the current data is changed too much compared with the previous data; when the current data is determined to be changed too much compared with the previous data, determining and recording the data fluctuation position of the corresponding whole fluctuation data based on the data in a period of time before and after the current data; and acquiring the next piece of data until the data set to be analyzed is completely traversed.
Optionally, the screening unit is adapted to calculate an absolute difference between the current piece of data and the previous piece of data, and compare the calculated absolute difference with a preset difference threshold to determine whether the current piece of data is changed too much from the previous piece of data.
Optionally, the screening unit is adapted to increase the count value of a preset n-bit recorder by a preset value when it is determined that the current piece of data has changed too much compared with the previous piece of data; judge whether the current count value of the recorder is greater than a preset count threshold; when the current count value of the recorder is determined to be greater than the preset count threshold, acquire the information of the data fluctuation position of the last piece of data stored in a preset dynamic array; when the dynamic array is determined to be empty or the data fluctuation position of the last valid data stored in the dynamic array is a tail node, determine the ((n-2)-1)th piece of data before the current piece of data as a data fluctuation head node; when the piece of data whose serial number is the serial number of the current piece of data minus (n-2) plus (minimum consecutive number - 1) is determined to be close to the daily average value of the data, determine that piece of data as a data fluctuation tail node; when the dynamic array is determined to be not empty or the data fluctuation position of the last valid data stored in the dynamic array is determined not to be a tail node, and the piece of data whose serial number is the serial number of the current piece of data minus (n-2) plus (minimum consecutive number - 1) is not close to the daily average value of the data, determine the (n-2)th piece of data before the current piece of data as an interim node of data fluctuation; and shift the recorder left by X bits and clear the sign bit of the recorder, where X is an integer greater than or equal to 1 and less than n.
Optionally, the screening unit is adapted to, when it is determined that the ((n-2) -1) th piece of data before the current piece of data is the data fluctuation head node, record the data fluctuation position of the corresponding whole piece of fluctuation data by using the following array:
Figure BDA0001839265570000021
where N_i denotes the sequence number of the data fluctuation head node of the whole segment of fluctuation data corresponding to the current piece of data, N_c denotes the sequence number of the current piece of data, and M_i denotes the decision-value mark of the data fluctuation tail node of the corresponding whole segment of fluctuation data.
Optionally, the screening unit is adapted to, when it is determined that the (n-2) th piece of data before the current piece of data is an interim node of data fluctuation, record the data fluctuation position of the corresponding whole piece of fluctuation data by using the following array:
Figure BDA0001839265570000022
where N_i denotes the sequence number of the data fluctuation interim node of the whole segment of fluctuation data corresponding to the current piece of data, N_c denotes the sequence number of the current piece of data, and M_i denotes the decision-value mark of the data fluctuation tail node of the corresponding whole segment of fluctuation data.
Optionally, the screening unit is adapted to, when determining that the serial number of the current piece of data minus (n-2) plus (minimum consecutive number-1) pieces of data is a data fluctuation tail node, record the data fluctuation position of the corresponding whole piece of fluctuation data by using the following array:
Figure BDA0001839265570000031
where N_i denotes the sequence number of the data fluctuation tail node of the whole segment of fluctuation data corresponding to the current piece of data, N_c denotes the sequence number of the current piece of data, and M_i denotes the decision-value mark of the data fluctuation tail node of the corresponding whole segment of fluctuation data.
Optionally, X = 1.
Optionally, n = 32.
Compared with the prior art, the invention has the beneficial effects that:
According to this scheme, the data in the acquired data set is traversed to obtain the current piece of data; when the current piece of data is determined to have changed too much compared with the previous piece of data, the data fluctuation position of the corresponding whole segment of fluctuation data is determined and recorded based on the data in a period of time before and after the current piece of data; and the next piece of data is acquired until the data set to be analyzed has been completely traversed. In this way, the data fluctuation in a specific time period before and after each piece of data can be judged, and the whole process of judging the boundary condition of the data is tracked dynamically. While the precision of the data and of the final analysis result is improved as far as possible, the overhead in computing resources and extra storage space introduced by the screening method itself is reduced as far as possible, so that the accuracy and efficiency of effective data screening can be improved.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without inventive labor.
FIG. 1 is a schematic flow chart of an effective data screening method in an embodiment of the present invention;
FIG. 2 is a schematic flow chart of another effective data screening method in an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an effective data screening apparatus in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. The directional indications (such as up, down, left, right, front, back, etc.) in the embodiments of the present invention are only used to explain the relative positional relationship between the components, the movement, etc. in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indication is changed accordingly.
As described in the background, one effective data screening method in the prior art is to mark data whose absolute difference from daily mean data is greater than a threshold as effective data. However, this method has the following problems:
(1) If the data fluctuates slightly for a very short time for whatever reason, and this small fluctuation is not accompanied, within a certain time, by a medium or large fluctuation or by a small fluctuation that meets the conditions and is worth analyzing, then the small-fluctuation data has little research significance and little influence on the final analysis result, and is regarded as marginal data. When the data scale of a big data environment is large, the amount of marginal data is also huge and consumes a large share of computing resources and time.
(2) If the influence of the marginal data is considered and the screening of the marginal data is increased while traversing the data, the consumption of computing resources and the cost of additional storage resources in the computer are increased to a certain extent.
Therefore, the effective data screening method in the prior art has the problems of low accuracy and low efficiency.
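For reference, the prior-art screening described above (marking data whose absolute difference from the daily mean exceeds a threshold) can be sketched as follows. This is a minimal illustration only; the function name, the sample readings and the threshold are hypothetical and are not taken from the source.

```python
def prior_art_screen(data, daily_mean, threshold):
    """Return the indices of items whose absolute deviation from the
    daily mean exceeds the threshold (the prior-art rule)."""
    return [i for i, value in enumerate(data)
            if abs(value - daily_mean) > threshold]


# Hypothetical readings: a brief blip at index 2, a real fluctuation at 4..5.
readings = [10, 10, 11, 10, 25, 30, 10, 10]
flagged = prior_art_screen(readings, daily_mean=10, threshold=0.5)
print(flagged)
```

In this sample the brief blip at index 2 is flagged together with the larger fluctuation, which is the kind of marginal data that problem (1) refers to.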
To solve the above problems, in the technical solution of the embodiment of the present invention, when the current piece of data is determined to have changed too much compared with the previous piece of data, the data fluctuation position of the corresponding whole segment of fluctuation data is determined and recorded based on the data in a period of time before and after the current piece of data. The data fluctuation in a specific period before and after the data can thus be judged, and the whole process of judging the boundary condition of the data can be tracked dynamically. While the precision of the data and of the final analysis result is improved as far as possible, the overhead in computing resources and extra storage space introduced by the screening method itself is reduced as far as possible, so that the accuracy and efficiency of effective data screening can be improved.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
The method comprises the steps of traversing each piece of data in a data set to be analyzed, and judging the data property of each piece of data according to the change condition of each piece of data in a period of time before and after. If the data is judged to be valid, the position of the data within the whole data fluctuation is judged. And according to different positions, carrying out different types of effective data marking operations. And finally, extracting all the effective data in the previous data set according to different marks of the effective data, and performing segmentation processing.
Fig. 1 is a schematic flow chart illustrating an effective data screening method according to an embodiment of the present invention. Referring to fig. 1, the effective data screening method in the embodiment of the present invention may specifically include the following steps:
Step S101: A data set to be analyzed is acquired.
Step S102: The data in the acquired data set is traversed to obtain the current piece of data.
In a specific implementation, the order in which the data in the acquired data set is traversed may be chosen according to actual analysis needs and is not limited herein.
Step S103: judging whether the current data is changed too much compared with the previous data; when the judgment result is yes, step S104 may be performed; otherwise, step S106 may be performed.
In a specific implementation, when determining whether the current piece of data has an excessive change compared with the previous piece of data, an absolute difference between the current piece of data and the previous piece of data may be first calculated, the calculated absolute difference is compared with a preset difference threshold, and whether the current piece of data has an excessive change compared with the previous piece of data is determined according to a comparison result.
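The comparison in step S103 can be sketched as a small predicate. This is a sketch under assumptions: the function name is hypothetical, and the use of a strict "greater than" comparison is an assumption, since the text only says the absolute difference is compared with the preset difference threshold.

```python
def changed_too_much(current, previous, diff_threshold):
    """Step S103 sketch: compare the absolute difference between the
    current and previous items with a preset difference threshold.
    Strict '>' is an assumption; the source does not state it."""
    return abs(current - previous) > diff_threshold


print(changed_too_much(15, 10, 3))
print(changed_too_much(11, 10, 3))
```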
Step S104: The data fluctuation position of the corresponding whole segment of fluctuation data is determined and recorded based on the data in a period of time before and after the current piece of data.
In a specific implementation, when it is determined that the current piece of data has too much variation compared with the previous piece of data, the data fluctuation position of the whole corresponding piece of fluctuation data may be determined and recorded based on data in a period of time before and after the current piece of data, as described in detail in corresponding parts of fig. 2.
Step S106: judging whether the traversal of the data set to be analyzed is completed; when the judgment result is no, step S107 may be performed; otherwise, the operation may end.
Step S107: the next piece of data is acquired.
In a specific implementation, when it is determined that the traversal of the data set to be analyzed is not completed, the next piece of data of the current piece of data may be acquired as the traversed current piece of data, and the execution is continued from step S103 until all pieces of data in the data set to be analyzed are completely traversed.
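The loop formed by steps S102 to S107 can be sketched as below. This is only an outline under assumptions: `traverse` and `on_fluctuation` are hypothetical names, the callback stands in for the position determination of step S104, and starting at the second item (because the first item has no previous item) is an assumption.

```python
def traverse(data, diff_threshold, on_fluctuation):
    """Sketch of the FIG. 1 loop. `on_fluctuation` is a hypothetical
    callback standing in for step S104."""
    # Assumption: the first item has no predecessor, so start at index 1.
    for index in range(1, len(data)):                  # S102 / S107
        difference = abs(data[index] - data[index - 1])
        if difference > diff_threshold:                # S103
            on_fluctuation(index)                      # S104
    # Falling out of the loop corresponds to S106 answering "completed".


hits = []
traverse([10, 10, 18, 19, 10, 10], 5, hits.append)
print(hits)
```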
The effective data screening method in the embodiment of the present invention will be described in further detail with reference to fig. 2.
Fig. 2 is a schematic flow chart illustrating another effective data screening method according to an embodiment of the present invention. Referring to fig. 2, the effective data screening method in the embodiment of the present invention may specifically include the following steps:
Step S201: A data set to be analyzed is acquired.
Step S202: The data in the acquired data set is traversed to obtain the current piece of data.
Step S203: judging whether the current data is changed too much compared with the previous data; when the judgment result is yes, step S204 may be performed; otherwise, step S205 may be performed.
Step S204: The count value of the preset n-bit recorder is incremented by 1.
In specific implementation, the value of n can be set according to actual needs.
In one embodiment of the present invention, a binary number is used as the recorder, for the following reasons:
(1) In general algorithms, one number records only one piece of valid information. If the bits of a binary number are used as the recorder, however, a large capacity is obtained at very low cost. Taking an integer as an example, if an integer has 32 bits and the leading sign bit is excluded, 31 bits of change information can be recorded in this small amount of memory, which is highly cost-effective.
(2) In terms of computer hardware operations, shifting an n-bit binary recorder that holds (n-1) bits of records to the left is equivalent to discarding the influence of the (n-2)th piece of data before the current piece of data, while the change information of the remaining (n-2) pieces of data is assigned new priorities according to time and the user's evaluation criteria.
(3) Binary bits are the form in which computer hardware stores data, and the stored value also has a decimal meaning. By comparing the binary recorder with a special evaluation value obtained from user evaluation, it can be determined whether the (n-2)th piece of data before the current piece of data is marginal data, that is, data with no specific fluctuation within the surrounding time range.
Of course, those skilled in the art may also use a recorder that is not based on binary bits, which is not limited herein.
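The capacity argument in consideration (1) can be illustrated by packing one change flag per piece of data into a single integer. This is a simplified sketch and not the recorder logic of the embodiment: the names `pack_flags` and `flag_at` are hypothetical, and setting the lowest bit immediately after each shift is a simplification of the separate increment (S204) and shift (S211) steps.

```python
N_BITS = 32
VALUE_MASK = (1 << (N_BITS - 1)) - 1  # lower 31 bits; the sign bit is excluded


def pack_flags(flags):
    """Pack change flags (oldest first) into one integer, newest flag in
    the lowest bit. Flags older than 31 steps fall off the top."""
    recorder = 0
    for flag in flags:
        recorder = ((recorder << 1) | int(flag)) & VALUE_MASK
    return recorder


def flag_at(recorder, age):
    """Read the flag recorded `age` steps ago (0 is the newest)."""
    return bool((recorder >> age) & 1)


recorder = pack_flags([True, False, True, True])
print(bin(recorder), flag_at(recorder, 2))
```

A single 32-bit value therefore carries the change history of the last 31 pieces of data, which is the cost argument the text makes.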
Step S205: judging whether the current count value of the recorder is greater than a preset count threshold value or not; when the judgment result is yes, step S206 may be performed; otherwise, step S208 may be performed.
In a specific implementation, the preset count threshold may be set according to actual needs. For example, when the number of recorder bits n is 32, the count threshold is set to 28672, which is 0x7000 in hexadecimal.
Step S206: acquiring information of a data fluctuation position stored in a preset dynamic array, and judging whether the dynamic array is empty or whether a data fluctuation position of last effective data stored in the dynamic array is a tail node; when the judgment result is yes, step S207 may be performed; otherwise, step S208 may be performed.
In a specific implementation, the dynamic array records, for the data analyzed so far in the data set to be analyzed, the sequence number of each piece of valid data and whether that piece of data is marked with the decision value of a tail node of a whole segment of fluctuation data.
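The dynamic array and the check of step S206 can be sketched as follows. The record layout is an assumption: `NodeRecord`, its field names and the use of a boolean for the tail-node decision mark are hypothetical and only illustrate the two pieces of information the text says are stored.

```python
from dataclasses import dataclass


@dataclass
class NodeRecord:
    sequence_number: int  # position of the valid data (N_i in the text)
    is_tail: bool         # tail-node decision mark (M_i), modeled as a bool


def starts_new_fluctuation(records):
    """Step S206 sketch: true when the dynamic array is empty or the last
    stored valid data is a tail node."""
    return not records or records[-1].is_tail


records = [NodeRecord(5, False), NodeRecord(9, True)]
print(starts_new_fluctuation(records))
```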
Step S207: The ((n-2)-1)th piece of data before the current piece of data is determined as the data fluctuation head node.
In a specific implementation, when it is determined that the dynamic array is empty or that the data fluctuation position of the last valid data stored in the dynamic array is a tail node, the ((n-2)-1)th piece of data before the current piece of data can be determined as the data fluctuation head node. At this time, the following valid data model may be used to record the information of the head node and store it in the dynamic array:
Figure BDA0001839265570000081
where N_i denotes the sequence number of the data fluctuation head node of the whole segment of fluctuation data corresponding to the current piece of data, N_c denotes the sequence number of the current piece of data, and M_i denotes the decision-value mark of the data fluctuation tail node of the corresponding whole segment of fluctuation data.
Step S208: judging whether the value obtained by subtracting (n-2) and adding (minimum continuous number-1) from the serial number of the current piece of data is close to the daily average value of the data or not; when the judgment result is yes, step S209 may be performed; otherwise, step S210 may be performed.
In a specific implementation, the minimum consecutive number is the number of consecutive "1" of the count threshold value from the highest bit excluding the sign bit to the lower bit in step S205; the daily average value of the data is a normal numerical value of the data of the system under the daily fluctuation-free condition.
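The two quantities defined above can be sketched as follows. Several points are assumptions: the function names are hypothetical, the bit width against which the threshold is read is passed explicitly because the text does not say how the example threshold 0x7000 is aligned within the recorder (a 16-bit reading is used here purely for illustration), and the tolerance used for "close to the daily average value" is not specified in the source.

```python
def min_consecutive_ones(count_threshold, width):
    """Count consecutive 1 bits of the threshold, starting at the highest
    bit below the sign bit of a `width`-bit value and moving downward."""
    run = 0
    for bit in range(width - 2, -1, -1):
        if (count_threshold >> bit) & 1:
            run += 1
        else:
            break
    return run


def is_near_daily_mean(value, daily_mean, tolerance):
    """Hypothetical closeness test; the tolerance is an assumption."""
    return abs(value - daily_mean) <= tolerance


print(0x7000, min_consecutive_ones(0x7000, width=16))
```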
Step S209: The piece of data whose serial number is the serial number of the current piece of data minus (n-2) plus (minimum consecutive number - 1) is determined as the data fluctuation tail node.
In a specific implementation, when the piece of data whose serial number is the serial number of the current piece of data minus (n-2) plus (minimum consecutive number - 1) is close to the daily average value of the data, that piece of data can be determined as the data fluctuation tail node. At this time, the following valid data model may be used to record the information of the tail node and store it in the dynamic array:
Figure BDA0001839265570000082
where N_i denotes the sequence number of the data fluctuation tail node of the whole segment of fluctuation data corresponding to the current piece of data, N_c denotes the sequence number of the current piece of data, M_i denotes the decision-value mark of the data fluctuation tail node of the corresponding whole segment of fluctuation data, and min Con denotes the minimum consecutive number.
Step S210: The (n-2)th piece of data before the current piece of data is determined as the interim node of the data fluctuation and recorded.
In a specific implementation, when the dynamic array is determined to be not empty or the data fluctuation position of the last valid data stored in the dynamic array is determined not to be a tail node, and the piece of data whose serial number is the serial number of the current piece of data minus (n-2) plus (minimum consecutive number - 1) is not close to the daily average value of the data, the (n-2)th piece of data before the current piece of data is determined as an interim node of data fluctuation. At this time, the following valid data model can be used to record the information of the interim node and store it in the dynamic array:
Figure BDA0001839265570000091
where N_i denotes the sequence number of the data fluctuation interim node of the whole segment of fluctuation data corresponding to the current piece of data, N_c denotes the sequence number of the current piece of data, and M_i denotes the decision-value mark of the data fluctuation tail node of the corresponding whole segment of fluctuation data.
If data is determined to be valid, it needs to be segmented when it is recorded, for the convenience of subsequent data analysis. With segmentation, not every valid sequence number has to be recorded during screening; only three kinds of data fluctuation nodes need to be recorded, namely the first node, the mid-term node and the last node, where:
The first node (First Node, the head node) is the first piece of data of the whole data fluctuation and indicates that the fluctuation has started.
The mid-term node (Mid-term Node, the interim node) is a piece of data within the fluctuation period that meets special conditions and indicates data with a severe change in amplitude.
The last node (Last Node, the tail node) is the last piece of data of the whole data fluctuation and indicates that the fluctuation has ended.
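Recording only these three kinds of nodes is enough to recover each fluctuation segment as a range. The sketch below is one possible way to do this and is not taken from the source: the function name, the tuple layout and the string labels are hypothetical, and dropping a segment that has no closing last node is an assumption.

```python
def fluctuation_segments(nodes):
    """Turn recorded nodes into (start, end) sequence-number ranges.
    `nodes` is a list of (sequence_number, kind) in recording order,
    with kind in {"first", "mid", "last"} (hypothetical labels)."""
    segments = []
    start = None
    for sequence_number, kind in nodes:
        if kind == "first":
            start = sequence_number
        elif kind == "last" and start is not None:
            segments.append((start, sequence_number))
            start = None
        # "mid" nodes lie inside a segment and do not change its bounds.
    return segments


nodes = [(69, "first"), (75, "mid"), (80, "last"),
         (120, "first"), (131, "last")]
print(fluctuation_segments(nodes))
```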
There is another important reason for defining these three nodes in addition to the reason for valid data segmentation. One of the characteristics of the effective data screening method in the embodiment of the invention is that whether the current data is effective data or not can be judged according to the fluctuation change condition of the data within a period of time in the future, namely the 'forecasting' capability. In the actual implementation process, the function is to firstly "skip" the current data, and after traversing (n-1) pieces of data, according to the conditions of the data, return to judge the previous (n-2) th piece of data. This has the problem that there is a certain degree of "delay" in the data location determination.
The way to offset this "delay" was determined through research and experimental testing. In the effective data model of the embodiment of the present invention, three cases are distinguished.
If the current valid data is determined to be a head node, the serial number of the (n-2)th previous piece of data is first obtained by subtracting (n-2) from the serial number of the current piece of data, and 1 is then subtracted to reach the piece of data just before the data begins to fluctuate. That piece of data is the real head node, that is, the first piece of data of the whole data fluctuation.
If the current valid data is determined to be a tail node, the serial number of the (n-2)th previous piece of data is again obtained by subtracting (n-2) from the serial number of the current piece of data, and the previously defined minimum consecutive number is then added, which extends the serial number by one section toward later data. This offsets the "delay" that arises because data near the end of a fluctuation begins to be judged invalid too early. 1 is then subtracted, which gives the piece of data that lies (minimum consecutive number - 1) positions after the (n-2)th previous piece of data. That piece of data is the real tail node after the "delay" has been offset, that is, the last piece of data of the whole data fluctuation; it is marked as a tail node in preparation for subsequent segmentation.
If the current data is valid but is neither a head node nor a tail node, it is data with a large change in amplitude within the whole data fluctuation. There is no "delay" in this case, so the serial number of the (n-2)th previous piece of data, obtained by subtracting (n-2) from the serial number of the current piece of data, is used directly.
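The three offsets described above reduce to simple arithmetic on the serial number of the current piece of data. The sketch below restates them; the function name and the string labels are hypothetical, and the sample values (n = 32, current serial number 100, minimum consecutive number 3) are chosen only for illustration.

```python
def node_sequence_number(current_seq, n, kind, min_consecutive=0):
    """Serial number of the recorded node, following the offsets in the text:
    first: current - (n-2) - 1
    mid:   current - (n-2)
    last:  current - (n-2) + (min_consecutive - 1)
    """
    base = current_seq - (n - 2)
    if kind == "first":
        return base - 1
    if kind == "mid":
        return base
    if kind == "last":
        return base + (min_consecutive - 1)
    raise ValueError(f"unknown node kind: {kind!r}")


print(node_sequence_number(100, 32, "first"))
print(node_sequence_number(100, 32, "mid"))
print(node_sequence_number(100, 32, "last", min_consecutive=3))
```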
Step S211: The count value of the recorder is shifted left by X bits and its sign bit is cleared.
In a specific implementation, the number of bits X by which the recorder is shifted left before its sign bit is cleared may be chosen according to actual needs, for example according to how much weight is given to the data closest to the current piece of data. Here X = 2^M, where M is an integer. In one embodiment of the present invention, X is 1.
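Step S211 can be sketched with ordinary integer bit operations. This is a minimal sketch under assumptions: the function name is hypothetical, and because Python integers are unbounded, the n-bit width is emulated explicitly with a mask.

```python
def shift_recorder(count, x=1, n=32):
    """Step S211 sketch: shift an n-bit recorder left by x bits, then
    clear its sign bit (the highest of the n bits)."""
    if not 1 <= x < n:
        raise ValueError("x must satisfy 1 <= x < n")
    width_mask = (1 << n) - 1   # emulate an n-bit register
    sign_bit = 1 << (n - 1)
    shifted = (count << x) & width_mask
    return shifted & ~sign_bit


print(bin(shift_recorder(0b1011)))
print(hex(shift_recorder(0x7FFFFFFF)))
```

Clearing the sign bit after the shift is what drops the oldest recorded change, so the recorder keeps at most (n-1) bits of history.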
Step S212: judging whether the traversal of the data set to be analyzed is finished; when the judgment result is no, step S213 may be performed; otherwise, the operation may end.
In a specific implementation, when it is determined that the traversal of the data set to be analyzed is completed, the operation may be directly ended, so as to obtain information of valid data in the data set to be analyzed.
Step S213: the next piece of data is acquired.
In a specific implementation, when it is determined that the traversal of the data set to be analyzed is not completed, the next piece of data may be obtained in sequence as the current piece of data traversed, and the execution is started from step S203 until all the traversals of the data set to be analyzed are completed.
The scheme in the embodiment of the invention has the following beneficial effects:
(1) The data screening result is accurate and reliable. For identifying the relevance and significance of mass data, the invention provides a complete judgment system based on a binary bit recorder. To find the marginal data hidden in a massive data set and improve the accuracy and precision of the analysis result obtained from that data set, binary bits are introduced as a recorder that keeps count of the data fluctuation in a specific time period before and after the data to be judged, so that the whole process of judging the boundary condition of the data is tracked dynamically.
(2) High precision and low consumption. The binary bit recorder used by the invention can record a wide range of data information while occupying very little hardware storage space, so marginal data that affects data precision is eliminated while very little computing resource and extra storage space is consumed.
(3) It can be embedded in other data analysis algorithms. All calculations of the algorithm take place within the per-item processing of a single traversal of the data, which means the algorithm can be applied directly inside other data screening or data analysis algorithms, work alongside them and complement them; that is, it is positioned as an auxiliary algorithm.
The embodiment of the invention also provides a computer-readable storage medium, wherein computer instructions are stored on the computer-readable storage medium, and the steps of the effective data screening method are executed when the computer instructions are executed. For the method for screening valid data, please refer to the introduction of the previous section, which is not described herein again.
The embodiment of the invention also provides a terminal, which comprises a memory and a processor, wherein the memory is stored with a computer instruction capable of running on the processor, and the processor executes the steps of the effective data screening method when running the computer instruction.
The effective data screening method in the implementation of the present invention is described in detail above, and the corresponding apparatus of the method will be described below.
Fig. 3 is a schematic structural diagram illustrating an effective data screening apparatus according to an embodiment of the present invention. Referring to fig. 3, a valid data screening apparatus 30 may include an obtaining unit 301 and a screening unit 302, wherein:
an obtaining unit 301 adapted to obtain a data set to be analyzed.
The screening unit 302 is adapted to traverse the data in the acquired data set to obtain a current data piece after traversal; judging whether the current data is changed too much compared with the previous data; when the current data is determined to be changed too much compared with the previous data, determining and recording the data fluctuation position of the corresponding whole fluctuation data based on the data in a period of time before and after the current data; and acquiring the next piece of data until the data set to be analyzed is completely traversed.
In an embodiment of the present invention, the screening unit 302 is adapted to calculate the absolute difference between the current piece of data and the previous piece of data, and to compare the calculated absolute difference with a preset difference threshold to determine whether the current piece of data has changed too much compared with the previous piece of data.
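The threshold test described above can be sketched in Python as follows. The function names and the use of a strict `>` comparison are assumptions; the text only states that the absolute difference is compared with a preset difference threshold.

```python
from typing import List, Sequence


def changed_too_much(current: float, previous: float, diff_threshold: float) -> bool:
    """Return True when the absolute difference exceeds the preset threshold.

    Using strict '>' is an assumption; the source does not say how equality is treated.
    """
    return abs(current - previous) > diff_threshold


def flag_large_changes(data: Sequence[float], diff_threshold: float) -> List[bool]:
    """Flag each item whose change from its predecessor is too large.

    The first item has no predecessor and is left unflagged.
    """
    flags = [False] * len(data)
    for i in range(1, len(data)):
        flags[i] = changed_too_much(data[i], data[i - 1], diff_threshold)
    return flags
```

For example, with a threshold of 5 the series 10, 11, 30, 29, 10 is flagged at the third and fifth items, where the value jumps by 19 in each case.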
Optionally, the screening unit 302 is adapted to: when it is determined that the current piece of data has changed too much compared with the previous piece of data, increase the count value of a preset n-bit recorder by a preset value; judge whether the current count value of the recorder is greater than a preset count threshold; when the current count value of the recorder is determined to be greater than the preset count threshold, acquire the fluctuation position information of the last piece of data stored in a preset dynamic array; when the dynamic array is determined to be empty, or the data fluctuation position of the last valid data stored in the dynamic array is a tail node, determine the ((n-2)-1)th piece of data before the current piece of data to be a data fluctuation head node; when the value of the piece of data whose sequence number equals the sequence number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is determined to be close to the daily average value of the data, determine that piece of data to be a data fluctuation tail node; when the dynamic array is determined to be non-empty, or the data fluctuation position of the last valid data stored in the dynamic array is determined not to be a tail node, and the value of the piece of data whose sequence number equals the sequence number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is not close to the daily average value of the data, determine the (n-2)th piece of data before the current piece of data to be an interim node of data fluctuation; and shift the recorder left by X bits and clear the sign bit of the recorder, where X is an integer greater than or equal to 1 and less than n.
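The two recorder operations named above (adding a preset value, then shifting left by X bits and clearing the sign bit) can be illustrated on an n-bit register modelled with a Python integer. This is a sketch under stated assumptions: the function names are hypothetical, the sign bit is taken to be the most significant of the n bits, and the defaults n = 32 and X = 1 follow the values given in claims 6 and 7. How the count value is compared with the threshold and when each operation is applied during the traversal are not modelled here.

```python
def _mask(n: int) -> int:
    """Bit mask selecting the low n bits."""
    return (1 << n) - 1


def add_to_recorder(recorder: int, increment: int = 1, n: int = 32) -> int:
    """Increase the recorder by a preset value, wrapping to n bits."""
    return (recorder + increment) & _mask(n)


def shift_and_clear_sign(recorder: int, x: int = 1, n: int = 32) -> int:
    """Shift the n-bit recorder left by x bits and clear its top (sign) bit."""
    if not 1 <= x < n:
        raise ValueError("x must satisfy 1 <= x < n")
    shifted = (recorder << x) & _mask(n)   # discard bits shifted beyond n bits
    return shifted & _mask(n - 1)          # clear bit n-1, assumed to be the sign bit
```

Clearing the top bit after each shift keeps the register value non-negative if it is later interpreted as a signed n-bit integer.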
In an embodiment of the present invention, the screening unit 302 is adapted to, when it is determined that the ((n-2) -1) th piece of data before the current piece of data is the data fluctuation head node, record the data fluctuation position of the corresponding whole piece of fluctuation data by using the following array:
Figure BDA0001839265570000121
wherein $N_i$ denotes the sequence number of the data fluctuation head node of the whole segment of fluctuation data corresponding to the current piece of data, $N_c$ denotes the sequence number of the current piece of data, and $M_i$ denotes the judgment-value flag of the data fluctuation tail node of the corresponding whole segment of fluctuation data.
In an embodiment of the present invention, the screening unit 302 is adapted to, when it is determined that the (n-2) th piece of data before the current piece of data is an interim node of data fluctuation, record the data fluctuation position of the corresponding whole piece of fluctuation data by using the following array:
Figure BDA0001839265570000131
wherein $N_i$ denotes the sequence number of the data fluctuation interim node of the whole segment of fluctuation data corresponding to the current piece of data, $N_c$ denotes the sequence number of the current piece of data, and $M_i$ denotes the judgment-value flag of the data fluctuation tail node of the corresponding whole segment of fluctuation data.
In an embodiment of the present invention, the screening unit 302 is adapted to, when it is determined that the piece of data whose sequence number equals the sequence number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is a data fluctuation tail node, record the data fluctuation position of the corresponding whole segment of fluctuation data by using the following array:
Figure BDA0001839265570000132
wherein $N_i$ denotes the sequence number of the data fluctuation tail node of the whole segment of fluctuation data corresponding to the current piece of data, $N_c$ denotes the sequence number of the current piece of data, and $M_i$ denotes the judgment-value flag of the data fluctuation tail node of the corresponding whole segment of fluctuation data.
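The offsets stated in the text for the three node types can be worked through with a short Python sketch. The arrays themselves appear only as figures and are not reproduced here, so the `FluctuationRecord` layout below is a hypothetical stand-in rather than the patent's array format, and the $M_i$ flag is omitted because its values are not given in the text. The default n = 32 follows claim 7; `min_con` is an illustrative parameter for the minimum continuous number.

```python
from dataclasses import dataclass
from enum import Enum


class NodeKind(Enum):
    HEAD = "head"
    INTERIM = "interim"
    TAIL = "tail"


@dataclass(frozen=True)
class FluctuationRecord:
    """Hypothetical record: node type, node sequence number (N_i), current sequence number (N_c)."""
    kind: NodeKind
    node_seq: int
    current_seq: int


def node_sequence_number(kind: NodeKind, current_seq: int, n: int = 32, min_con: int = 1) -> int:
    """Compute N_i from N_c using the offsets stated in the text."""
    if kind is NodeKind.HEAD:
        # The ((n-2)-1)th piece of data before the current piece.
        return current_seq - ((n - 2) - 1)
    if kind is NodeKind.INTERIM:
        # The (n-2)th piece of data before the current piece.
        return current_seq - (n - 2)
    # Tail: current sequence number minus (n-2) plus (minimum continuous number - 1).
    return current_seq - (n - 2) + (min_con - 1)


def make_record(kind: NodeKind, current_seq: int, n: int = 32, min_con: int = 1) -> FluctuationRecord:
    """Build a record for the given node type at the current position."""
    return FluctuationRecord(kind, node_sequence_number(kind, current_seq, n, min_con), current_seq)
```

With n = 32 and a current sequence number of 100, the head node falls at 71 and the interim node at 70; with a minimum continuous number of 3, the tail node falls at 72.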
According to the scheme in the embodiment of the invention, the data in the acquired data set are traversed to obtain the current piece of data. When the current piece of data is determined to have changed too much compared with the previous piece of data, it is determined to be valid data, and the data fluctuation position of the corresponding whole segment of fluctuation data is determined and recorded. The next piece of data is then acquired, until the data set to be analyzed has been completely traversed. In this way, the data fluctuation within a specific time period before and after a piece of data can be judged, and the boundary conditions of the data can be tracked dynamically throughout the process. While the precision of the data and of the final data analysis result is improved as far as possible, the system overhead in computing resources and extra storage space introduced by the data screening method is kept as low as possible, so that the accuracy and efficiency of effective data screening can be improved.
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the foregoing description only for the purpose of illustrating the principles of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the invention as defined by the appended claims, specification, and equivalents thereof.

Claims (7)

1. An apparatus for screening useful data, comprising:
an acquisition unit adapted to acquire a dataset to be analyzed;
a screening unit adapted to traverse the data in the acquired data set to obtain the current piece of data; judge whether the current piece of data has changed too much compared with the previous piece of data; when it is determined that the current piece of data has changed too much compared with the previous piece of data, determine and record the data fluctuation position of the corresponding whole segment of fluctuation data based on the data within a period of time before and after the current piece of data, specifically comprising: when it is determined that the current piece of data has changed too much compared with the previous piece of data, increasing the count value of a preset n-bit recorder by a preset value; judging whether the current count value of the recorder is greater than a preset count threshold; when the current count value of the recorder is determined to be greater than the preset count threshold, acquiring the fluctuation position information of the last piece of data stored in a preset dynamic array; when the dynamic array is determined to be empty, or the data fluctuation position of the last valid data stored in the dynamic array is a tail node, determining the ((n-2)-1)th piece of data before the current piece of data to be a data fluctuation head node; when the value of the piece of data whose sequence number equals the sequence number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is determined to be close to the daily average value of the data, determining that piece of data to be a data fluctuation tail node; when the dynamic array is determined to be non-empty, or the data fluctuation position of the last valid data stored in the dynamic array is determined not to be a tail node, and the value of the piece of data whose sequence number equals the sequence number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is not close to the daily average value of the data, determining the (n-2)th piece of data before the current piece of data to be an interim node of data fluctuation; shifting the recorder left by X bits and clearing the sign bit of the recorder, where X is an integer greater than or equal to 1 and less than n; and acquire the next piece of data, until the data set to be analyzed has been completely traversed.
2. The device for screening effective data according to claim 1, wherein the screening unit is adapted to calculate an absolute difference between the current piece of data and the previous piece of data, and compare the calculated absolute difference with a preset difference threshold to determine whether the current piece of data is changed too much from the previous piece of data.
3. The effective data screening apparatus according to claim 1, wherein the screening unit is adapted to, when it is determined that ((n-2) -1) th data before the current piece of data is a data fluctuation head node, record a data fluctuation position of the corresponding whole piece of fluctuation data by using the following array:
Figure FDA0003098793430000021
wherein $N_i$ denotes the sequence number of the data fluctuation head node of the whole segment of fluctuation data corresponding to the current piece of data, $N_c$ denotes the sequence number of the current piece of data, and $M_i$ denotes the judgment-value flag of the data fluctuation tail node of the corresponding whole segment of fluctuation data.
4. The effective data screening apparatus according to claim 1, wherein the screening unit is adapted to, when it is determined that the (n-2) th data before the current piece of data is an interim node of data fluctuation, record the data fluctuation position of the corresponding whole piece of fluctuation data using the following array:
Figure FDA0003098793430000022
wherein $N_i$ denotes the sequence number of the data fluctuation interim node of the whole segment of fluctuation data corresponding to the current piece of data, $N_c$ denotes the sequence number of the current piece of data, and $M_i$ denotes the judgment-value flag of the data fluctuation tail node of the corresponding whole segment of fluctuation data.
5. The effective data screening apparatus according to claim 1, wherein the screening unit is adapted to, when determining that the serial number of the current piece of data minus (n-2) plus (minimum consecutive number-1) pieces of data is the tail node of the data fluctuation, record the data fluctuation position of the corresponding whole piece of fluctuation data by using the following array:
Figure FDA0003098793430000023
wherein $N_i$ denotes the sequence number of the data fluctuation tail node of the whole segment of fluctuation data corresponding to the current piece of data, $N_c$ denotes the sequence number of the current piece of data, $M_i$ denotes the judgment-value flag of the data fluctuation tail node of the corresponding whole segment of fluctuation data, and minCon denotes the minimum continuous number.
6. The effective data screening device of claim 1, wherein X is 1.
7. The effective data screening device of claim 1, wherein n is 32.
CN201811247432.7A 2018-10-24 2018-10-24 Effective data screening device Active CN109522300B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811247432.7A CN109522300B (en) 2018-10-24 2018-10-24 Effective data screening device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811247432.7A CN109522300B (en) 2018-10-24 2018-10-24 Effective data screening device

Publications (2)

Publication Number Publication Date
CN109522300A CN109522300A (en) 2019-03-26
CN109522300B true CN109522300B (en) 2021-09-28

Family

ID=65773748

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811247432.7A Active CN109522300B (en) 2018-10-24 2018-10-24 Effective data screening device

Country Status (1)

Country Link
CN (1) CN109522300B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050268A (en) * 2014-06-23 2014-09-17 西北工业大学 Continuous data protection and recovery method with log space adjustable online
CN105354208A (en) * 2015-09-21 2016-02-24 江苏讯狐信息科技有限公司 Big data information mining method
CN106162698A (en) * 2015-04-15 2016-11-23 中国电信股份有限公司 Network Abnormal problem analysis method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10824803B2 (en) * 2016-09-12 2020-11-03 DataRails LTD. System and method for logical identification of differences between spreadsheets

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于大数据的数据存储及数据筛选问题研究;余浩;《中国优秀硕士学位论文全文数据库 信息科技辑》;20160515;全文 *

Also Published As

Publication number Publication date
CN109522300A (en) 2019-03-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant