CN109542927B - Effective data screening method, readable storage medium and terminal - Google Patents

Publication number
CN109542927B
Authority
CN
China
Prior art keywords
data
fluctuation
current
piece
determined
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811247433.1A
Other languages
Chinese (zh)
Other versions
CN109542927A (en)
Inventor
徐小龙
林皓伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN201811247433.1A
Publication of CN109542927A
Application granted
Publication of CN109542927B

Abstract

An effective data screening method, a readable storage medium and a terminal, the method comprising: acquiring a data set to be analyzed; traversing the data in the acquired data set to obtain the currently traversed piece of data; judging whether the current piece of data has changed too much compared with the previous piece of data; when the current piece of data is determined to have changed too much compared with the previous piece of data, determining and recording the data fluctuation position in the corresponding whole segment of fluctuation data, based on the data in a period of time before and after the current piece of data; and acquiring the next piece of data until the data set to be analyzed has been completely traversed. This scheme can improve the efficiency and accuracy of effective data screening.

Description

Effective data screening method, readable storage medium and terminal
Technical Field
The invention belongs to the technical field of data analysis, and particularly relates to an effective data screening method, a readable storage medium and a terminal.
Background
Since 2012, the term "big data" has frequently entered public view and has been widely accepted and studied. These ever-growing data hide huge potential value and determine the direction and outcome of future development for many enterprises and fields. More and more enterprises are now aware of the hidden dangers caused by the explosive growth of data, and are gradually paying attention to the importance of mass data. Although big data continuously brings business information and social value, one problem is obvious: the current data volume is too large.
The huge amount of data in a big data environment means that analyzing effective information consumes a great deal of resources and time, and daily mean data and marginal data account for a large share of that data. To reduce the resources and time consumed by these calculations, one can, in addition to designing better data analysis algorithms, start by reducing the data size.
Disclosure of Invention
The invention aims to solve the technical problem of how to improve the efficiency and the accuracy of effective data screening.
In order to achieve the above object, the present invention provides an effective data screening method, including:
acquiring a data set to be analyzed;
traversing the data in the acquired data set to obtain the currently traversed piece of data;
judging whether the current data is changed too much compared with the previous data;
when the current data is determined to be changed too much compared with the previous data, determining and recording the data fluctuation position of the corresponding whole fluctuation data based on the data in a period of time before and after the current data;
and acquiring the next piece of data until the data set to be analyzed is completely traversed.
Optionally, the determining whether the current piece of data has changed too much compared with the previous piece of data includes:
and calculating the absolute difference value between the current piece of data and the previous piece of data, and comparing the calculated absolute difference value with a preset difference threshold value to judge whether the current piece of data is changed too much compared with the previous piece of data.
Optionally, the determining a data fluctuation position of the corresponding whole fluctuation data includes:
when the current piece of data is determined to have changed too much compared with the previous piece of data, increasing the count value of a preset n-bit recorder by a preset value;
judging whether the current count value of the recorder is greater than a preset count threshold value or not;
when the current count value of the recorder is determined to be larger than a preset count threshold value, acquiring the information of the fluctuation position of the last piece of data stored in a preset dynamic array;
when the dynamic array is determined to be empty, or the data fluctuation position of the last valid data stored in the dynamic array is a tail node, determining the piece of data whose serial number equals the serial number of the current piece of data minus (n-2) minus 1 as the data fluctuation head node;
when the piece of data whose serial number equals the serial number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is determined to be close to the daily average value of the data, determining that piece of data as the data fluctuation tail node;
when the dynamic array is determined to be not empty or the data fluctuation position of the last valid data stored in the dynamic array is determined not to be a tail node, and the piece of data whose serial number equals the serial number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is not close to the daily average value of the data, determining the (n-2)th piece of data before the current piece of data as an interim node of the data fluctuation;
shifting the recorder to the left by X bits and clearing the sign bit of the recorder, where X is an integer greater than or equal to 1 and less than n.
Optionally, when the ((n-2) -1) th piece of data before the current piece of data is determined to be the data fluctuation head node, recording the data fluctuation position of the corresponding whole piece of fluctuation data by using the following array:
Figure BDA0001839265640000021
where $N_i$ represents the sequence number of the data fluctuation head node of the whole fluctuation segment corresponding to the current piece of data, $N_c$ represents the sequence number of the current piece of data, and $M_i$ represents the decision-value mark of the data fluctuation tail node of the corresponding whole fluctuation segment.
Optionally, when the (n-2) th data before the current piece of data is determined to be an interim node of data fluctuation, recording the data fluctuation position of the corresponding whole piece of fluctuation data by using the following array:
Figure BDA0001839265640000031
where $N_i$ represents the sequence number of the interim node of the data fluctuation of the whole fluctuation segment corresponding to the current piece of data, $N_c$ represents the sequence number of the current piece of data, and $M_i$ represents the decision-value mark of the data fluctuation tail node of the corresponding whole fluctuation segment.
Optionally, when the serial number of the current piece of data minus (n-2) plus (minimum continuous number-1) pieces of data is determined as the data fluctuation tail node, recording the data fluctuation position of the corresponding whole piece of fluctuation data by using the following array:
Figure BDA0001839265640000032
where $N_i$ represents the sequence number of the data fluctuation tail node of the whole fluctuation segment corresponding to the current piece of data, $N_c$ represents the sequence number of the current piece of data, and $M_i$ represents the decision-value mark of the data fluctuation tail node of the corresponding whole fluctuation segment.
Optionally, X = 1.
Optionally, n = 32.
The embodiment of the present invention further provides a computer-readable storage medium, on which computer instructions are stored, and when the computer instructions are executed, the method for screening effective data according to any one of the above-mentioned steps is performed.
The embodiment of the present invention further provides a terminal, which includes a memory and a processor, where the memory stores a computer instruction capable of running on the processor, and the processor executes the steps of any one of the above effective data screening methods when running the computer instruction.
Compared with the prior art, the invention has the beneficial effects that:
according to the scheme, the data in the acquired data set are traversed to obtain the currently traversed piece of data; when the current piece of data is determined to have changed too much compared with the previous piece of data, the data fluctuation position in the corresponding whole segment of fluctuation data is determined and recorded based on the data in a period of time before and after the current piece of data; and the next piece of data is acquired until the data set to be analyzed has been completely traversed. In this way, the data fluctuation within a specific time period before and after a piece of data can be judged, and the whole process of judging the boundary condition of the data is tracked dynamically. While the precision of the data and of the final data analysis result is improved as far as possible, the system overhead in computing resources and extra storage space introduced by the screening method itself is kept as low as possible, so that the accuracy and efficiency of effective data screening can be improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive labor.
FIG. 1 is a schematic flow chart of a method for efficient data screening in an embodiment of the present invention;
FIG. 2 is a schematic flow chart of another efficient data screening method in an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an effective data screening apparatus in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. The directional indications (such as up, down, left, right, front, back, etc.) in the embodiments of the present invention are only used to explain the relative positional relationship between the components, the movement, etc. in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indication is changed accordingly.
As described in the background, one effective data screening method in the prior art is to mark data whose absolute difference from daily mean data is greater than a threshold as effective data. However, this method has the following problems:
(1) If the data fluctuates slightly for a very short time for whatever reason, and no medium or large fluctuation, nor any small fluctuation that meets the conditions and is worth analyzing, occurs within a certain time around it, then this small-fluctuation data has little research significance and little influence on the final data analysis result, and is regarded as marginal data. When the data scale of a big data environment is large, the amount of marginal data is also huge and consumes a large share of computing resources and time.
(2) If the influence of the marginal data is considered and the screening of the marginal data is increased while traversing the data, the consumption of computing resources and the cost of additional storage resources in the computer are increased to a certain extent.
Therefore, the effective data screening method in the prior art has the problems of low accuracy and low efficiency.
In order to solve the above problems, in the technical solution of the embodiment of the present invention, when it is determined that the current piece of data has too large variation compared with the previous piece of data, the data fluctuation position of the corresponding whole piece of fluctuation data is determined and recorded based on data in a period of time before and after the current piece of data, the data fluctuation condition in a specific period of time before and after the data can be determined, the whole process of determining the boundary condition of the data can be dynamically grasped, and under the condition of improving the data accuracy and the final data analysis result accuracy as much as possible, the system overhead in terms of computing resources and extra storage space brought by the data screening method itself can be reduced as much as possible, so that the accuracy and efficiency of effective data screening can be improved.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
The method comprises the steps of traversing each piece of data in a data set to be analyzed, and judging the data property of each piece of data according to the change condition of each piece of data in a period of time before and after. If the data is judged to be valid, the position of the data within the whole data fluctuation is judged. And according to different positions, carrying out different types of effective data marking operations. And finally, extracting all the effective data in the previous data set according to different marks of the effective data, and performing segmentation processing.
Fig. 1 is a schematic flow chart illustrating an effective data screening method according to an embodiment of the present invention. Referring to fig. 1, the effective data screening method in the embodiment of the present invention may specifically include the following steps:
Step S101: a data set to be analyzed is acquired.
Step S102: the data in the acquired data set are traversed to obtain the currently traversed piece of data.
In a specific implementation, the sequence of traversing the data in the acquired dataset may be performed according to actual analysis needs, and is not limited herein.
Step S103: judging whether the current data is changed too much compared with the previous data; when the judgment result is yes, step S104 may be performed; otherwise, step S106 may be performed.
In a specific implementation, when determining whether the current piece of data has an excessive change compared with the previous piece of data, an absolute difference between the current piece of data and the previous piece of data may be first calculated, the calculated absolute difference is compared with a preset difference threshold, and whether the current piece of data has an excessive change compared with the previous piece of data is determined according to a comparison result.
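The comparison in step S103 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `changed_too_much` and the strict "greater than" comparison are assumptions, since the text only states that the absolute difference is compared with a preset difference threshold.

```python
def changed_too_much(current: float, previous: float, diff_threshold: float) -> bool:
    """Return True when |current - previous| exceeds the preset difference threshold.

    The strict '>' is an assumption; the source does not say whether equality counts.
    """
    return abs(current - previous) > diff_threshold


large_jump = changed_too_much(10.0, 3.0, 5.0)   # difference 7.0, above the threshold
small_jump = changed_too_much(10.0, 8.0, 5.0)   # difference 2.0, below the threshold
```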
Step S104: the data fluctuation position in the corresponding whole segment of fluctuation data is determined and recorded based on the data in a period of time before and after the current piece of data.
In a specific implementation, when it is determined that the current piece of data has too much variation compared with the previous piece of data, the data fluctuation position of the whole corresponding piece of fluctuation data may be determined and recorded based on data in a period of time before and after the current piece of data, as described in detail in corresponding parts of fig. 2.
Step S106: judging whether the traversal of the data set to be analyzed is completed; when the judgment result is no, step S107 may be performed; otherwise, the operation may end.
Step S107: the next piece of data is acquired.
In a specific implementation, when it is determined that the traversal of the data set to be analyzed is not completed, the next piece of data of the current piece of data may be acquired as the traversed current piece of data, and the execution is continued from step S103 until all pieces of data in the data set to be analyzed are completely traversed.
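The loop formed by steps S102 to S107 might look like the sketch below. It assumes the data set is an in-memory sequence of numbers and uses a hypothetical callback, `record_position`, as a stand-in for the position logic of step S104; neither detail comes from the source.

```python
from typing import Callable, List, Sequence


def traverse(data: Sequence[float],
             diff_threshold: float,
             record_position: Callable[[int], None]) -> List[int]:
    """Walk the data set once (S102/S107) and report every large change (S103)."""
    flagged: List[int] = []
    for i in range(1, len(data)):                        # the first piece has no predecessor
        if abs(data[i] - data[i - 1]) > diff_threshold:  # S103, strict '>' assumed
            record_position(i)                           # S104, placeholder for position logic
            flagged.append(i)
    return flagged                                       # S106: traversal finished


seen: List[int] = []
result = traverse([1.0, 1.0, 9.0, 9.0, 1.0], 5.0, seen.append)
```

Here the jumps occur at indices 2 and 4, so those are the positions handed to the callback.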
The effective data screening method in the embodiment of the present invention will be described in further detail with reference to fig. 2.
Fig. 2 is a schematic flow chart illustrating an effective data screening method according to an embodiment of the present invention. Referring to fig. 2, the effective data screening method in the embodiment of the present invention may specifically include the following steps:
Step S201: a data set to be analyzed is acquired.
Step S202: the data in the acquired data set are traversed to obtain the currently traversed piece of data.
Step S203: judging whether the current data is changed too much compared with the previous data; when the judgment result is yes, step S204 may be performed; otherwise, step S205 may be performed.
Step S204: the count value of the preset n-bit recorder is incremented by 1.
In specific implementation, the value of n can be set according to actual needs.
In one embodiment of the present invention, a binary number is selected as the recorder. Binary bits are chosen as the recorder for the following reasons:
(1) In general algorithms, a number can record only one piece of valid information. If the bits of a binary number are used as a recorder, however, it offers a large capacity at very low cost. Taking an integer as an example, if an integer has 32 bits and the leading sign bit is excluded, the changes of 31 pieces of data can be recorded in this small storage space, which is extremely cost-effective.
(2) In terms of computer hardware operations, shifting an n-bit binary number that records (n-1) bits to the left is equivalent to eliminating the influence of the (n-2)th piece of data before the current piece of data, and the change information of the remaining (n-2) pieces of data is re-assigned different priorities according to time and the user's evaluation conditions.
(3) Binary bits are a storage form of computer hardware, and they also carry a decimal meaning. By comparing the binary recorder with a special evaluation value obtained from the user's evaluation, the comparison result shows whether the (n-2)th piece of data before the current piece of data is marginal data with no particular data fluctuation in the time range before and after it.
Of course, those skilled in the art may also employ non-binary bit recorders, which are not limited herein.
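To illustrate points (1) and (3), the sketch below packs a history of change flags into a single non-negative integer and then compares that integer with an evaluation value. The helper name `pack_flags`, the oldest-first ordering and the sample evaluation value are assumptions made for this example only.

```python
from typing import Sequence


def pack_flags(flags: Sequence[bool], n: int) -> int:
    """Pack at most n-1 change flags (oldest first) into an n-bit value with sign bit 0."""
    if len(flags) > n - 1:
        raise ValueError("an n-bit recorder holds at most n-1 flags")
    value = 0
    for changed in flags:
        value = (value << 1) | int(changed)   # older flags move toward the higher bits
    return value


recorder = pack_flags([True, True, True, False, True], 8)   # binary 11101
exceeds = recorder > 0b11100                                # plain integer comparison
```

Because older flags occupy higher bits, an ordinary integer comparison gives them more weight than recent ones, which is one way to read the "decimal meaning" mentioned in point (3).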
Step S205: judging whether the current count value of the recorder is greater than a preset count threshold value or not; when the judgment result is yes, step S206 may be performed; otherwise, step S208 may be performed.
In a specific implementation, the preset count threshold may be set according to actual needs. For example, when the recorder bit number n is 32, the count threshold is set to 28672, 7000 in hexadecimal form.
Step S206: acquiring information of a data fluctuation position stored in a preset dynamic array, and judging whether the dynamic array is empty or whether a data fluctuation position of last effective data stored in the dynamic array is a tail node; when the judgment result is yes, step S207 may be performed; otherwise, step S208 may be performed.
In specific implementation, the dynamic array is used for recording the bit sequence of valid data in the analyzed data in the data set to be analyzed and information of whether the valid data is marked by a decision value of a tail node of the whole fluctuation data.
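One possible shape for the entries of the dynamic array is sketched below. The class name `FluctuationNode`, its field names, and the reading of the tail-node decision value as a boolean are all assumptions; the source only states that each entry holds a sequence number and a tail-node decision mark.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FluctuationNode:
    seq: int        # sequence number of the valid data (N_i in the text)
    is_tail: bool   # tail-node decision mark (M_i), modelled here as a boolean


def empty_or_last_is_tail(nodes: List[FluctuationNode]) -> bool:
    """Check used in step S206: no entry yet, or the last recorded node closed a fluctuation."""
    return not nodes or nodes[-1].is_tail


nodes: List[FluctuationNode] = []
before_any_node = empty_or_last_is_tail(nodes)
nodes.append(FluctuationNode(seq=5, is_tail=False))
inside_fluctuation = empty_or_last_is_tail(nodes)
nodes.append(FluctuationNode(seq=9, is_tail=True))
after_tail = empty_or_last_is_tail(nodes)
```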
Step S207: the piece of data whose serial number equals the serial number of the current piece of data minus (n-2) minus 1 is determined as the data fluctuation head node.
In a specific implementation, when it is determined that the dynamic array is empty, or that the data fluctuation position of the last valid data stored in the dynamic array is a tail node, the piece of data whose serial number equals the serial number of the current piece of data minus (n-2) minus 1 can be determined as the data fluctuation head node. At this time, the following valid data model may be used to record the information of the head node and store it in the dynamic array:
Figure BDA0001839265640000081
where $N_i$ represents the sequence number of the data fluctuation head node of the whole fluctuation segment corresponding to the current piece of data, $N_c$ represents the sequence number of the current piece of data, and $M_i$ represents the decision-value mark of the data fluctuation tail node of the corresponding whole fluctuation segment.
Step S208: judging whether the piece of data whose serial number equals the serial number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is close to the daily average value of the data; when the judgment result is yes, step S209 may be performed; otherwise, step S210 may be performed.
In a specific implementation, the minimum continuous number is the number of consecutive 1 bits in the count threshold of step S205, counted from the highest bit (excluding the sign bit) toward the lower bits; the daily average value of the data is the normal value of the system's data under everyday, fluctuation-free conditions.
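The two quantities defined here can be sketched as small helpers. The names `min_consecutive_ones` and `near_daily_mean` are hypothetical, and the `tolerance` parameter is an assumption, because the source does not say how "close to the daily average" is measured. The 16-bit width in the usage lines is also only illustrative, chosen so that the leading 1 bits of 0x7000 sit directly below the sign bit.

```python
def min_consecutive_ones(count_threshold: int, n: int) -> int:
    """Count consecutive 1 bits of the threshold, starting just below the sign bit."""
    count = 0
    bit = n - 2                                   # highest bit excluding the sign bit
    while bit >= 0 and (count_threshold >> bit) & 1:
        count += 1
        bit -= 1
    return count


def near_daily_mean(value: float, daily_mean: float, tolerance: float) -> bool:
    """Assumed closeness test: within an absolute tolerance of the daily mean."""
    return abs(value - daily_mean) <= tolerance


min_con = min_consecutive_ones(0x7000, 16)        # bits 14, 13 and 12 are set
quiet = near_daily_mean(10.2, 10.0, 0.5)
```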
Step S209: the piece of data whose serial number equals the serial number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is determined as the tail node of the data fluctuation.
In a specific implementation, when the piece of data whose serial number equals the serial number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is close to the daily average value of the data, that piece of data can be determined as the tail node of the data fluctuation. At this time, the following valid data model may be used to record the information of the tail node and store it in the dynamic array:
Figure BDA0001839265640000082
where $N_i$ represents the sequence number of the data fluctuation tail node of the whole fluctuation segment corresponding to the current piece of data, $N_c$ represents the sequence number of the current piece of data, $M_i$ represents the decision-value mark of the data fluctuation tail node of the corresponding whole fluctuation segment, and minCon represents the minimum continuous number.
Step S210: the (n-2)th piece of data before the current piece of data is determined as an interim node of the data fluctuation and recorded.
In a specific implementation, when the dynamic array is determined to be not empty or the data fluctuation position of the last valid data stored in the dynamic array is determined not to be a tail node, and the piece of data whose serial number equals the serial number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is not close to the daily average value of the data, the (n-2)th piece of data before the current piece of data is determined as an interim node of the data fluctuation. At this time, the following valid data model can be used to record the information of the interim node and store it in the dynamic array:
Figure BDA0001839265640000091
where $N_i$ represents the sequence number of the interim node of the data fluctuation of the whole fluctuation segment corresponding to the current piece of data, $N_c$ represents the sequence number of the current piece of data, and $M_i$ represents the decision-value mark of the data fluctuation tail node of the corresponding whole fluctuation segment.
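The branch order of steps S205 to S210 can be transcribed literally as a small decision function. This is only one reading of the flow described for FIG. 2: the function name `classify` and its boolean inputs are hypothetical, and any extra guard on when a fluctuation is considered open is deliberately not modelled because the text does not spell it out.

```python
def classify(recorder: int,
             count_threshold: int,
             empty_or_last_is_tail: bool,
             candidate_near_mean: bool) -> str:
    """Follow the S205 -> S206 -> S208 branch order and name the resulting node kind."""
    if recorder > count_threshold and empty_or_last_is_tail:
        return "head"       # S205 yes and S206 yes lead to S207
    if candidate_near_mean:
        return "tail"       # S208 yes leads to S209
    return "interim"        # S208 no leads to S210


first = classify(0x7001, 0x7000, True, False)
second = classify(0x7001, 0x7000, False, False)
third = classify(0x7001, 0x7000, False, True)
```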
Here, if the data is determined to be valid data, it is necessary to segment the valid data at the time of recording for the convenience of the subsequent data analysis. If the effective data is segmented, all the effective data serial numbers do not need to be recorded when the effective data is recorded during data screening, and only three data fluctuation nodes, namely a first node, a middle node and a tail node, need to be recorded, wherein:
The head node (First Node) is the first piece of data of the whole data fluctuation and indicates that the data fluctuation starts.
The interim node (Mid-term Node) is data that meets special conditions during the whole data fluctuation and indicates data with a severe amplitude change within the fluctuation.
The tail node (Last Node) is the last piece of data of the whole data fluctuation and indicates that the data fluctuation has ended.
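Given recorded nodes of these three kinds, the valid data can be cut into segments without storing every valid sequence number. The sketch below assumes the nodes are available as `(sequence number, kind)` pairs in traversal order; that representation and the function name `segments` are illustrative assumptions.

```python
from typing import List, Optional, Tuple


def segments(nodes: List[Tuple[int, str]]) -> List[Tuple[int, int]]:
    """Pair each head node with the next tail node to obtain (start, end) segments."""
    result: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for seq, kind in nodes:
        if kind == "head":
            start = seq                      # a fluctuation begins here
        elif kind == "tail" and start is not None:
            result.append((start, seq))      # the fluctuation ends, emit the segment
            start = None
        # interim nodes lie inside a segment and do not change its bounds
    return result


recorded = [(3, "head"), (7, "interim"), (12, "tail"), (20, "head"), (25, "tail")]
ranges = segments(recorded)
```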
There is another important reason for defining these three nodes in addition to the reason for valid data segmentation. One of the characteristics of the effective data screening method in the embodiment of the invention is that whether the current data is effective data or not can be judged according to the fluctuation change condition of the data within a period of time in the future, namely the 'forecasting' capability. In the actual implementation process, the function is to firstly "skip" the current data, and after traversing (n-1) pieces of data, according to the conditions of the data, return to judge the previous (n-2) th piece of data. This has the problem that there is a certain degree of "delay" in the data location determination.
Research and experimental testing showed how this "delay" can be offset. In the effective data model of the embodiment of the present invention:
If the current valid data is judged to be a head node, the serial number of the preceding (n-2)th piece of data is first obtained by subtracting (n-2) from the serial number of the current piece of data, and 1 is then subtracted again to obtain the piece of data immediately before the data starts to fluctuate. This is the real head node, that is, the first piece of data of the whole data fluctuation.
If the current valid data is judged to be a tail node, the previously defined minimum continuous number is added to the serial number of the preceding (n-2)th piece of data (the serial number of the current piece of data minus (n-2)). This extends the valid data onward by a section and offsets the "delay" caused by the data being judged invalid too early as the fluctuation approaches its end. Then 1 is subtracted, which gives the piece of data located (minimum continuous number - 1) positions after the preceding (n-2)th piece of data. This is the real tail node after the "delay" is offset, that is, the last piece of data of the whole data fluctuation. The tail node is marked at the same time, which prepares for the subsequent segmentation.
If the current data is still valid data but is judged to be neither a head node nor a tail node, it is data with a large variation amplitude within the whole data fluctuation. There is no "delay" in this case, so it suffices to subtract (n-2) from the serial number of the current piece of data to obtain the serial number of the preceding (n-2)th piece of data.
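The three serial-number corrections described above reduce to simple arithmetic. In the sketch below the function names are hypothetical, `current_seq` stands for the serial number of the current piece of data, and `min_con` stands for the minimum continuous number.

```python
def head_seq(current_seq: int, n: int) -> int:
    """Head node: step back (n-2) pieces, then one more to reach the piece before the change."""
    return current_seq - (n - 2) - 1


def interim_seq(current_seq: int, n: int) -> int:
    """Interim node: the (n-2)th piece before the current one, no delay correction."""
    return current_seq - (n - 2)


def tail_seq(current_seq: int, n: int, min_con: int) -> int:
    """Tail node: step back (n-2) pieces, then forward by (min_con - 1) to offset the delay."""
    return current_seq - (n - 2) + (min_con - 1)


head = head_seq(100, 32)          # 100 - 30 - 1
interim = interim_seq(100, 32)    # 100 - 30
tail = tail_seq(100, 32, 3)       # 100 - 30 + 2
```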
Step S211: the count value of the recorder is shifted to the left by X bits and the sign bit is cleared.
In a specific implementation, the value of X used when shifting the recorder to the left and clearing its sign bit can be selected according to actual needs, for example according to the weight given to the data closest to the current piece of data. Here $X = 2^M$, where M is an integer. In one embodiment of the present invention, X is 1.
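Step S211 can be sketched with a shift followed by a mask. Python integers are unbounded, so the mask `(1 << (n - 1)) - 1` is used here both to drop bits pushed beyond the n-bit width and to keep the sign bit at zero; the function name `shift_and_clear` is an assumption.

```python
def shift_and_clear(recorder: int, n: int, x: int = 1) -> int:
    """Shift an n-bit recorder left by x bits and clear its sign bit."""
    if not 1 <= x < n:
        raise ValueError("x must satisfy 1 <= x < n")
    mask = (1 << (n - 1)) - 1          # the low n-1 bits, sign bit excluded
    return (recorder << x) & mask


after_one = shift_and_clear(0b0100_0001, 8)    # the oldest flag moves into the sign bit and is dropped
after_five = shift_and_clear(0b0000_0101, 8, 5)
```

After the shift the lowest bit is 0 again, so the increment in step S204 for the next piece of data sets exactly that bit.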
Step S212: judging whether the traversal of the data set to be analyzed is finished; when the judgment result is no, step S213 may be performed; otherwise, the operation may end.
In a specific implementation, when it is determined that the traversal of the data set to be analyzed is completed, the operation may be directly ended, so as to obtain information of valid data in the data set to be analyzed.
Step S213: the next piece of data is acquired.
In a specific implementation, when it is determined that the traversal of the data set to be analyzed is not completed, the next piece of data may be obtained in sequence as the current piece of data traversed, and the execution is started from step S203 until all the traversals of the data set to be analyzed are completed.
The scheme in the embodiment of the invention has the following beneficial effects:
(1) the data screening result is accurate and reliable. For the identification of the relevance significance of mass data, the invention provides a complete judgment system based on a binary bit recorder. In order to search the marginal data hidden in mass data in a centralized manner and improve the accuracy and precision of a data analysis result obtained according to the data set, a binary digit is introduced as a recorder, the data fluctuation condition in a specific time period before and after the data needs to be judged is counted, and the whole process of judging the boundary condition of the data is dynamically grasped.
(2) High precision and low consumption. The binary bit recorder used by the invention can record large-range data information and only occupies a very small computer hardware storage space, so that marginal data influencing data precision is eliminated, and meanwhile, the consumption of computing resources and extra storage space is very small.
(3) May be embedded in other data analysis algorithms. All calculations of the algorithm are located in single data processing of traversal data, which means that the algorithm can be directly acted in other data screening or data analysis algorithms, matched with all other algorithms, and complementary with the other algorithms in terms of quality, namely, positioned as an auxiliary algorithm.
The embodiment of the invention also provides a computer-readable storage medium, wherein computer instructions are stored on the computer-readable storage medium, and the steps of the effective data screening method are executed when the computer instructions are executed. For the method for screening valid data, please refer to the introduction of the previous section, which is not described herein again.
The embodiment of the invention also provides a terminal, which comprises a memory and a processor, wherein the memory is stored with a computer instruction capable of running on the processor, and the processor executes the steps of the effective data screening method when running the computer instruction.
The effective data screening method in the embodiment of the present invention is described in detail above; the corresponding apparatus is described below.
Fig. 3 is a schematic structural diagram illustrating an effective data screening apparatus according to an embodiment of the present invention. Referring to Fig. 3, an effective data screening apparatus 30 may include an obtaining unit 301 and a screening unit 302, wherein:
The obtaining unit 301 is adapted to obtain a data set to be analyzed.
The screening unit 302 is adapted to traverse the data in the acquired data set to obtain the currently traversed piece of data; judge whether the current piece of data has changed too much compared with the previous piece of data; when the current piece of data is determined to have changed too much compared with the previous piece of data, determine and record the data fluctuation position of the corresponding whole segment of fluctuation data based on the data in a period of time before and after the current piece of data; and acquire the next piece of data until the data set to be analyzed has been completely traversed.
In an embodiment of the present invention, the screening unit 302 is adapted to calculate an absolute difference between the current piece of data and the previous piece of data, and compare the calculated absolute difference with a preset difference threshold to determine whether the current piece of data is changed too much from the previous piece of data.
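The threshold comparison described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `large_change_indices`, the parameter `diff_threshold`, and the use of a strict "greater than" comparison are assumptions made for the example.

```python
from typing import List, Sequence


def large_change_indices(values: Sequence[float], diff_threshold: float) -> List[int]:
    """Return the indices whose value differs too much from the previous value.

    Assumption: "changed too much" means the absolute difference is strictly
    greater than the preset difference threshold.
    """
    flagged = []
    for i in range(1, len(values)):
        # Absolute difference between the current piece and the previous piece.
        if abs(values[i] - values[i - 1]) > diff_threshold:
            flagged.append(i)
    return flagged
```

For example, with the values `[10, 11, 30, 29, 5]` and a threshold of 5, only the jumps at indices 2 and 4 exceed the threshold.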
Optionally, the screening unit 302 is adapted to: increase the count value of a preset n-bit recorder by a preset value when it is determined that the current piece of data has changed too much compared with the previous piece of data; judge whether the current count value of the recorder is greater than a preset count threshold; when the current count value of the recorder is determined to be greater than the preset count threshold, acquire the data fluctuation position information of the last piece of data stored in a preset dynamic array; when the dynamic array is determined to be empty, or the data fluctuation position of the last piece of valid data stored in the dynamic array is determined to be a tail node, determine the ((n-2)-1)th piece of data before the current piece of data to be a data fluctuation head node; when the piece of data whose sequence number is the sequence number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is determined to be close to the daily average value of the data, determine that piece of data to be a data fluctuation tail node; when the dynamic array is determined to be not empty or the data fluctuation position of the last piece of valid data stored in the dynamic array is determined not to be a tail node, and the piece of data whose sequence number is the sequence number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is not close to the daily average value of the data, determine the (n-2)th piece of data before the current piece of data to be a data fluctuation interim node; and shift the recorder left by X bits and clear the sign bit of the recorder, where X is an integer greater than or equal to 1 and less than n.
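One possible reading of the n-bit recorder is a sliding window of change flags packed into a single integer. The sketch below follows that reading and should be taken as an assumption rather than the patented procedure: the class name `BitRecorder`, treating the count value as the number of set bits, and shifting before storing the newest flag are all illustrative choices. The defaults n = 32 and X = 1 are the values named in the claims.

```python
class BitRecorder:
    """Sliding window of change flags packed into one n-bit integer (hypothetical)."""

    def __init__(self, n: int = 32, shift: int = 1) -> None:
        if not 1 <= shift < n:
            raise ValueError("shift X must satisfy 1 <= X < n")
        self.n = n
        self.shift = shift
        self.bits = 0
        # Mask that keeps the low n-1 bits and clears the top (sign) bit.
        self.value_mask = (1 << (n - 1)) - 1

    def record(self, changed: bool) -> None:
        # Shift older flags left by X bits, clear the sign bit and anything
        # above it, then store the newest flag in the lowest bit.
        self.bits = (self.bits << self.shift) & self.value_mask
        if changed:
            self.bits |= 1

    def count(self) -> int:
        # Assumed "count value": how many recent pieces of data were flagged.
        return bin(self.bits).count("1")
```

Because the whole window lives in one integer, the storage cost stays constant regardless of how many pieces of data are traversed, which matches the low-consumption property described for the recorder.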
In an embodiment of the present invention, the screening unit 302 is adapted to, when it is determined that the ((n-2) -1) th piece of data before the current piece of data is the data fluctuation head node, record the data fluctuation position of the corresponding whole piece of fluctuation data by using the following array:
Figure BDA0001839265640000121
wherein N_i denotes the sequence number of the data fluctuation head node of the whole segment of fluctuation data corresponding to the current piece of data, N_c denotes the sequence number of the current piece of data, and M_i denotes the mark judgment value of the data fluctuation tail node of the corresponding whole segment of fluctuation data.
In an embodiment of the present invention, the screening unit 302 is adapted to, when it is determined that the (n-2) th piece of data before the current piece of data is an interim node of data fluctuation, record the data fluctuation position of the corresponding whole piece of fluctuation data by using the following array:
Figure BDA0001839265640000131
wherein N_i denotes the sequence number of the data fluctuation interim node of the whole segment of fluctuation data corresponding to the current piece of data, N_c denotes the sequence number of the current piece of data, and M_i denotes the mark judgment value of the data fluctuation tail node of the corresponding whole segment of fluctuation data.
In an embodiment of the present invention, the screening unit 302 is adapted to, when it is determined that the piece of data whose sequence number is the sequence number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is a data fluctuation tail node, record the data fluctuation position of the corresponding whole segment of fluctuation data by using the following array:
Figure BDA0001839265640000132
wherein N_i denotes the sequence number of the data fluctuation tail node of the whole segment of fluctuation data corresponding to the current piece of data, N_c denotes the sequence number of the current piece of data, and M_i denotes the mark judgment value of the data fluctuation tail node of the corresponding whole segment of fluctuation data.
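The head, interim, and tail cases above can be illustrated with a small sketch of the dynamic array. The exact array layout is given only in the figures, so everything below is an assumption made for illustration: the record type `FluctuationRecord`, the string field `kind` standing in for the mark value M_i, the precedence of the three cases, and the default `min_consecutive=3`. Only the sequence-number offsets and n = 32 are taken from the text.

```python
from dataclasses import dataclass
from typing import List

HEAD, INTERIM, TAIL = "head", "interim", "tail"


@dataclass
class FluctuationRecord:
    node_seq: int     # N_i: sequence number of the recorded node
    current_seq: int  # N_c: sequence number of the current piece of data
    kind: str         # hypothetical stand-in for the mark value M_i


def classify(records: List[FluctuationRecord], near_daily_average: bool) -> str:
    # Assumed precedence: head first, then tail, otherwise interim.
    if not records or records[-1].kind == TAIL:
        return HEAD
    if near_daily_average:
        return TAIL
    return INTERIM


def node_sequence(kind: str, current_seq: int, n: int, min_consecutive: int) -> int:
    # Offsets follow the description: ((n-2)-1) back for a head node,
    # (n-2) back plus (minimum continuous number - 1) for a tail node,
    # and (n-2) back for an interim node.
    if kind == HEAD:
        return current_seq - ((n - 2) - 1)
    if kind == TAIL:
        return current_seq - (n - 2) + (min_consecutive - 1)
    return current_seq - (n - 2)


def record_position(records: List[FluctuationRecord], current_seq: int,
                    near_daily_average: bool, n: int = 32,
                    min_consecutive: int = 3) -> FluctuationRecord:
    kind = classify(records, near_daily_average)
    rec = FluctuationRecord(node_sequence(kind, current_seq, n, min_consecutive),
                            current_seq, kind)
    records.append(rec)  # the list plays the role of the dynamic array
    return rec
```

Calling `record_position` once per piece of data that pushes the recorder past its count threshold would build up a list of head, interim, and tail entries that delimit each fluctuation segment.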
According to the scheme in the embodiment of the invention, the data in the acquired data set is traversed to obtain the currently traversed piece of data; when the current piece of data is determined to have changed too much compared with the previous piece of data, the current piece of data is determined to be valid data; when the current piece of data is determined to be valid data, the data fluctuation position of the corresponding whole segment of fluctuation data is determined and recorded; and the next piece of data is acquired until the data set to be analyzed has been completely traversed. In this way, the data fluctuation in a specific time period before and after a piece of data can be judged, and the whole process of judging the boundary condition of the data is tracked dynamically. While the precision of the data and of the final data analysis result is improved as far as possible, the system overhead in computing resources and extra storage space introduced by the data screening method is kept as low as possible, so the accuracy and efficiency of effective data screening can be improved.
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the foregoing description only for the purpose of illustrating the principles of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the invention as defined by the appended claims, specification, and equivalents thereof.

Claims (10)

1. An effective data screening method, comprising:
acquiring a data set to be analyzed;
traversing the data in the acquired data set to obtain the currently traversed piece of data;
judging whether the current data is changed too much compared with the previous data;
when the current data is determined to be changed too much compared with the previous data, determining and recording the data fluctuation position of the corresponding whole fluctuation data based on the data in a period of time before and after the current data;
and acquiring the next piece of data until the data set to be analyzed is completely traversed.
2. The effective data screening method according to claim 1, wherein the judging whether the current piece of data has changed too much compared with the previous piece of data comprises:
calculating the absolute difference between the current piece of data and the previous piece of data, and comparing the calculated absolute difference with a preset difference threshold to judge whether the current piece of data has changed too much compared with the previous piece of data.
3. The effective data screening method according to claim 1, wherein the determining and recording the data fluctuation position of the corresponding whole segment of fluctuation data comprises:
when the current piece of data is determined to have changed too much compared with the previous piece of data, increasing the count value of a preset n-bit recorder by a preset value;
judging whether the current count value of the recorder is greater than a preset count threshold value or not;
when the current count value of the recorder is determined to be larger than a preset count threshold value, acquiring the information of the fluctuation position of the last piece of data stored in a preset dynamic array;
when the dynamic array is determined to be empty or the data fluctuation position of the last piece of valid data stored in the dynamic array is determined to be a tail node, determining the (n-2-1)th piece of data before the current piece of data to be a data fluctuation head node;
when the piece of data whose sequence number is the sequence number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is determined to be close to the daily average value of the data, determining that piece of data to be a data fluctuation tail node;
when the dynamic array is determined to be not empty or the data fluctuation position of the last piece of valid data stored in the dynamic array is determined not to be a tail node, and the piece of data whose sequence number is the sequence number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is not close to the daily average value of the data, determining the (n-2)th piece of data before the current piece of data to be a data fluctuation interim node;
shifting the recorder by X bit to the left, and clearing the sign bit of the recorder; x is an integer greater than or equal to 1 and less than n.
4. The effective data screening method according to claim 3, wherein when it is determined that the (n-2-1) th data before the current data is a data fluctuation head node, the data fluctuation position of the corresponding whole fluctuation data is recorded by using the following array:
Figure FDA0003210146870000021
wherein N_i denotes the sequence number of the data fluctuation head node of the whole segment of fluctuation data corresponding to the current piece of data, N_c denotes the sequence number of the current piece of data, and M_i denotes the mark judgment value of the data fluctuation tail node of the corresponding whole segment of fluctuation data.
5. The effective data screening method according to claim 3, wherein when the n-2 th data before the current data is determined as an interim node of data fluctuation, the data fluctuation position of the corresponding whole fluctuation data is recorded by using the following array:
Figure FDA0003210146870000022
wherein N_i denotes the sequence number of the data fluctuation interim node of the whole segment of fluctuation data corresponding to the current piece of data, N_c denotes the sequence number of the current piece of data, and M_i denotes the mark judgment value of the data fluctuation tail node of the corresponding whole segment of fluctuation data.
6. The effective data screening method of claim 3, wherein when the sequence number of the current piece of data minus n-2 plus the minimum continuous number minus 1 is determined as the tail node of the data fluctuation, the data fluctuation position of the corresponding whole piece of fluctuation data is recorded by using the following array:
Figure FDA0003210146870000023
wherein N_i denotes the sequence number of the data fluctuation head node of the whole segment of fluctuation data corresponding to the current piece of data, N_c denotes the sequence number of the current piece of data, and M_i denotes the mark judgment value of the data fluctuation tail node of the corresponding whole segment of fluctuation data.
7. The method of claim 3, wherein X is 1.
8. The method of claim 3, wherein n is 32.
9. A computer-readable storage medium having computer instructions stored thereon, wherein the computer instructions, when executed, perform the steps of the effective data screening method according to any one of claims 1 to 8.
10. A terminal comprising a memory and a processor, the memory having stored thereon computer instructions capable of being executed on the processor, wherein the processor, when executing the computer instructions, performs the steps of the effective data screening method according to any one of claims 1 to 8.
CN201811247433.1A 2018-10-24 2018-10-24 Effective data screening method, readable storage medium and terminal Active CN109542927B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811247433.1A CN109542927B (en) 2018-10-24 2018-10-24 Effective data screening method, readable storage medium and terminal

Publications (2)

Publication Number Publication Date
CN109542927A CN109542927A (en) 2019-03-29
CN109542927B true CN109542927B (en) 2021-09-28

Family

ID=65844814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811247433.1A Active CN109542927B (en) 2018-10-24 2018-10-24 Effective data screening method, readable storage medium and terminal

Country Status (1)

Country Link
CN (1) CN109542927B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115309753B (en) * 2022-10-11 2023-04-18 江苏泰洁检测技术股份有限公司 Data rapid reading method of efficient environment-friendly intelligent sample research and development system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095482A (en) * 2015-08-13 2015-11-25 浪潮(北京)电子信息产业有限公司 Data mining method and system for detecting abnormal data interval
CN105468603A (en) * 2014-08-22 2016-04-06 腾讯科技(深圳)有限公司 Data selection method and apparatus
WO2016145049A1 (en) * 2015-03-09 2016-09-15 Vapor IO Inc. Rack for computing equipment
CN108120796A (en) * 2017-11-20 2018-06-05 太原鹏跃电子科技有限公司 A kind of detection railway accumulator CO32-When pH value catastrophe point measure and calculation method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10327531B4 (en) * 2003-06-17 2006-11-30 Leica Microsystems Cms Gmbh Method for measuring fluorescence correlations in the presence of slow signal fluctuations

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
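Title
"Basic methods of change-point detection of financial fluctuations"; Hideki Takayasu; 2015 International Conference on Noise and Fluctuations (ICNF); 20151005; full text *
"Feasibility study of a critical current data screening strategy in the fluctuation method"; Li Yang et al.; Journal of North China Electric Power University; 20120331; full text *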

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20190329

Assignee: SEU INTELLIGECE SYSTEM CO.,LTD.

Assignor: NANJING University OF POSTS AND TELECOMMUNICATIONS

Contract record no.: X2023980038683

Denomination of invention: Effective data filtering methods, readable storage media, and terminals

Granted publication date: 20210928

License type: Common License

Record date: 20230728