CN109542927B - Effective data screening method, readable storage medium and terminal

Info

Publication number: CN109542927B
Application number: CN201811247433.1A
Authority: CN
Inventors: 徐小龙; 林皓伟
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2018-10-24
Filing date: 2018-10-24
Publication date: 2021-09-28
Anticipated expiration: 2038-10-24
Also published as: CN109542927A

Abstract

Disclosed are an effective data screening method, a readable storage medium and a terminal. The method comprises: acquiring a data set to be analyzed; traversing the data in the acquired data set to obtain the current piece of data; judging whether the current piece of data has changed too much compared with the previous piece of data; when the current piece of data is determined to have changed too much compared with the previous piece of data, determining and recording the data fluctuation position of the corresponding whole segment of fluctuation data based on the data in a period of time before and after the current piece of data; and acquiring the next piece of data until the data set to be analyzed has been completely traversed. This scheme can improve the efficiency and accuracy of effective data screening.

Description

Effective data screening method, readable storage medium and terminal

Technical Field

The invention belongs to the technical field of data analysis, and particularly relates to an effective data screening method, a readable storage medium and a terminal.

Background

Since 2012, the term "big data" has frequently entered public view and has been widely accepted and studied. These ever-growing data hide huge potential value and will determine the direction and outcome of future development for many enterprises and fields. More and more enterprises are now aware of the hidden risks brought by the explosive growth of data, and the importance of mass data to enterprises is gradually receiving attention. Although big data continuously brings business information and social value, its problem is also obvious: the current data volume is too large.

The huge amount of data in a big data environment means that analyzing the effective information consumes a great deal of resources and time, and daily mean data and marginal data account for a large share of that data. To reduce the resources and time consumed by these calculations, besides designing better data analysis algorithms, one can also start by reducing the data size.

Disclosure of Invention

The invention aims to solve the technical problem of how to improve the efficiency and the accuracy of effective data screening.

In order to achieve the above object, the present invention provides an effective data screening method, including:

acquiring a data set to be analyzed;

traversing the data in the acquired data set to obtain the current piece of data;

judging whether the current piece of data has changed too much compared with the previous piece of data;

when the current piece of data is determined to have changed too much compared with the previous piece of data, determining and recording the data fluctuation position of the corresponding whole segment of fluctuation data based on the data in a period of time before and after the current piece of data;

and acquiring the next piece of data until the data set to be analyzed has been completely traversed.

Optionally, judging whether the current piece of data has changed too much compared with the previous piece of data includes:

calculating the absolute difference between the current piece of data and the previous piece of data, and comparing the calculated absolute difference with a preset difference threshold to judge whether the current piece of data has changed too much compared with the previous piece of data.

Optionally, the determining a data fluctuation position of the corresponding whole fluctuation data includes:

when the current piece of data is determined to have changed too much compared with the previous piece of data, increasing the count value of a preset n-bit recorder by a preset value;

judging whether the current count value of the recorder is greater than a preset count threshold;

when the current count value of the recorder is determined to be greater than the preset count threshold, acquiring the information on the fluctuation position of the last piece of data stored in a preset dynamic array;

when the dynamic array is determined to be empty, or the data fluctuation position of the last valid data stored in the dynamic array is a tail node, determining the ((n-2)-1)th piece of data before the current piece of data as the data fluctuation head node;

when the piece of data whose serial number is the serial number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is determined to be close to the daily average value of the data, determining that piece of data as the data fluctuation tail node;

when the dynamic array is determined to be not empty or the data fluctuation position of the last valid data stored in the dynamic array is determined not to be a tail node, and the piece of data whose serial number is the serial number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is not close to the daily average value of the data, determining the (n-2)th piece of data before the current piece of data as an interim node of the data fluctuation;

shifting the recorder left by X bits and clearing the sign bit of the recorder, where X is an integer greater than or equal to 1 and less than n.

Optionally, when the ((n-2) -1) th piece of data before the current piece of data is determined to be the data fluctuation head node, recording the data fluctuation position of the corresponding whole piece of fluctuation data by using the following array:

where N_i denotes the sequence number of the data fluctuation head node of the whole fluctuation segment corresponding to the current piece of data, N_c denotes the sequence number of the current piece of data, and M_i denotes the decision-value mark of the data fluctuation tail node of the corresponding whole fluctuation segment.

Optionally, when the (n-2) th data before the current piece of data is determined to be an interim node of data fluctuation, recording the data fluctuation position of the corresponding whole piece of fluctuation data by using the following array:

where N_i denotes the sequence number of the data fluctuation interim node of the whole fluctuation segment corresponding to the current piece of data, N_c denotes the sequence number of the current piece of data, and M_i denotes the decision-value mark of the data fluctuation tail node of the corresponding whole fluctuation segment.

Optionally, when the piece of data whose serial number is the serial number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is determined to be the data fluctuation tail node, recording the data fluctuation position of the corresponding whole segment of fluctuation data by using the following array:

where N_i denotes the sequence number of the data fluctuation tail node of the whole fluctuation segment corresponding to the current piece of data, N_c denotes the sequence number of the current piece of data, and M_i denotes the decision-value mark of the data fluctuation tail node of the corresponding whole fluctuation segment.

Optionally, X = 1.

Optionally, n = 32.

The embodiment of the present invention further provides a computer-readable storage medium on which computer instructions are stored; when the computer instructions are executed, the steps of any one of the above effective data screening methods are performed.

The embodiment of the present invention further provides a terminal, which includes a memory and a processor, where the memory stores a computer instruction capable of running on the processor, and the processor executes the steps of any one of the above effective data screening methods when running the computer instruction.

Compared with the prior art, the invention has the beneficial effects that:

According to the scheme, the data in the acquired data set are traversed to obtain the current piece of data; when the current piece of data is determined to have changed too much compared with the previous piece of data, the data fluctuation position of the corresponding whole segment of fluctuation data is determined and recorded based on the data in a period of time before and after the current piece of data; and the next piece of data is acquired until the data set to be analyzed has been completely traversed. In this way, the data fluctuation in a specific time period before and after a piece of data can be judged, and the whole process of judging the data's boundary condition is grasped dynamically. While the data precision and the precision of the final data analysis result are improved as far as possible, the overhead in computing resources and extra storage space introduced by the data screening method itself is kept as low as possible, so the accuracy and efficiency of effective data screening can be improved.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive labor.

FIG. 1 is a schematic flow chart of an effective data screening method in an embodiment of the present invention;

FIG. 2 is a schematic flow chart of another effective data screening method in an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of an effective data screening apparatus in an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. The directional indications (such as up, down, left, right, front, back, etc.) in the embodiments of the present invention are only used to explain the relative positional relationship between the components, the movement, etc. in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indication is changed accordingly.

As described in the background, one effective data screening method in the prior art is to mark data whose absolute difference from daily mean data is greater than a threshold as effective data. However, this method has the following problems:

(1) If, for whatever reason, the data fluctuates slightly over a very short time, but no medium or large fluctuation, and no small fluctuation that meets the conditions and is worth analyzing, occurs within a certain time around it, then the small-fluctuation data has little research significance and little influence on the final data analysis result, and it is regarded as marginal data. When the data scale of a big data environment is large, the amount of marginal data is also huge and consumes a large share of computing resources and time.

(2) If the influence of the marginal data is considered and the screening of the marginal data is increased while traversing the data, the consumption of computing resources and the cost of additional storage resources in the computer are increased to a certain extent.

Therefore, the effective data screening method in the prior art has the problems of low accuracy and low efficiency.

In order to solve the above problems, in the technical solution of the embodiment of the present invention, when the current piece of data is determined to have changed too much compared with the previous piece of data, the data fluctuation position of the corresponding whole segment of fluctuation data is determined and recorded based on the data in a period of time before and after the current piece of data. The data fluctuation in a specific time period before and after a piece of data can thus be judged, and the whole process of judging the data's boundary condition is grasped dynamically. While the data precision and the precision of the final data analysis result are improved as far as possible, the overhead in computing resources and extra storage space introduced by the data screening method itself is kept as low as possible, so the accuracy and efficiency of effective data screening can be improved.

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.

In brief, the method traverses each piece of data in the data set to be analyzed and judges the nature of each piece according to how the data changes in a period of time before and after it. If a piece of data is judged to be valid, its position within the whole data fluctuation is then determined, and a different kind of valid-data mark is applied depending on that position. Finally, all valid data are extracted from the data set according to their marks and processed in segments.

Fig. 1 is a schematic flow chart illustrating an effective data screening method according to an embodiment of the present invention. Referring to fig. 1, the effective data screening method in the embodiment of the present invention may specifically include the following steps:

Step S101: a data set to be analyzed is acquired.

Step S102: traversing the data in the acquired data set to obtain the current piece of data.

In a specific implementation, the order in which the data in the acquired data set is traversed may be chosen according to actual analysis needs, and is not limited herein.

Step S103: judging whether the current data is changed too much compared with the previous data; when the judgment result is yes, step S104 may be performed; otherwise, step S106 may be performed.

In a specific implementation, when determining whether the current piece of data has an excessive change compared with the previous piece of data, an absolute difference between the current piece of data and the previous piece of data may be first calculated, the calculated absolute difference is compared with a preset difference threshold, and whether the current piece of data has an excessive change compared with the previous piece of data is determined according to a comparison result.
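The comparison in step S103 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `changed_too_much`, the sample readings and the choice of a strict "greater than" comparison are assumptions made for the example.

```python
def changed_too_much(current, previous, diff_threshold):
    """Return True when |current - previous| exceeds the preset difference threshold.

    Assumption: "too much" is modelled as strictly greater than the threshold.
    """
    return abs(current - previous) > diff_threshold


# Hypothetical readings; each value is compared with the one before it.
readings = [10.0, 10.2, 10.1, 14.0, 13.9]
flags = [changed_too_much(cur, prev, diff_threshold=1.0)
         for prev, cur in zip(readings, readings[1:])]
print(flags)
```

Only the jump from 10.1 to 14.0 is flagged here; the small drifts around it stay below the threshold.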

Step S104: determining and recording the data fluctuation position of the corresponding whole segment of fluctuation data based on the data in a period of time before and after the current piece of data.

In a specific implementation, when it is determined that the current piece of data has too much variation compared with the previous piece of data, the data fluctuation position of the whole corresponding piece of fluctuation data may be determined and recorded based on data in a period of time before and after the current piece of data, as described in detail in corresponding parts of fig. 2.

Step S106: judging whether the traversal of the data set to be analyzed is completed; when the judgment result is no, step S107 may be performed; otherwise, the operation may end.

Step S107: the next piece of data is acquired.

In a specific implementation, when it is determined that the traversal of the data set to be analyzed is not completed, the next piece of data of the current piece of data may be acquired as the traversed current piece of data, and the execution is continued from step S103 until all pieces of data in the data set to be analyzed are completely traversed.
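The control flow of steps S102 to S107 amounts to a single pass over the data set. The sketch below assumes the data set is an in-memory Python list and leaves step S104 to a caller-supplied callback; `screen` and `on_fluctuation` are hypothetical names introduced only for this illustration.

```python
def screen(dataset, diff_threshold, on_fluctuation):
    """Traverse the data set once, invoking on_fluctuation for every large change."""
    previous = None
    for index, current in enumerate(dataset):            # S102 / S107: next piece of data
        if previous is not None and abs(current - previous) > diff_threshold:  # S103
            on_fluctuation(index, dataset)                # S104: position handling (not shown)
        previous = current
    # Leaving the loop corresponds to S106: the data set is completely traversed.


hits = []
screen([5, 5, 9, 9, 5], diff_threshold=2,
       on_fluctuation=lambda index, _data: hits.append(index))
print(hits)
```

Passing the whole data set to the callback reflects that S104 looks at data before and after the current piece rather than at the current piece alone.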

The effective data screening method in the embodiment of the present invention will be described in further detail with reference to fig. 2.

Fig. 2 is a schematic flow chart illustrating another effective data screening method according to an embodiment of the present invention. Referring to fig. 2, the effective data screening method in the embodiment of the present invention may specifically include the following steps:

Step S201: a data set to be analyzed is acquired.

Step S202: traversing the data in the acquired data set to obtain the current piece of data.

Step S203: judging whether the current data is changed too much compared with the previous data; when the judgment result is yes, step S204 may be performed; otherwise, step S205 may be performed.

Step S204: the count value of the preset n-bit recorder is incremented by 1.

In specific implementation, the value of n can be set according to actual needs.

In one embodiment of the present invention, a binary number is selected as the recorder, for the following reasons:

(1) In general algorithms, one number records only one piece of valid information. If the bits of a binary number are used as the recorder, however, it offers a large capacity at very low cost. Taking an integer as an example, if an integer has 32 bits and the leading sign bit is excluded, the changes of 31 pieces of data can be recorded in that small amount of storage, which is highly cost-effective.

(2) From the perspective of computer hardware operations, shifting an n-bit binary recorder that holds (n-1) bits of records to the left is equivalent to eliminating the influence of the (n-2)th piece of data before the current piece of data, and the change information of the remaining (n-2) pieces of data is assigned new priorities according to time and the user's evaluation.

(3) Binary bits are the storage form used by computer hardware, and a binary number also has a decimal value. By comparing the binary recorder with a special evaluation value derived from the user's evaluation, it can be determined from the comparison result whether the (n-2)th piece of data before the current piece of data is marginal data with no particular data fluctuation in the time range before and after it.

Of course, those skilled in the art may also employ non-binary bit recorders, which are not limited herein.
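To make the capacity argument concrete, the sketch below packs per-piece change flags into one integer and reads them back by age. It is an illustration under assumptions: `flag_at`, `MASK` and the sample flags are hypothetical, and the loop shifts before adding, which leaves the recorder in the same state at comparison time as adding first and shifting at the end of each iteration.

```python
N_BITS = 32                       # recorder width n; the text's example uses a 32-bit integer
MASK = (1 << (N_BITS - 1)) - 1    # keeps the low n-1 bits, i.e. everything except the sign bit


def flag_at(recorder, age):
    """Return the change flag (0 or 1) of the piece `age` steps before the current one."""
    return (recorder >> age) & 1


recorder = 0
for changed in [True, False, True, True]:    # hypothetical change flags, oldest first
    recorder = (recorder << 1) & MASK        # older flags move one bit towards the sign bit
    recorder += int(changed)                 # lowest bit now holds the current piece's flag

print(bin(recorder))
print([flag_at(recorder, age) for age in range(4)])
```

With a 32-bit recorder, bit 30 therefore corresponds to the piece of data 30 steps back, which is the (n-2)th piece before the current one.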

Step S205: judging whether the current count value of the recorder is greater than a preset count threshold value or not; when the judgment result is yes, step S206 may be performed; otherwise, step S208 may be performed.

In a specific implementation, the preset count threshold may be set according to actual needs. For example, when the recorder bit number n is 32, the count threshold may be set to 28672, i.e., 7000 in hexadecimal.
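The check in step S205 is a plain integer comparison. The following sketch uses the example threshold from the text; the function name `exceeds_count_threshold` and the sample recorder values are assumptions for illustration.

```python
COUNT_THRESHOLD = 0x7000    # example value from the text: 0x7000 equals 28672 in decimal


def exceeds_count_threshold(recorder_value, threshold=COUNT_THRESHOLD):
    """S205: True when the recorder's current count value is greater than the threshold."""
    return recorder_value > threshold


print(COUNT_THRESHOLD)
print(exceeds_count_threshold(0x7001))   # one above the threshold
print(exceeds_count_threshold(0x6FFF))   # one below the threshold
```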

Step S206: acquiring information of a data fluctuation position stored in a preset dynamic array, and judging whether the dynamic array is empty or whether a data fluctuation position of last effective data stored in the dynamic array is a tail node; when the judgment result is yes, step S207 may be performed; otherwise, step S208 may be performed.

In a specific implementation, the dynamic array records the sequence numbers of the valid data found so far in the data set to be analyzed, together with the decision-value mark indicating whether each piece of valid data is the tail node of a whole segment of fluctuation data.
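One possible shape for the dynamic array and the check in step S206 is sketched below. The record layout is an assumption: the source only says that a sequence number and a tail-node decision mark are stored, so `NodeRecord`, its field names and the boolean type of the mark are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeRecord:
    sequence_number: int   # position of the recorded node in the data set
    tail_mark: bool        # assumed encoding of the tail-node decision mark


def starts_new_fluctuation(records):
    """S206: True when the array is empty or its last record is a tail node."""
    return not records or records[-1].tail_mark


records = []                                            # the dynamic array
print(starts_new_fluctuation(records))                  # empty array
records.append(NodeRecord(sequence_number=12, tail_mark=False))
print(starts_new_fluctuation(records))                  # a fluctuation is still open
records.append(NodeRecord(sequence_number=19, tail_mark=True))
print(starts_new_fluctuation(records))                  # previous fluctuation has ended
```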

Step S207: determining the ((n-2)-1)th piece of data before the current piece of data as the data fluctuation head node.

In a specific implementation, when it is determined that the dynamic array is empty, or that the data fluctuation position of the last valid data stored in the dynamic array is a tail node, the ((n-2)-1)th piece of data before the current piece of data can be determined as the data fluctuation head node. At this point, the information of the head node may be recorded with the valid data model and stored in the dynamic array.

Step S208: judging whether the piece of data whose serial number is the serial number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is close to the daily average value of the data; when the judgment result is yes, step S209 may be performed; otherwise, step S210 may be performed.

In a specific implementation, the minimum continuous number is the number of consecutive "1" bits in the count threshold of step S205, counted from the highest bit below the sign bit towards the lower bits; the daily average value of the data is the normal value of the system's data under everyday conditions without fluctuation.
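The minimum continuous number can be read directly off the threshold's bit pattern. The sketch below is illustrative only: `min_consecutive` is a hypothetical helper, and the 16-bit width in the example is an assumption chosen because 0x7000 written in 16 bits (0111 0000 0000 0000) has its run of ones immediately below the sign bit.

```python
def min_consecutive(count_threshold, n_bits):
    """Count the run of 1 bits starting at the highest bit below the sign bit."""
    run = 0
    bit = n_bits - 2                                   # highest bit excluding the sign bit
    while bit >= 0 and (count_threshold >> bit) & 1:   # stop at the first 0 bit
        run += 1
        bit -= 1
    return run


print(min_consecutive(0x7000, n_bits=16))
```

Under that assumption the run covers bits 14, 13 and 12, so the function returns 3.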

Step S209: determining the piece of data whose serial number is the serial number of the current piece of data minus (n-2) plus (minimum continuous number - 1) as the tail node of the data fluctuation.

In a specific implementation, when the piece of data whose serial number is the serial number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is close to the daily average value of the data, that piece of data can be determined as the tail node of the data fluctuation. At this time, the following valid data model may be used to record the information of the tail node and store the information in the dynamic array:

where N_i denotes the sequence number of the data fluctuation tail node of the whole fluctuation segment corresponding to the current piece of data, N_c denotes the sequence number of the current piece of data, M_i denotes the decision-value mark of the data fluctuation tail node of the corresponding whole fluctuation segment, and minCon denotes the minimum continuous number.

Step S210: determining the (n-2)th piece of data before the current piece of data as an interim node of the data fluctuation, and recording it.

In a specific implementation, when the dynamic array is determined to be not empty or the data fluctuation position of the last valid data stored in the dynamic array is determined not to be a tail node, and the piece of data whose serial number is the serial number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is not close to the daily average value of the data, the (n-2)th piece of data before the current piece of data is determined to be an interim node of the data fluctuation. At this point, the information of the interim node may be recorded with the valid data model and stored in the dynamic array.
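The branching across steps S206 to S210 can be summarized as a small decision function. This is a sketch under assumptions: the function names are hypothetical, the order of the checks follows the step order described above, and "close to the daily average value" is modelled with a caller-chosen tolerance that the source does not specify.

```python
def is_near_daily_mean(value, daily_mean, tolerance):
    """Assumed closeness test: within `tolerance` of the daily average value."""
    return abs(value - daily_mean) <= tolerance


def classify_node(array_empty_or_last_is_tail, near_daily_mean):
    """Return which kind of fluctuation node is recorded for the current step."""
    if array_empty_or_last_is_tail:    # S206 -> S207: a new fluctuation starts
        return "head"
    if near_daily_mean:                # S208 -> S209: the data has returned to normal
        return "tail"
    return "interim"                   # S210: still inside the fluctuation


print(classify_node(True, False))
print(classify_node(False, is_near_daily_mean(10.3, daily_mean=10.0, tolerance=0.5)))
print(classify_node(False, is_near_daily_mean(14.0, daily_mean=10.0, tolerance=0.5)))
```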

Here, if data is determined to be valid data, the valid data needs to be segmented when it is recorded, for the convenience of subsequent data analysis. With segmentation, it is not necessary to record the serial numbers of all valid data during screening; only three kinds of data fluctuation nodes need to be recorded, namely the first node, the mid-term node and the last node, wherein:

The first node (First Node), i.e., the head node, is the first piece of data of the whole data fluctuation and indicates the start of the data fluctuation.

The mid-term node (Mid-term Node), i.e., the interim node, is data meeting special conditions during the whole data fluctuation and indicates data with a severe amplitude change within the fluctuation.

The last node, i.e., the tail node, is the last piece of data of the whole data fluctuation and indicates that the data fluctuation has finished.
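Recording only these nodes is enough to recover each fluctuation segment afterwards. The sketch below pairs every head node with the next tail node; the `(kind, sequence_number)` tuple layout, the function name and the sample values are assumptions for illustration, not the stored format used in the patent.

```python
def fluctuation_segments(nodes):
    """Pair each head node with the following tail node as (start, end) sequence numbers."""
    segments = []
    start = None
    for kind, sequence_number in nodes:
        if kind == "head":
            start = sequence_number
        elif kind == "tail" and start is not None:
            segments.append((start, sequence_number))
            start = None
        # interim nodes lie inside an open segment and do not change its bounds
    return segments


# Hypothetical recorded nodes and data set.
nodes = [("head", 4), ("interim", 6), ("tail", 9), ("head", 20), ("tail", 23)]
data = list(range(100, 130))
for start, end in fluctuation_segments(nodes):
    print(data[start:end + 1])    # all valid data of one fluctuation, both ends included
```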

Besides valid data segmentation, there is another important reason for defining these three nodes. One characteristic of the effective data screening method in the embodiment of the invention is that whether the current data is valid can be judged according to how the data fluctuates over a period of time in the future, i.e., a "forecasting" capability. In practice, this works by first "skipping" the current data and, after (n-1) further pieces of data have been traversed, going back to judge the earlier (n-2)th piece of data according to those data. The consequence is a certain degree of "delay" in determining data positions.

This "delay" is offset using values obtained through research and experimental testing, as follows in the valid data model of the embodiment of the present invention.

If the current valid data is judged to be the head node, the serial number of the earlier (n-2)th piece of data is first obtained by subtracting (n-2) from the serial number of the current data, and 1 is then subtracted again to reach the piece of data just before the data starts to fluctuate. That piece of data is the real head node, i.e., the first piece of data of the whole data fluctuation.

If the current valid data is judged to be the tail node, the serial number of the earlier (n-2)th piece of data is first obtained by subtracting (n-2) from the serial number of the current data, and the previously defined minimum continuous number is then added, so that the range of valid data is extended by a section towards later data. This offsets the "delay" caused by the data being judged invalid early as the fluctuation approaches its end. Finally, 1 is subtracted, giving the piece of data that lies (minimum continuous number - 1) positions after the earlier (n-2)th piece of data. That piece of data is the real tail node after the "delay" is offset, i.e., the last piece of data of the whole data fluctuation; it is marked as a tail node at the same time, which prepares for the subsequent segmentation.

If the current data is neither the head node nor the tail node but is still valid data, it is data with a large variation amplitude within the whole data fluctuation. No "delay" occurs in this case, so the serial number of the earlier (n-2)th piece of data is obtained simply by subtracting (n-2) from the serial number of the current data.
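The three offsets described above reduce to simple arithmetic on the current serial number. The sketch below restates them; `node_sequence_number` is a hypothetical helper, and the example values (current serial number 100, n = 16, minimum continuous number 3) are chosen only for illustration.

```python
def node_sequence_number(kind, current_seq, n_bits, min_con):
    """Map the current serial number to the serial number of the node being recorded."""
    base = current_seq - (n_bits - 2)     # the earlier (n-2)th piece of data
    if kind == "head":
        return base - 1                   # step back one more to the real start
    if kind == "tail":
        return base + (min_con - 1)       # extend towards later data to offset the delay
    if kind == "interim":
        return base                       # no delay to offset
    raise ValueError(f"unknown node kind: {kind}")


for kind in ("head", "interim", "tail"):
    print(kind, node_sequence_number(kind, current_seq=100, n_bits=16, min_con=3))
```

With these inputs the earlier (n-2)th piece is number 86, so the head, interim and tail nodes land on 85, 86 and 88 respectively.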

Step S211: shifting the count value of the recorder left by X bits and clearing the sign bit.

In a specific implementation, the number of bits X by which the recorder is shifted left before its sign bit is cleared can be selected according to actual needs, for example according to how much weight is given to the data very close to the current data. Here X = 2^M, where M is an integer. In one embodiment of the present invention, X is 1.
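Step S211 can be expressed with one shift and one mask. The sketch below is an illustration, not the patented code: `shift_and_clear_sign` is a hypothetical name, Python integers are unbounded so the n-bit width is emulated with a mask, and an 8-bit recorder is used only to keep the numbers readable.

```python
def shift_and_clear_sign(recorder, n_bits, x=1):
    """Shift an n-bit recorder left by x bits, then clear the sign bit.

    Masking with the low n-1 bits also drops anything shifted past the sign bit,
    which is what a fixed-width register would do.
    """
    sign_bit = 1 << (n_bits - 1)
    return (recorder << x) & (sign_bit - 1)


value = 0b0100_0001                                           # oldest flag set, newest flag set
print(bin(shift_and_clear_sign(value, n_bits=8)))             # oldest flag is discarded
print(bin(shift_and_clear_sign(value, n_bits=8, x=2)))        # larger X ages the flags faster
```

After the shift the lowest bit is 0 again, ready to receive the next piece's change flag in step S204.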

Step S212: judging whether the traversal of the data set to be analyzed is finished; when the judgment result is no, step S213 may be performed; otherwise, the operation may end.

In a specific implementation, when it is determined that the traversal of the data set to be analyzed is completed, the operation may be directly ended, so as to obtain information of valid data in the data set to be analyzed.

Step S213: the next piece of data is acquired.

In a specific implementation, when it is determined that the traversal of the data set to be analyzed is not completed, the next piece of data may be obtained in sequence as the current piece of data traversed, and the execution is started from step S203 until all the traversals of the data set to be analyzed are completed.

The scheme in the embodiment of the invention has the following beneficial effects:

(1) The data screening result is accurate and reliable. To identify which data within mass data is relevant and meaningful, the invention provides a complete judgment system based on a binary recorder. To find the marginal data hidden in a massive data set and improve the accuracy and precision of the analysis results obtained from that data set, a binary number is introduced as a recorder that tracks the data fluctuation in a specific time period before and after the data to be judged, so that the whole process of judging the data's boundary condition is grasped dynamically.

(2) High precision and low consumption. The binary recorder used by the invention can record a wide range of data information while occupying very little hardware storage, so marginal data that affects data precision is eliminated at a very small cost in computing resources and extra storage space.

(3) It can be embedded in other data analysis algorithms. All calculations of the algorithm take place within the processing of a single piece of data during traversal, which means the algorithm can be applied directly inside other data screening or data analysis algorithms, working alongside and complementing them as an auxiliary algorithm.

The embodiment of the invention also provides a computer-readable storage medium, wherein computer instructions are stored on the computer-readable storage medium, and the steps of the effective data screening method are executed when the computer instructions are executed. For the method for screening valid data, please refer to the introduction of the previous section, which is not described herein again.

The embodiment of the invention also provides a terminal, which comprises a memory and a processor, wherein the memory is stored with a computer instruction capable of running on the processor, and the processor executes the steps of the effective data screening method when running the computer instruction.

The effective data screening method in the implementation of the present invention is described in detail above, and the corresponding apparatus of the method will be described below.

Fig. 3 is a schematic structural diagram illustrating an effective data screening apparatus according to an embodiment of the present invention. Referring to fig. 3, a valid data screening apparatus 30 may include an obtaining unit 301 and a screening unit 302, wherein:

The obtaining unit 301 is adapted to obtain a data set to be analyzed.

The screening unit 302 is adapted to traverse the data in the acquired data set to obtain the current piece of data; judge whether the current piece of data has changed too much compared with the previous piece of data; when the current piece of data is determined to have changed too much compared with the previous piece of data, determine and record the data fluctuation position of the corresponding whole segment of fluctuation data based on the data in a period of time before and after the current piece of data; and acquire the next piece of data until the data set to be analyzed has been completely traversed.

In an embodiment of the present invention, the screening unit 302 is adapted to calculate an absolute difference between the current piece of data and the previous piece of data, and compare the calculated absolute difference with a preset difference threshold to determine whether the current piece of data is changed too much from the previous piece of data.

Optionally, the screening unit 302 is adapted to increase the count value of a preset n-bit recorder by a preset value when it is determined that the current piece of data has changed too much compared with the previous piece of data; judge whether the current count value of the recorder is greater than a preset count threshold; when the current count value of the recorder is determined to be greater than the preset count threshold, acquire the information on the fluctuation position of the last piece of data stored in a preset dynamic array; when the dynamic array is determined to be empty, or the data fluctuation position of the last valid data stored in the dynamic array is a tail node, determine the ((n-2)-1)th piece of data before the current piece of data as the data fluctuation head node; when the piece of data whose serial number is the serial number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is determined to be close to the daily average value of the data, determine that piece of data as the data fluctuation tail node; when the dynamic array is determined to be not empty or the data fluctuation position of the last valid data stored in the dynamic array is determined not to be a tail node, and the piece of data whose serial number is the serial number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is not close to the daily average value of the data, determine the (n-2)th piece of data before the current piece of data as an interim node of the data fluctuation; and shift the recorder left by X bits and clear the sign bit of the recorder, where X is an integer greater than or equal to 1 and less than n.

In an embodiment of the present invention, the screening unit 302 is adapted to, when it is determined that the ((n-2)-1)-th piece of data before the current piece of data is the data fluctuation head node, record the data fluctuation position of the corresponding whole segment of fluctuation data by using the following array:

wherein N_i represents the sequence number of the data fluctuation head node of the whole segment of fluctuation data corresponding to the current piece of data, N_c represents the sequence number of the current piece of data, and M_i represents the judgment value used to mark the data fluctuation tail node of the corresponding whole segment of fluctuation data.

In an embodiment of the present invention, the screening unit 302 is adapted to, when it is determined that the (n-2) th piece of data before the current piece of data is an interim node of data fluctuation, record the data fluctuation position of the corresponding whole piece of fluctuation data by using the following array:

In an embodiment of the present invention, the screening unit 302 is adapted to, when it is determined that the piece of data whose sequence number equals the sequence number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is a data fluctuation tail node, record the data fluctuation position of the corresponding whole segment of fluctuation data by using the following array:

According to the scheme in the embodiment of the invention, the data in the acquired data set are traversed to obtain the current piece of data; when the current piece of data is determined to have changed too much compared with the previous piece of data, the current piece of data is determined to be valid data; when the current piece of data is determined to be valid data, the data fluctuation position of the corresponding whole segment of fluctuation data is determined and recorded; and the next piece of data is acquired until the data set to be analyzed has been completely traversed. In this way, the data fluctuation within a specific time period before and after each piece of data can be judged, and the boundaries of each fluctuation are tracked dynamically throughout the process. While keeping the precision of the data and of the final analysis result as high as possible, the scheme keeps the overhead in computing resources and extra storage space introduced by the screening as low as possible, so that both the accuracy and the efficiency of effective data screening can be improved.

The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the foregoing description only for the purpose of illustrating the principles of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the invention as defined by the appended claims, specification, and equivalents thereof.

Claims

1. An effective data screening method, comprising:

acquiring a data set to be analyzed;

2. The effective data screening method according to claim 1, wherein the determining whether the current piece of data has changed too much compared with the previous piece of data comprises:

3. The effective data screening method according to claim 1, wherein the determining and recording the data fluctuation position of the corresponding whole segment of fluctuation data comprises:

when the dynamic array is determined to be empty, or the data fluctuation position of the last piece of valid data stored in the dynamic array is determined to be a tail node, determining the ((n-2)-1)-th piece of data before the current piece of data to be a data fluctuation head node;

when the piece of data whose serial number equals the serial number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is determined to be close to the daily average value of the data, determining that piece of data to be a tail node of data fluctuation;

when the dynamic array is determined to be not empty or the data fluctuation position of the last piece of valid data stored in the dynamic array is determined not to be a tail node, and the piece of data whose serial number equals the serial number of the current piece of data minus (n-2) plus (minimum continuous number - 1) is not close to the daily average value of the data, determining the (n-2)-th piece of data before the current piece of data to be an interim node of data fluctuation;

4. The effective data screening method according to claim 3, wherein when it is determined that the (n-2-1) th data before the current data is a data fluctuation head node, the data fluctuation position of the corresponding whole fluctuation data is recorded by using the following array:

5. The effective data screening method according to claim 3, wherein when the n-2 th data before the current data is determined as an interim node of data fluctuation, the data fluctuation position of the corresponding whole fluctuation data is recorded by using the following array:

6. The effective data screening method of claim 3, wherein when the sequence number of the current piece of data minus n-2 plus the minimum continuous number minus 1 is determined as the tail node of the data fluctuation, the data fluctuation position of the corresponding whole piece of fluctuation data is recorded by using the following array:

7. The method of claim 3, wherein X is 1.

8. The method of claim 3, wherein n is 32.

9. A computer readable storage medium having computer instructions stored thereon, wherein the computer instructions, when executed, perform the steps of the effective data screening method according to any one of claims 1 to 8.

10. A terminal comprising a memory and a processor, the memory having stored thereon computer instructions executable on the processor, wherein the processor, when executing the computer instructions, performs the steps of the effective data screening method according to any one of claims 1 to 8.