CN114385463A - Data acquisition method and device and electronic equipment - Google Patents
- Publication number
- CN114385463A (application CN202111498617.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- acquisition
- processed
- similarity
- historical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
- G06F11/3476—Data logging
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the present application disclose a data acquisition method, a data acquisition apparatus, and an electronic device. The method includes: in response to an acquisition instruction, obtaining the data collected in the current acquisition period as data to be processed; calculating the similarity between the data to be processed and historical acquisition data collected in a historical acquisition period; and storing the data to be processed if the similarity satisfies a specified threshold condition. Because the calculated similarity is compared against the specified threshold condition and the data to be processed is stored only when the condition is satisfied, the collected data is filtered so that only data meeting the requirement (the specified threshold condition) is stored. The data from every acquisition does not need to be stored directly, which saves storage space.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data acquisition method and apparatus, and an electronic device.
Background
To monitor the operating state of a device or program, data can be collected while it runs and then used to determine whether a fault has occurred. However, existing data acquisition approaches waste storage space.
Disclosure of Invention
In view of the above problems, the present application provides a data acquisition method, a data acquisition apparatus, and an electronic device to address them.
In a first aspect, the present application provides a data acquisition method, including: in response to an acquisition instruction, obtaining the data collected in the current acquisition period as data to be processed; calculating the similarity between the data to be processed and historical acquisition data collected in a historical acquisition period; and storing the data to be processed if the similarity satisfies a specified threshold condition.
Optionally, calculating the similarity between the data to be processed and the historical acquisition data collected in the historical acquisition period includes: calculating the similarity based on the type of the data to be processed.
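The three steps of the first aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: `acquire_and_filter`, the in-memory `history` and `store` lists, and the toy `mean_gap_similarity` function are hypothetical names introduced here, and keeping every sample in `history` for later comparison is an assumption the text does not state.

```python
def acquire_and_filter(read_sample, history, similarity, meets_condition, store):
    """Run one acquisition period: collect, score against history, store if the condition holds."""
    sample = read_sample()                # data to be processed
    score = similarity(history, sample)   # similarity to historical acquisition data
    if meets_condition(score):            # specified threshold condition
        store.append(sample)              # stands in for persistent storage
    history.append(sample)                # assumption: keep the sample for later comparisons
    return score


def mean_gap_similarity(history, sample):
    """Toy similarity in (0, 1]: 1.0 when the sample equals the historical mean."""
    mean = sum(history) / len(history)
    return 1.0 / (1.0 + abs(sample - mean))


history, store = [100, 100, 100], []
for value in (100, 0):
    acquire_and_filter(lambda: value, history, mean_gap_similarity,
                       lambda score: score < 0.5, store)
```

In this run only the second sample (0) differs sharply from the history, so it is the only one appended to `store`; the routine sample (100) is scored but not stored.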
Optionally, calculating the similarity based on the type of the data to be processed includes: if the data to be processed is cumulative data, calculating the similarity based on the change rate of the data to be processed and the historical acquisition data collected in the historical acquisition period; and if the data to be processed is non-cumulative data, calculating the similarity based on the data to be processed and the historical acquisition data collected in the historical acquisition period.
Optionally, calculating the similarity based on the change rate of the data to be processed and the historical acquisition data includes, if the data to be processed is single-dimensional data: obtaining the historical acquisition data collected in each of a plurality of historical acquisition periods, yielding a plurality of historical acquisition data; forming a plurality of data pairs from the data to be processed and the plurality of historical acquisition data, where the two items in each pair come from adjacent acquisition periods; for each pair, taking the difference between the item from the later acquisition period and the item from the earlier acquisition period as the reference difference of that pair; taking the ratio of the reference difference of each pair to the acquisition period of that pair to obtain the change rate of that pair; and calculating the similarity based on a first similarity algorithm and the change rates of the pairs, where the first similarity algorithm is any one of standard deviation, Euclidean distance, cosine distance, Pearson correlation coefficient, modified cosine distance, Hamming distance, and Manhattan distance.
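The single-dimensional change-rate steps above can be sketched as follows, using standard deviation as the first similarity algorithm. The function names are hypothetical, and mapping the standard deviation to a score with `1 / (1 + std)` is an assumption made here so that a larger value means "more similar"; the text does not specify how a spread or distance is converted into a similarity.

```python
from statistics import pstdev


def change_rates(values, period):
    """Change rate for each pair of samples taken in adjacent acquisition periods."""
    if period <= 0:
        raise ValueError("period must be positive")
    # (later - earlier) is the reference difference of the pair.
    return [(later - earlier) / period for earlier, later in zip(values, values[1:])]


def rate_similarity(history, current, period):
    """Map the spread of the change rates to a similarity in (0, 1] (assumed mapping)."""
    rates = change_rates(list(history) + [current], period)
    return 1.0 / (1.0 + pstdev(rates))


# A counter that keeps growing by 10 per period versus one that suddenly jumps.
steady = rate_similarity([100, 110, 120, 130], 140, period=10)
jump = rate_similarity([100, 110, 120, 130], 400, period=10)
```

With a constant growth rate the change rates are identical and `steady` is 1.0, while the sudden jump spreads the change rates and drives `jump` well below it. At least one historical value is needed, otherwise there is no pair to form.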
Optionally, calculating the similarity based on the change rate of the data to be processed and the historical acquisition data includes, if the data to be processed is multidimensional data: obtaining the historical acquisition data collected in each of a plurality of historical acquisition periods, yielding a plurality of historical acquisition data; obtaining the data at each dimension position in the data to be processed and in the plurality of historical acquisition data; grouping the data by dimension position to obtain a plurality of groups, where all data in a group share the same dimension position; forming a plurality of data pairs within each group, where the two items in each pair come from adjacent acquisition periods; for each pair, taking the difference between the item from the later acquisition period and the item from the earlier acquisition period as the reference difference of that pair; taking the ratio of the reference difference of each pair to the acquisition period of that pair to obtain the change rate of that pair; dividing the pairs into a plurality of sets according to the acquisition periods of the data they contain, where all pairs in a set correspond to the same acquisition periods; generating one multidimensional datum from the change rates of the pairs in each set, yielding a plurality of multidimensional data, where the dimension position of each change rate in the generated multidimensional datum is the same as the dimension position that the pair's data occupies in the data to be processed or in the historical acquisition data; and calculating the similarity based on a second similarity algorithm and the plurality of multidimensional data, where the second similarity algorithm is any one of Euclidean distance, cosine distance, Pearson correlation coefficient, modified cosine distance, Hamming distance, and Manhattan distance.
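The multidimensional steps above amount to turning a sequence of sample vectors into a sequence of change-rate vectors, one per pair of adjacent acquisition periods, with each dimension kept in its original position. The sketch below illustrates this with Euclidean distance as the second similarity algorithm. The function names are hypothetical, and comparing the newest change-rate vector with the mean of the earlier ones is an assumption; the text does not say how the resulting vectors are compared with each other.

```python
import math


def rate_vectors(samples, period):
    """samples: equal-length tuples ordered by acquisition period.

    Returns one change-rate vector per pair of adjacent periods,
    preserving the dimension position of every value.
    """
    return [
        tuple((b - a) / period for a, b in zip(earlier, later))
        for earlier, later in zip(samples, samples[1:])
    ]


def latest_rate_distance(samples, period):
    """Euclidean distance between the newest rate vector and the mean of the earlier ones."""
    *earlier, newest = rate_vectors(samples, period)
    if not earlier:
        raise ValueError("need at least three samples")
    mean = tuple(sum(column) / len(earlier) for column in zip(*earlier))
    return math.dist(newest, mean)


# Two cumulative counters; the second one jumps in the last period.
samples = [(0, 100), (10, 110), (20, 120), (30, 330)]
vectors = rate_vectors(samples, period=10)
distance = latest_rate_distance(samples, period=10)
```

Here the first three periods give the same rate vector `(1.0, 1.0)`, and the jump in the second dimension shows up as a large distance for the newest vector.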
Optionally, calculating the similarity based on the data to be processed and the historical acquisition data includes, if the data to be processed is single-dimensional data: obtaining the historical acquisition data collected in each of a plurality of historical acquisition periods, yielding a plurality of historical acquisition data; and calculating the similarity based on a first similarity algorithm, the data to be processed, and the plurality of historical acquisition data, where the first similarity algorithm is any one of standard deviation, Euclidean distance, cosine distance, Pearson correlation coefficient, modified cosine distance, Hamming distance, and Manhattan distance.
Optionally, calculating the similarity based on the data to be processed and the historical acquisition data includes, if the data to be processed is multidimensional data: obtaining the historical acquisition data collected in each of a plurality of historical acquisition periods, yielding a plurality of historical acquisition data; and calculating the similarity based on a second similarity algorithm, the data to be processed, and the plurality of historical acquisition data, where the second similarity algorithm is any one of Euclidean distance, cosine distance, Pearson correlation coefficient, modified cosine distance, Hamming distance, and Manhattan distance.
Thus, when the data to be processed is cumulative data, the similarity can be calculated from the change rates of the data to be processed and the plurality of historical acquisition data; when the data to be processed is non-cumulative data, the similarity can be calculated directly from the data to be processed and the plurality of historical acquisition data. Because different data types have different characteristics, choosing the input to the similarity calculation by data type makes it easier to identify whether the data to be processed has high-value characteristics. Cumulative and non-cumulative data are each further divided into single-dimensional and multidimensional data, and a different similarity algorithm is used for each dimensionality, which improves the applicability and extensibility of the data acquisition method provided by the present application.
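For non-cumulative data the raw values are compared directly. The sketch below uses cosine similarity as the second similarity algorithm for multidimensional data. The function names are hypothetical, and comparing the current sample against the per-dimension mean of the history is an assumption; returning 0.0 when either vector is all zeros is likewise a choice made here to avoid division by zero.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    norm = math.hypot(*a) * math.hypot(*b)
    if norm == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / norm


def similarity_to_history(history, current):
    """Compare the current sample with the per-dimension mean of the historical samples."""
    mean = [sum(column) / len(history) for column in zip(*history)]
    return cosine_similarity(current, mean)


# Hypothetical (r/s, w/s) samples from three historical acquisition periods.
history = [(100, 50), (102, 51), (98, 49)]
normal = similarity_to_history(history, (100, 50))
reads_stopped = similarity_to_history(history, (0, 50))
```

A sample in line with the history scores close to 1, while a sample whose read rate has dropped to zero scores noticeably lower.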
Optionally, calculating the similarity between the data to be processed and the historical acquisition data collected in the historical acquisition period includes: calculating the similarity based on the dimensionality of the data to be processed.
Optionally, the method further includes: if the similarity does not satisfy the specified threshold condition but the acquisition time of the data to be processed matches a persistent storage period, storing the data to be processed.
Data is thus also collected and stored on the persistent storage period, which is longer than the acquisition (sampling) period.
Optionally, the persistent storage period is determined based on the type of the data to be processed.
In this way, the data to be processed is stored either when its acquisition time matches the persistent storage period or when the similarity satisfies the specified threshold condition. Storage on the persistent storage period occurs at regular intervals, whereas the moments at which the similarity satisfies the specified threshold condition are unpredictable, so the data to be processed is persisted at a variable frequency. Both the higher-value data and the routine data are persisted, which saves storage space while improving the safety and stability of the device.
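The variable-frequency storage rule can be sketched as a single decision function. This is an illustration only: `should_store` and its parameters are hypothetical, the threshold condition is taken to be "similarity below a first threshold", and "the acquisition time matches the persistent storage period" is interpreted here as the acquisition timestamp being a multiple of that period, which is an assumption.

```python
def should_store(similarity, first_threshold, acquired_at, persist_period, acquisition_period):
    """Decide whether one sample is persisted.

    acquired_at, persist_period and acquisition_period are in the same time unit.
    """
    if persist_period <= acquisition_period:
        raise ValueError("persistent storage period must exceed the acquisition period")
    if similarity < first_threshold:
        return True                              # high-value sample: store immediately
    return acquired_at % persist_period == 0     # routine sample: store on the schedule


# Acquisition every 30 s, routine persistence every 300 s.
decisions = [
    should_store(score, 0.5, t, persist_period=300, acquisition_period=30)
    for t, score in [(270, 0.9), (300, 0.9), (330, 0.1)]
]
```

The routine sample at 270 s is skipped, the routine sample at 300 s is stored because the persistent storage period has arrived, and the dissimilar sample at 330 s is stored regardless of the schedule.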
Optionally, the specified threshold condition includes that the similarity is smaller than a first similarity threshold, or an absolute value of a difference between the similarity and the similarity corresponding to the previous acquisition cycle is smaller than a second similarity threshold.
In this way, the specified threshold condition can be chosen according to how severe the fault represented by the data to be processed would be. When the fault would be severe (for example, the device stops outright), the condition can be set as: the absolute value of the difference between the similarity of the data to be processed and the similarity corresponding to the previous acquisition period is smaller than the second similarity threshold, so that device faults are discovered in time. When the fault would be minor (for example, one function of the device fails but other tasks can still be executed), the condition can be set as: the similarity of the data to be processed is smaller than the first similarity threshold, so that the operating state of the device can be analyzed quickly. The specified threshold condition can therefore be determined from actual requirements, which makes the similarity check more flexible.
In a second aspect, the present application provides a data acquisition apparatus that runs on an electronic device, the apparatus including: a to-be-processed data obtaining unit, configured to obtain, in response to an acquisition instruction, the data collected in the current acquisition period as data to be processed; a similarity calculation unit, configured to calculate the similarity between the data to be processed and historical acquisition data collected in a historical acquisition period; and a storage unit, configured to store the data to be processed if the similarity satisfies a specified threshold condition.
In a third aspect, the present application provides an electronic device comprising one or more processors and a memory; one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to perform the methods described above.
In a fourth aspect, the present application provides a computer-readable storage medium having a program code stored therein, wherein the program code performs the above method when running.
With the data acquisition method, apparatus, electronic device, and storage medium provided by the present application, the data collected in the current acquisition period is obtained in response to an acquisition instruction and used as the data to be processed, the similarity between the data to be processed and historical acquisition data collected in a historical acquisition period is calculated, and the data to be processed is stored if the similarity satisfies a specified threshold condition. Because the calculated similarity is compared against the specified threshold condition and the data to be processed is stored only when the condition is satisfied, the collected data is filtered so that only data meeting the requirement (the specified threshold condition) is stored. The data from every acquisition does not need to be stored directly, which saves storage space.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic view illustrating an application scenario of a data acquisition method proposed in the present application;
FIG. 2 is a flowchart illustrating a data acquisition method according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating a data acquisition method according to another embodiment of the present application;
FIG. 4 is a flowchart illustrating an embodiment of S220 of FIG. 2;
FIG. 5 is a flowchart illustrating another embodiment of S220 of FIG. 2;
FIG. 6 is a diagram illustrating a multidimensional data change rate calculation method proposed in the present application;
FIG. 7 is a flowchart illustrating an embodiment of S230 of FIG. 2;
FIG. 8 is a flowchart illustrating another embodiment of S230 of FIG. 2;
FIG. 9 is a flowchart illustrating a data acquisition method according to still another embodiment of the present application;
FIG. 10 is a flowchart illustrating a data acquisition method according to yet another embodiment of the present application;
FIG. 11 is a schematic diagram illustrating a flow of a data acquisition method proposed in the present application;
FIG. 12 is a block diagram illustrating the structure of a data acquisition apparatus according to an embodiment of the present application;
FIG. 13 is a block diagram illustrating an electronic device according to the present application;
FIG. 14 shows a storage unit for storing or carrying program code for implementing a parameter obtaining method according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
To monitor the operating state of a device or program, data can be collected while it runs and then used to determine whether a fault has occurred. For example, data related to the read/write rate of a disk I/O port can be collected to determine whether the disk has a data read/write fault.
The inventor found that existing data acquisition approaches either waste storage space or fail to capture high-value data. For example, collecting device data in full requires acquiring a large amount of data, which occupies a large amount of local storage space, and most of the stored data is repetitive, low-value data.
The inventor therefore proposes the data acquisition method, apparatus, and electronic device of the present application. In response to an acquisition instruction, the data collected in the current acquisition period is obtained and used as the data to be processed, the similarity between the data to be processed and historical acquisition data collected in a historical acquisition period is calculated, and the data to be processed is stored if the similarity satisfies a specified threshold condition. Because the calculated similarity is compared against the specified threshold condition and the data to be processed is stored only when the condition is satisfied, the collected data is filtered so that only data meeting the requirement (the specified threshold condition) is stored. The data from every acquisition does not need to be stored directly, which saves storage space.
Scenario: the data acquisition method may be applied in a system that includes a plurality of devices, a network, and a cloud server. The devices collect their own operation data and upload it to the cloud server, either in response to a request or actively, and the cloud server executes the data acquisition method provided by the embodiments of the present application.
In order to better understand the solution of the embodiment of the present application, an application scenario related to the embodiment of the present application is described below.
Referring to FIG. 1, the scenario includes a plurality of devices, a gateway, a cloud platform, and a user terminal. A device can receive control commands issued by the cloud platform and the gateway and, after responding to a control command, report the corresponding information through the gateway (for example, after responding to a data acquisition command issued by the cloud platform or the gateway, the device can report its operation data through the gateway). A device can also upload its operation data through the gateway, so that the gateway and the cloud platform can determine whether the device has failed. The gateway handles uplink information from the Internet of Things devices and downlink commands from the cloud. The cloud platform can execute the data acquisition method provided by the embodiments of the present application, and can also provide an interface that an application program on the user terminal calls to send and receive information, so that when the data obtained by the cloud platform indicates that a device may have failed, prompt information can be sent to that application program. By calling the interface provided by the cloud platform, the application program on the user terminal can send control commands to a device and read the operation data the device reports, so that the Internet of Things devices can be troubleshot and maintained based on the operation data.
Embodiments to which the present application relates will be described below with reference to the accompanying drawings.
Referring to fig. 2, a data acquisition method provided in the present application includes:
S110: In response to an acquisition instruction, obtain the data collected in the current acquisition period as the data to be processed.
The data to be processed may be data that represents the state of a device (for example, a disk). When the device is a disk, the data to be processed may be the number of reads completed per second (r/s), the number of writes completed per second (w/s), the wear leveling count (Wear Leveling Count), and so on. Here, r/s and w/s indicate whether the disk I/O port reads and writes data normally, and the wear leveling count indicates whether the storage function of the disk is normal.
In one approach, when the time corresponding to the acquisition period arrives, the control unit of the electronic device sends an acquisition instruction to the data acquisition unit. The acquisition instruction may include the identifier of the data to be collected; after responding to the instruction, the data acquisition unit collects data according to that identifier, and the data collected in the current acquisition period is used as the data to be processed. For example, if the identifier is the number of reads completed per second (r/s) and the acquisition period is t, then every interval t the control unit sends an acquisition instruction to the data acquisition unit, which collects r/s for the current acquisition period as the data to be processed.
It should be noted that one acquisition instruction may include one or more identifiers of data to be acquired, and the number of the identifiers of the data to be acquired included in the acquisition instruction may be determined based on task requirements of the electronic device.
Optionally, as shown in Table 1, the data identifiers may be encoded, with each code corresponding to a unique identifier.
Table 1
Code | Data identifier |
001 | Number of reads completed per second |
002 | Number of writes completed per second |
... | ... |
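An acquisition instruction that carries one or more of the codes in Table 1 can be sketched as follows. The `AcquisitionInstruction` class, the `collect` function, and the `read_metric` callback are hypothetical names introduced for illustration; only the two codes shown in the table are included, and the metric values in the example are made up.

```python
from dataclasses import dataclass

# The two codes listed in Table 1.
IDENTIFIER_BY_CODE = {
    "001": "Number of reads completed per second",
    "002": "Number of writes completed per second",
}


@dataclass(frozen=True)
class AcquisitionInstruction:
    """An instruction naming one or more data identifiers by their codes."""
    codes: tuple

    def identifiers(self):
        return [IDENTIFIER_BY_CODE[code] for code in self.codes]


def collect(instruction, read_metric):
    """Collect one value per requested code using a caller-supplied reader."""
    return {code: read_metric(code) for code in instruction.codes}


instruction = AcquisitionInstruction(codes=("001", "002"))
fake_readings = {"001": 120, "002": 45}          # made-up values for the example
data_to_be_processed = collect(instruction, fake_readings.__getitem__)
```

Because each code maps to exactly one identifier, the data acquisition unit can resolve the instruction without ambiguity, and an unknown code fails fast with a `KeyError`.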
S120: Calculate the similarity between the data to be processed and historical acquisition data collected in a historical acquisition period.
The similarity represents how similar the data to be processed is to the historical acquisition data: the larger the similarity, the more alike they are. In one approach, the similarity between the data to be processed and the historical acquisition data collected in the historical acquisition period may be calculated based on the type of the data to be processed.
In the embodiments of the present application, the data to be processed can be divided into cumulative data and non-cumulative data. Cumulative data is data that accumulates on top of historical data, that is, data related to both the historical state and the current state of the device. For example, when the device is a disk, the cumulative data may be the reallocated sector count (Reallocated_Sector_Ct), the end-to-end check error (End-to-End Error), the reallocated event count (Reallocated_Event_Count), the remaining life percentage (Remaining Life Percentage), the available reserved space (Available Reserved Space), the percentage of available or remaining reserved blocks (Reserved Block Count), the total number of program failures (Program Failure Count), the remaining percentage of allowed erase failures (Erase Failure Count), the wear leveling count (Wear Leveling Count), the total number of LBAs written (Total LBAs Written), the uncorrectable sector count (Uncorrectable Sector Count Line), and the like.
Non-cumulative data may be data that is random or bursty, that is, data related only to the current state of the device. For example, when the device is a disk, the non-cumulative data may be the number of read requests merged per second for the device (rrqm/s), the number of write requests merged per second for the device (wrqm/s), the number of reads completed per second (r/s), the number of writes completed per second (w/s), the amount of data read per second (rkB/s, in kB), the amount of data written per second (wkB/s, in kB), the average amount of data per I/O operation (avgrq-sz, in sectors), the average length of the queue of pending I/O requests (avgqu-sz), the average time per I/O request (await, including both waiting time and processing time, in milliseconds), the percentage of time during which the I/O queue is non-empty (%util), and the like.
It should be noted that cumulative data and non-cumulative data may have different acquisition periods. Because non-cumulative data is more random and bursty than cumulative data, its acquisition period may be shorter. For example, the number of read requests merged per second (rrqm/s) is non-cumulative data, and its acquisition period may be set to 30 s; the remaining life percentage (Remaining Life Percentage) is cumulative data, and its acquisition period may be set to 24 h. Setting different acquisition periods for different types of data allows the data to be analyzed promptly, so that changes in the state of the disk are detected in time.
It should also be noted that the type to which each kind of operation data belongs is preset, and the device can determine the type of the collected data based on the task currently being executed.
In another approach, the similarity between the data to be processed and the historical acquisition data collected in the historical acquisition period may be calculated based on the dimensionality of the data to be processed.
In the embodiments of the present application, the data to be processed can also be divided into single-dimensional data and multidimensional data according to the number of dimensions. Single-dimensional data is data that contains a single identifier; for example, it may be the number of reads completed per second (r/s) or the number of writes completed per second (w/s). Multidimensional data is data that contains two or more identifiers; for example, it may include the remaining life percentage (Remaining Life Percentage), the wear leveling count (Wear Leveling Count), and the total number of LBAs written (Total LBAs Written), which together may be used to calculate disk life.
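Per-type acquisition periods can be expressed as a small configuration table. The sketch below uses the two example periods given above (30 s and 24 h); the names `PERIOD_SECONDS` and `due_metrics` are hypothetical, and treating a metric as due whenever the elapsed time is a whole multiple of its period is an assumption about scheduling.

```python
# Acquisition period per metric, in seconds, using the example values from the text.
PERIOD_SECONDS = {
    "rrqm/s": 30,                                # non-cumulative: short period
    "Remaining Life Percentage": 24 * 60 * 60,   # cumulative: long period
}


def due_metrics(elapsed_seconds):
    """Return the metrics whose acquisition period has just elapsed."""
    return [
        name for name, period in PERIOD_SECONDS.items()
        if elapsed_seconds % period == 0
    ]


after_one_minute = due_metrics(60)
after_one_day = due_metrics(24 * 60 * 60)
```

After one minute only the bursty metric is due, whereas after a full day both metrics are collected.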
It should be noted that the device may determine in advance whether the data to be acquired is single-dimensional data or multidimensional data based on different tasks, and the single-dimensional data may be transmitted in the form of a single numerical value in the device, and the multidimensional data may be transmitted in the form of an array in the device, so that the device may determine whether the acquired data to be processed (acquired data) is single-dimensional data or multidimensional data according to the format of the data and the task currently executed.
Furthermore, it should be noted that the classification of the type and the dimension of the data to be processed may coincide, that is: when the data to be processed is accumulation type/non-accumulation type data, the data to be processed can be further divided into single-dimensional accumulation/non-accumulation type data and multi-dimensional accumulation/non-accumulation type data; when the data to be processed is single/multidimensional data, the data to be processed can be further divided into single/multidimensional accumulative data and single/multidimensional non-accumulative data.
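The combination of type and dimensionality can be sketched as a small dispatch function that picks the input and the algorithm family for the similarity calculation. `similarity_plan` is a hypothetical name; the type is passed in as a flag because the text says it is preset, and the dimensionality is inferred from the value's format (a single number versus an array), as described above.

```python
from collections.abc import Sequence
from numbers import Real


def similarity_plan(is_cumulative, value):
    """Return (input basis, algorithm family) for one piece of data to be processed."""
    if isinstance(value, Real):
        dims = "single"        # transmitted as a single numerical value
    elif isinstance(value, Sequence) and not isinstance(value, str):
        dims = "multi"         # transmitted as an array
    else:
        raise TypeError("value must be a number or a sequence of numbers")
    basis = "change rates" if is_cumulative else "raw values"
    algorithm = "first" if dims == "single" else "second"
    return basis, algorithm


wear_count_plan = similarity_plan(True, 57)           # single-dimensional cumulative
io_rates_plan = similarity_plan(False, [120, 45])     # multidimensional non-cumulative
```

The four possible results correspond to the four cases in the text: change rates or raw values as input, combined with the first or second similarity algorithm.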
S130: and if the similarity meets the condition of a specified threshold, storing the data to be processed.
In the embodiment of the present application, the specified threshold condition may be a condition characterizing high-value data; therefore, if the similarity satisfies the specified threshold condition, it may be determined that the data to be processed is high-value data. In the embodiment of the present application, high-value data may be data that characterizes a possible failure of the device. For example, when the data to be processed is the number of reads per second, if the reads-per-second values historically acquired for the disk are all large while the value of the data to be processed is 0, a data reading fault may have occurred in the disk.
In one approach, the specified threshold condition may include: the similarity is less than a first similarity threshold. In this way, when the similarity corresponding to the data to be processed is smaller than the first similarity threshold, the data to be processed may be singular data that contains information about the device state and therefore has a high value. That is, the difference between the data to be processed and the historically collected data is large, and the device state may have changed suddenly (for example, the I/O read-write speed of the disk drops from fast to slow, or the disk suddenly stalls or becomes slow). In this case the data to be processed may be persistently stored, that is, stored in the database.
Further, in the embodiment of the present application, as another way, the specified threshold condition includes: the absolute value of the difference between the similarity and the similarity corresponding to the previous acquisition cycle is smaller than a second similarity threshold. For example, assuming that the similarity corresponding to the data to be processed is A, and that the similarity corresponding to the previous acquisition cycle for data with the same data identifier as the data to be processed is B, if |A-B| < the second similarity threshold, the data to be processed may be singular data and may be persistently stored, that is, stored in the database.
Both of the above two specified threshold conditions can be used as the basis for storing the data to be processed, and the first condition (the similarity is smaller than the first similarity threshold) is simple to calculate; the second condition (the absolute value of the difference between the similarity and the similarity corresponding to the previous acquisition cycle is smaller than the second similarity threshold) is more sensitive to the similarity change situation of two adjacent acquisition cycles, and the sudden change of the equipment state can be found more quickly. By the method, when the problem caused by the equipment failure represented by the data to be processed is serious (for example, the equipment is directly stopped), the specified threshold condition corresponding to the data to be processed can be set as: the absolute value of the difference value between the similarity of the data to be processed and the similarity corresponding to the previous acquisition cycle is smaller than a second similarity threshold value, so that equipment faults can be found in time; when the problem caused by the failure of the device represented by the data to be processed is light (for example, a certain function of the device fails, but other tasks can be executed), the specified threshold condition corresponding to the data to be processed can be set as: the similarity of the data to be processed is smaller than a first similarity threshold value, so that the running state of the equipment can be analyzed quickly. Therefore, the specified threshold condition can be determined based on the actual requirement, and the flexibility of the similarity judgment method is improved.
It should be noted that the first similarity threshold and the second similarity threshold may be determined according to the type of the data to be processed, experience, and the like. For example: because the non-accumulation data has randomness and mutation, and the accumulation data is more stable than the non-accumulation data (i.e. the value of the accumulation data changes slowly), the similarity threshold corresponding to the accumulation data can be smaller than the similarity threshold corresponding to the non-accumulation data. For another example: since the second similarity threshold is compared with the absolute value of the similarity difference of the adjacent cycles, and the change of the similarity of the adjacent cycles is small, the second similarity threshold may be smaller than the first similarity threshold.
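The two specified threshold conditions described above can be sketched as a single check. The function name, the `mode` argument and the default threshold values below are assumptions chosen for illustration only; the application leaves the thresholds to be set according to the data type and experience.

```python
def meets_threshold(similarity, mode, previous_similarity=None,
                    first_threshold=0.5, second_threshold=0.1):
    """Return True if the similarity satisfies the chosen condition.

    mode == "first":  similarity < first_threshold
    mode == "second": |similarity - previous_similarity| < second_threshold
    The default threshold values are illustrative assumptions.
    """
    if mode == "first":
        return similarity < first_threshold
    if mode == "second":
        if previous_similarity is None:
            # Assumption: without a previous cycle there is nothing
            # to compare against, so the condition is not satisfied.
            return False
        return abs(similarity - previous_similarity) < second_threshold
    raise ValueError("mode must be 'first' or 'second'")
```

Under these assumed defaults, `meets_threshold(0.2, "first")` holds, whereas `meets_threshold(0.9, "second", previous_similarity=0.5)` does not.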
In the data acquisition method provided by this embodiment, after acquiring, in response to an acquisition instruction, acquisition data acquired in a current acquisition period, the acquisition data acquired in the current acquisition period is used as data to be processed, similarity between the data to be processed and historical acquisition data acquired in a historical acquisition period is calculated, and if the similarity meets a specified threshold condition, the data to be processed is stored. By the method, the similarity between the data to be processed obtained through calculation and the historical acquisition data acquired in the historical acquisition period can be compared with the specified threshold condition, and the data to be processed is stored under the condition that the similarity meets the specified threshold condition, so that the acquired acquisition data can be screened to a certain extent according to the specified threshold condition to obtain the acquisition data meeting the requirement (the specified threshold condition) for storage, the acquired acquisition data acquired each time does not need to be stored directly, and the storage space is saved.
Referring to fig. 3, a data acquisition method provided in the present application includes:
S210: In response to the acquisition instruction, acquire the acquisition data acquired in the current acquisition period as the data to be processed.
S220: If the data to be processed is accumulation-type data, calculate the similarity based on the change rates of the data to be processed and the historical acquisition data acquired in the historical acquisition period.
As one mode, as shown in fig. 4, if the data to be processed is accumulation-type data, calculating the similarity based on the data to be processed and the change rate of the historical acquisition data acquired in the historical acquisition period includes:
S2201: If the data to be processed is single-dimensional data, acquire the historical acquisition data acquired in each of a plurality of historical acquisition cycles to obtain a plurality of historical acquisition data.
The quantity of the acquired historical acquired data can be determined according to the task of analyzing the state of the device, the acquisition period and other factors, and generally, 7, 14, 20 or 30 historical acquired data can be acquired according to the actual situation. Illustratively, if the task of analyzing the state of the device is to check the percentage of the remaining life of the disk, and the acquisition cycle is 24 hours, 30 pieces of historical acquisition data can be acquired; if the task of analyzing the equipment state is to check the non-empty time ratio of the disk I/O queue, and the acquisition period is 30s, 20 pieces of historical acquisition data can be acquired.
S2202: Acquire a plurality of pairs of data from the data to be processed and the plurality of historical acquisition data, wherein the acquisition periods corresponding to the data in each pair of data are adjacent.
For example, assume that the acquired historical acquisition data are, in chronological order from first to last, A, B, C, D, E, F, G, and that the data to be processed is H. The following data pairs can then be formed: AB, BC, CD, DE, EF, FG, GH, wherein the acquisition periods of the data in each pair are adjacent.
S2203: For each pair of data, take the difference between the data of the later acquisition period and the data of the earlier acquisition period as a reference difference, to obtain the reference difference corresponding to each pair of data.
For example, assume that the data pairs are AB, BC, CD, DE, EF, FG, GH. The reference difference corresponding to each pair of data can be obtained through B-A, C-B, D-C, E-D, F-E, G-F, H-G: X1 (corresponding to B-A), X2 (corresponding to C-B), X3 (corresponding to D-C), X4 (corresponding to E-D), X5 (corresponding to F-E), X6 (corresponding to G-F), X7 (corresponding to H-G).
S2204: Take the ratio of the reference difference corresponding to each pair of data to the data of the corresponding acquisition period in that pair, to obtain the change rate corresponding to each pair of data.
For example, assume that the data pairs are AB, BC, CD, DE, EF, FG, GH, and that the reference differences corresponding to the pairs are X1, X2, X3, X4, X5, X6, X7. The change rate corresponding to each pair of data can then be obtained through X1/A, X2/B, X3/C, X4/D, X5/E, X6/F, X7/G: Y1 (corresponding to X1/A), Y2 (corresponding to X2/B), Y3 (corresponding to X3/C), Y4 (corresponding to X4/D), Y5 (corresponding to X5/E), Y6 (corresponding to X6/F), Y7 (corresponding to X7/G).
S2205: Calculate the similarity based on a first similarity algorithm and the change rate corresponding to each pair of data, wherein the first similarity algorithm comprises any one of standard deviation, Euclidean distance, cosine distance, Pearson correlation coefficient, modified cosine distance, Hamming distance and Manhattan distance.
For example, when the similarity is calculated using the standard deviation in the first similarity algorithm, assume that the change rates corresponding to the pairs of data are Y1, Y2, Y3, Y4, Y5, Y6, Y7. The average change rate M can first be obtained through (Y1+Y2+...+Y7)/7, and the standard deviation N corresponding to the data to be processed can then be obtained through the standard deviation formula sqrt(((Y1-M)^2+(Y2-M)^2+...+(Y7-M)^2)/7). A smaller value of N indicates that the data to be processed is more similar to the historical data, and a larger value of N indicates that it is less similar. So as not to contradict the specified threshold condition (under which a smaller similarity indicates that the data to be processed is less similar to the historical data), the similarity of the data to be processed may be taken as 1/(N+1). Other first similarity algorithms (e.g., Euclidean distance) may be processed in the same way when they contradict the specified threshold condition.
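Steps S2202 to S2205 with the standard deviation can be sketched as below. The function names are hypothetical, and the handling of a zero denominator is an assumption, since the application does not say what happens when an earlier value is 0.

```python
import math


def change_rates(values):
    """Change rate of each adjacent pair: (later - earlier) / earlier.

    `values` is ordered from oldest to newest, with the data to be
    processed as the last element.
    """
    rates = []
    for earlier, later in zip(values, values[1:]):
        if earlier == 0:
            # Assumption: a zero earlier value is treated as an error.
            raise ValueError("change rate is undefined for a zero value")
        rates.append((later - earlier) / earlier)
    return rates


def accumulation_similarity_1d(history, current):
    """Similarity 1/(N+1), where N is the population standard
    deviation of the change rates (divide by the number of rates)."""
    rates = change_rates(list(history) + [current])
    mean = sum(rates) / len(rates)
    variance = sum((y - mean) ** 2 for y in rates) / len(rates)
    return 1.0 / (math.sqrt(variance) + 1.0)
```

With a counter that grows steadily by 10% per cycle, every change rate is about 0.1, the standard deviation is close to 0 and the similarity is close to 1. If the last value instead doubles, the change rates become uneven and the similarity falls.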
For example, when the similarity is calculated using the cosine distance in the first similarity algorithm, assume that the change rates corresponding to the pairs of data are Y1, Y2, Y3, Y4, Y5, Y6, Y7. Y6 and Y7 can be substituted into the cosine distance formula to calculate the similarity corresponding to the data to be processed. Alternatively, Y1 and Y2, Y2 and Y3, Y3 and Y4, Y4 and Y5, Y5 and Y6, and Y6 and Y7 can each be substituted into the cosine distance formula to obtain a plurality of cosine distances, and the standard deviation of the plurality of cosine distances can then be calculated to obtain the similarity corresponding to the data to be processed.
As another mode, as shown in fig. 5, if the data to be processed is accumulation-type data, calculating the similarity based on the data to be processed and the change rate of the historical acquisition data acquired in the historical acquisition period includes:
S2211: If the data to be processed is multidimensional data, acquire the historical acquisition data acquired in each of a plurality of historical acquisition cycles to obtain a plurality of historical acquisition data.
S2212: Acquire the data of each dimension position in the data to be processed and in the plurality of historical acquisition data.
For example, as shown in fig. 6, the acquired historical acquisition data sequentially include, from first to last in time order: { A1, B1, C1, D1}, { A2, B2, C2, D2}, { A3, B3, C3, D3}, where the data to be processed is { A4, B4, C4, D4}, where A, B, C, D respectively represents four dimensional positions of the multidimensional data, e.g., the data of the A dimensional position of the data to be processed is A4, the data of the B dimensional position is B4, the data of the C dimensional position is C4, and the data of the D dimensional position is D4.
S2213: Divide the data of each dimension position into a plurality of groups based on the corresponding dimension position to obtain a plurality of groups of data, wherein the dimension positions corresponding to the data in the same group are the same.
For example, as shown in fig. 6, the acquired historical acquisition data are, in chronological order from first to last, {A1, B1, C1, D1}, {A2, B2, C2, D2}, {A3, B3, C3, D3}, and the data to be processed is {A4, B4, C4, D4}. The following groups of data can then be obtained: {A1, A2, A3, A4}, {B1, B2, B3, B4}, {C1, C2, C3, C4}, and {D1, D2, D3, D4}.
S2214: Acquire a plurality of pairs of data in each group of data, wherein the acquisition periods corresponding to the data in each pair of data are adjacent.
For example, as shown in fig. 6, the following three pairs of data can be obtained from the group {A1, A2, A3, A4}: A1A2, A2A3, A3A4.
S2215: For each pair of data, take the difference between the data of the later acquisition period and the data of the earlier acquisition period as a reference difference, to obtain the reference difference corresponding to each pair of data in each group of data.
For example, as shown in fig. 6, the group {A1, A2, A3, A4} contains the data pairs A1A2, A2A3, A3A4, and the reference differences obtained through A2-A1, A3-A2, A4-A3 are: X1 (corresponding to A2-A1), X2 (corresponding to A3-A2), X3 (corresponding to A4-A3).
S2216: Take the ratio of the reference difference corresponding to each pair of data to the data of the corresponding acquisition period in that pair, to obtain the change rate corresponding to each pair of data.
For example, as shown in fig. 6, in the group {A1, A2, A3, A4}, the reference differences corresponding to the data pairs A1A2, A2A3, A3A4 are X1, X2 and X3. The change rates corresponding to the pairs, obtained through X1/A1, X2/A2 and X3/A3, are: YA1 (corresponding to X1/A1), YA2 (corresponding to X2/A2), YA3 (corresponding to X3/A3).
S2217: Divide the plurality of pairs of data into a plurality of sets based on the acquisition periods of the data they contain, wherein the acquisition periods corresponding to each pair of data in the same set are the same.
For example, as shown in fig. 6, the group A data pairs are A1A2, A2A3, A3A4; the group B data pairs are B1B2, B2B3, B3B4; the group C data pairs are C1C2, C2C3, C3C4; and the group D data pairs are D1D2, D2D3, D3D4. These can be divided into the following sets: {A1A2, B1B2, C1C2, D1D2}, {A2A3, B2B3, C2C3, D2D3}, {A3A4, B3B4, C3C4, D3D4}, wherein every pair of data in a given set spans the same two acquisition periods.
S2218: Generate corresponding multi-dimensional data based on the change rate corresponding to each pair of data in each set to obtain a plurality of multi-dimensional data, wherein the dimension position of the change rate of each pair of data in the generated multi-dimensional data is the same as the dimension position of the data of that pair in the data to be processed or the historical acquisition data.
For example, as shown in fig. 6, the change rates of the pairs of data in the set {A2A3, B2B3, C2C3, D2D3} are, in order, YA2, YB2, YC2, YD2, and these change rates correspond to the dimension positions A, B, C, D, respectively.
S2219: Calculate the similarity based on a second similarity algorithm and the plurality of multi-dimensional data, wherein the second similarity algorithm comprises any one of Euclidean distance, cosine distance, Pearson correlation coefficient, modified cosine distance, Hamming distance and Manhattan distance.
For example, as shown in fig. 5, the multidimensional data generated based on the change rates of the pairs of data in the set {A2A3, B2B3, C2C3, D2D3} is, in order, YA2, YB2, YC2, YD2, and the multidimensional data generated based on the change rates of the pairs of data in the set {A3A4, B3B4, C3C4, D3D4} is, in order, YA3, YB3, YC3, YD3. The two multidimensional data can be regarded as two vectors, and the similarity corresponding to the data to be processed can be obtained through the cosine distance formula.
Optionally, if multiple pieces of multidimensional change rate data can be generated based on the data to be processed and the obtained multidimensional historical acquisition data, for example {YA1, YB1, YC1, YD1}, {YA2, YB2, YC2, YD2}, {YA3, YB3, YC3, YD3} and so on, a plurality of cosine distances can be obtained from each two adjacent pieces of multidimensional change rate data, the standard deviation of the plurality of cosine distances can be calculated, and the similarity corresponding to the data to be processed can be obtained as 1/(value of the standard deviation + 1).
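The optional variant above (change-rate vectors, cosine distances between adjacent vectors, then 1/(standard deviation + 1)) can be sketched as follows. The function names are hypothetical, and the cosine distance is assumed here to be 1 minus the cosine similarity, a common definition that the application itself does not spell out.

```python
import math


def cosine_distance(u, v):
    """Assumed definition: 1 - cosine similarity of u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def change_rate_vectors(records):
    """One change-rate vector per adjacent pair of records.

    Each record is a sequence such as [A, B, C, D]; a zero earlier
    value raises ZeroDivisionError (not handled in this sketch).
    """
    return [[(later - earlier) / earlier
             for earlier, later in zip(prev, curr)]
            for prev, curr in zip(records, records[1:])]


def accumulation_similarity_nd(history, current):
    """Similarity 1/(std + 1) over adjacent cosine distances."""
    vectors = change_rate_vectors(list(history) + [current])
    distances = [cosine_distance(u, v)
                 for u, v in zip(vectors, vectors[1:])]
    if not distances:
        raise ValueError("at least three records are required")
    mean = sum(distances) / len(distances)
    std = math.sqrt(sum((d - mean) ** 2 for d in distances)
                    / len(distances))
    return 1.0 / (std + 1.0)
```

When every dimension grows at the same steady rate, the change-rate vectors point in the same direction, the cosine distances are all close to 0, and the similarity is close to 1.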
S230: If the data to be processed is non-accumulation-type data, calculate the similarity based on the data to be processed and the historical acquisition data acquired in the historical acquisition period.
As one mode, as shown in fig. 7, if the data to be processed is non-accumulation data, calculating the similarity based on the data to be processed and the historical acquisition data acquired in the historical acquisition period includes:
S231: If the data to be processed is single-dimensional data, acquire the historical acquisition data acquired in each of a plurality of historical acquisition cycles to obtain a plurality of historical acquisition data.
S232: Calculate the similarity based on a first similarity algorithm, the data to be processed and the plurality of historical acquisition data, wherein the first similarity algorithm comprises any one of standard deviation, Euclidean distance, cosine distance, Pearson correlation coefficient, modified cosine distance, Hamming distance and Manhattan distance.
For example, when the similarity is calculated using the standard deviation in the first similarity algorithm, assume that the acquired historical acquisition data are, in chronological order from first to last, A, B, C, D, E, F, G, and that the data to be processed is H. The average value M can be obtained through (A+B+...+H)/8, and the standard deviation N corresponding to the data to be processed can then be obtained through the standard deviation formula sqrt(((A-M)^2+(B-M)^2+...+(H-M)^2)/8). A smaller value of N indicates that the data to be processed is more similar to the historical data, and a larger value of N indicates that it is less similar, so the similarity of the data to be processed may be taken as 1/(N+1) to avoid contradicting the specified threshold condition. Other first similarity algorithms (e.g., Euclidean distance) may be processed in the same way when they contradict the specified threshold condition.
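The standard-deviation example for single-dimensional non-accumulation data can be sketched as below; the function name is hypothetical. Unlike the accumulation case, the raw values are used directly rather than their change rates.

```python
import math


def non_accumulation_similarity_1d(history, current):
    """Similarity 1/(N+1), where N is the population standard
    deviation of the historical values plus the current value."""
    values = list(history) + [current]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return 1.0 / (math.sqrt(variance) + 1.0)
```

For seven historical reads-per-second samples of 50, a current value of 50 gives a similarity of 1, whereas a current value of 0 (the possible read fault mentioned for S130) gives a similarity well below 0.1.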
As another mode, as shown in fig. 8, if the data to be processed is non-accumulation type data, calculating similarity based on the data to be processed and historical acquisition data acquired in a historical acquisition period includes:
S236: If the data to be processed is multidimensional data, acquire the historical acquisition data acquired in each of a plurality of historical acquisition cycles to obtain a plurality of historical acquisition data.
S237: Calculate the similarity based on a second similarity algorithm, the data to be processed and the plurality of historical acquisition data, wherein the second similarity algorithm comprises any one of Euclidean distance, cosine distance, Pearson correlation coefficient, modified cosine distance, Hamming distance and Manhattan distance.
For example, the acquired historical acquisition data are, in chronological order from first to last, {A1, B1, C1, D1}, {A2, B2, C2, D2}, {A3, B3, C3, D3}, and the data to be processed is {A4, B4, C4, D4}. When the second similarity algorithm is the cosine distance, the similarity of the data to be processed can be obtained by calculating the cosine distance between {A4, B4, C4, D4} and {A3, B3, C3, D3}.
Optionally, if a plurality of cosine distances can be obtained based on the data to be processed and the obtained multi-dimensional historical acquisition data, the standard deviation of the plurality of cosine distances can be calculated, and the similarity corresponding to the data to be processed can then be obtained as 1/(value of the standard deviation + 1).
It should be noted that, when a larger value obtained by the second similarity algorithm indicates that the data to be processed is less similar to the historical acquisition data, the similarity corresponding to the data to be processed may be obtained as 1/(value obtained by the second similarity algorithm + 1), so that the similarity can be compared with the specified threshold condition.
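The multidimensional non-accumulation case can be sketched as below, comparing the data to be processed with the most recent historical record. The function names are hypothetical. The cosine distance is assumed to be 1 minus the cosine similarity, so a larger value means less similar and the 1/(value + 1) conversion described above is applied; both points are assumptions rather than statements of the application.

```python
import math


def cosine_distance(u, v):
    """Assumed definition: 1 - cosine similarity of u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def non_accumulation_similarity_nd(history, current):
    """Similarity 1/(d + 1), where d is the cosine distance between
    the data to be processed and the latest historical record."""
    latest = history[-1]
    return 1.0 / (cosine_distance(current, latest) + 1.0)
```

A record that is a scaled copy of the previous one has a cosine distance near 0 and a similarity near 1; a record whose proportions between dimensions change has a lower similarity.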
S240: If the similarity satisfies the specified threshold condition, store the data to be processed.
According to the data acquisition method provided by the embodiment, the similarity between the data to be processed obtained through calculation and the historical acquisition data acquired in the historical acquisition period can be compared with the specified threshold condition, and the data to be processed is stored under the condition that the similarity meets the specified threshold condition, so that the acquired acquisition data can be screened to a certain extent according to the specified threshold condition to obtain the acquisition data meeting the requirement (the specified threshold condition) for storage, further, the acquired acquisition data do not need to be directly stored every time, and the storage space is saved. Also, in the present embodiment, when the data to be processed is accumulation type data, the similarity may be calculated based on the data to be processed and the rate of change of the plurality of historically collected data; when the data to be processed is non-accumulative data, the similarity can be calculated based on the data to be processed and the plurality of historical collected data, and by the method, because the data characteristics of different data types are different, the similarity calculation is performed by adopting different data based on the difference of the data types to be processed, so that whether the data to be processed has the data characteristics with higher value or not can be better identified. And the accumulated/non-accumulated data is further divided into single-dimensional data and multi-dimensional data, and similarity calculation is performed by adopting different similarity algorithms according to different data dimensions, so that the applicability and the expansibility of the data acquisition method provided by the application are improved.
Referring to fig. 9, a data acquisition method provided in the present application includes:
S310: In response to the acquisition instruction, acquire the acquisition data acquired in the current acquisition period as the data to be processed.
S320: Calculate the similarity between the data to be processed and the historical acquisition data acquired in the historical acquisition period.
S330: If the similarity satisfies the specified threshold condition, store the data to be processed.
S340: If the similarity does not satisfy the specified threshold condition and the acquisition time of the data to be processed matches a persistent storage period, store the data to be processed.
In the embodiment of the present application, for the subsequent maintenance of the device and the consideration of the security of the device, a persistent storage period may be set to periodically store the data to be processed, so that more regular data (data in normal operation of the device) may be stored.
As a mode, the persistent storage period may be N times (N >1, and N is an integer) of the acquisition period, when the similarity corresponding to the to-be-processed data does not satisfy the specified threshold condition, it may be determined whether the acquisition time of the to-be-processed data matches the persistent storage period, and if the acquisition time of the to-be-processed data matches the persistent storage period, that is, the acquisition time of the to-be-processed data is just the time corresponding to the persistent storage period, the to-be-processed data may be persistently stored.
It should be noted that, in the embodiment of the present application, it may also be determined whether the acquisition time of the to-be-processed data matches the persistent storage period, and if the acquisition time of the to-be-processed data matches the persistent storage period, the to-be-processed data may be persistently stored; if the acquisition time of the data to be processed is not matched with the persistence storage period, whether the similarity corresponding to the data to be processed meets a specified threshold condition or not can be judged, and if the similarity meets the specified threshold condition, the data to be processed can be persistently stored.
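The storage decision of S330 and S340 can be sketched as follows. Modelling the acquisition time as a 1-based acquisition cycle index, and "matching the persistent storage period" as that index being a multiple of N, are assumptions made for illustration; the function names are hypothetical.

```python
def matches_persistence_period(cycle_index, multiple):
    """True when the cycle falls on a persistent storage period.

    `cycle_index` counts acquisition cycles from 1; `multiple` is N,
    the integer ratio (N > 1) of the persistent storage period to
    the acquisition period.
    """
    if not isinstance(multiple, int) or multiple <= 1:
        raise ValueError("N must be an integer greater than 1")
    return cycle_index % multiple == 0


def should_store(similarity_ok, cycle_index, multiple):
    """Store if the threshold condition holds or the period matches.

    The order in which the two checks are made does not change the
    result, which reflects that either check may be performed first.
    """
    return similarity_ok or matches_persistence_period(cycle_index, multiple)
```

With N = 5, ordinary data from cycle 10 would be stored and ordinary data from cycle 11 would not, while high-value data from cycle 11 would still be stored.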
Optionally, for convenience of management, the same persistent storage period may be set for all the to-be-processed data, and for example, the persistent storage period of all the to-be-processed data may be set to 1 day.
Optionally, in order to save the storage space, a persistent storage period may be determined based on the type of the data to be processed, and for example, when the data to be processed is accumulation-type data, the persistent storage period may be set to 7 days; when the data to be processed is non-accumulation type data, the persistent storage period may be set to 1 day.
According to the data acquisition method provided by this embodiment, the calculated similarity between the data to be processed and the historical acquisition data acquired in the historical acquisition period can be compared with the specified threshold condition, and the data to be processed is stored when the similarity satisfies the specified threshold condition. The acquired data can thus be screened according to the specified threshold condition so that only acquisition data meeting the requirement (the specified threshold condition) is stored; the data acquired each time does not need to be stored directly, and storage space is saved. In addition, in this embodiment, the data to be processed may be stored either when its acquisition time matches the persistent storage period or when the similarity satisfies the specified threshold condition. Since the data to be processed is stored whenever the persistent storage period arrives, while whether the similarity satisfies the specified threshold condition is random, the data to be processed is persistently stored at a variable frequency. Both high-value data to be processed and ordinary data to be processed can therefore be persistently stored, and the security and stability of the device can be improved while storage space is saved.
Referring to fig. 10, a data acquisition method provided in the present application includes:
S410: In response to the acquisition instruction, acquire the acquisition data acquired in the current acquisition period as the data to be processed.
S420: Calculate the similarity between the data to be processed and the historical acquisition data acquired in the historical acquisition period.
S430: If the similarity satisfies the specified threshold condition, store the data to be processed.
S440: Collect and store data based on a persistent storage period, wherein the persistent storage period is greater than the sampling period.
As a mode, as shown in fig. 11, when a time corresponding to the persistent storage period arrives, the control unit of the electronic device may send an acquisition instruction to the data acquisition unit, where the acquisition instruction may include an identifier of data to be acquired, and after the data acquisition unit responds to the acquisition instruction, the data acquisition unit may perform data acquisition operation according to the identifier of the data to be acquired, and perform persistent storage on the acquired data acquired in the current persistent storage period.
Optionally, for convenience of management, the same persistent storage period may be set for all the to-be-processed data, and for example, the persistent storage period of all the to-be-processed data may be set to 1 day.
Optionally, in order to save the storage space, a persistent storage period may be determined based on the type of the data to be processed, and for example, when the data to be processed is accumulation-type data, the persistent storage period may be set to 7 days; when the data to be processed is non-accumulation type data, the persistent storage period may be set to 1 day.
As another mode, as shown in fig. 11, when a time corresponding to a sampling period arrives, a control unit of the electronic device may send an acquisition instruction to the data acquisition unit, where the acquisition instruction may include an identifier of data to be acquired, and after the data acquisition unit responds to the acquisition instruction, the data acquisition unit may perform data acquisition operation according to the identifier of the data to be acquired, so as to use the acquired data acquired in the current acquisition period as data to be processed; and then comparing the similarity corresponding to the data to be processed with a specified threshold condition, and if the similarity corresponding to the data to be processed meets the specified threshold condition, performing persistent storage on the data to be processed.
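The two independent triggers described above (acquisition and unconditional storage on the persistent storage period, and acquisition plus a similarity check on the sampling period) can be sketched as a simple schedule. The function name, the use of integer seconds, and the `is_high_value` callback standing in for the similarity check are all assumptions for illustration.

```python
def plan_storage(sampling_period, persistence_period, horizon, is_high_value):
    """Return the sorted times (in seconds) at which data is stored.

    Periods and horizon are positive integers. `is_high_value(t)` is
    a hypothetical stand-in for "the similarity of the data acquired
    at time t satisfies the specified threshold condition".
    """
    if persistence_period <= sampling_period:
        raise ValueError("persistent storage period must exceed "
                         "the sampling period")
    stored = set()
    # Trigger 1: acquire and store whenever the persistent storage
    # period arrives, regardless of similarity.
    for t in range(persistence_period, horizon + 1, persistence_period):
        stored.add(t)
    # Trigger 2: acquire on every sampling period and store only if
    # the threshold condition is satisfied.
    for t in range(sampling_period, horizon + 1, sampling_period):
        if is_high_value(t):
            stored.add(t)
    return sorted(stored)
```

With a 30 s sampling period, a 120 s persistent storage period and a single high-value sample at 90 s, data would be stored at 90 s, 120 s and 240 s over a 240 s window, rather than at all eight sampling instants.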
According to the data acquisition method provided by this embodiment, the calculated similarity between the data to be processed and the historical acquisition data acquired in the historical acquisition period can be compared with the specified threshold condition, and the data to be processed is stored when the similarity satisfies the specified threshold condition. The acquired data can thus be screened according to the specified threshold condition so that only acquisition data meeting the requirement (the specified threshold condition) is stored; the data acquired each time does not need to be stored directly, and storage space is saved. In addition, in this embodiment, the data to be processed may be stored either when the time corresponding to the persistent storage period arrives or when the similarity satisfies the specified threshold condition. Since the data to be processed is stored whenever the persistent storage period arrives, while whether the similarity satisfies the specified threshold condition is random, the data to be processed is persistently stored at a variable frequency. Both high-value data to be processed and ordinary data to be processed can therefore be persistently stored, and the security and stability of the device can be improved while storage space is saved.
Referring to fig. 12, the present application provides a data acquisition apparatus 600, where the apparatus 600 includes:
A to-be-processed data acquiring unit 610, configured to, in response to the acquisition instruction, acquire the data acquired in the current acquisition cycle as the to-be-processed data.
A similarity calculating unit 620, configured to calculate a similarity between the data to be processed and the historical acquisition data acquired in the historical acquisition period.
The storage unit 630 is configured to store the to-be-processed data if the similarity satisfies a specified threshold condition.
As a manner, the similarity calculating unit 620 is specifically configured to calculate the similarity between the data to be processed and the historical acquisition data acquired in the historical acquisition period based on the type of the data to be processed.
As another mode, the similarity calculating unit 620 is specifically configured to calculate the similarity based on the to-be-processed data and the change rate of the historical acquisition data acquired in the historical acquisition period if the to-be-processed data is accumulation-type data; and if the data to be processed is non-accumulative data, calculating similarity based on the data to be processed and historical acquisition data acquired in a historical acquisition period.
Optionally, the similarity calculation unit 620 is specifically configured to, if the data to be processed is single-dimensional data, obtain historical acquisition data acquired in each of a plurality of historical acquisition cycles to obtain a plurality of historical acquisition data; acquiring a plurality of pairs of data from the data to be processed and the plurality of historical acquisition data, wherein the acquisition periods corresponding to the data in each pair of data are adjacent; acquiring the difference between the data after the corresponding acquisition period and the data before the corresponding acquisition period in each pair of data as a reference difference value to obtain the reference difference value corresponding to each pair of data; comparing the reference difference value corresponding to each pair of data with the data after the corresponding acquisition period in each pair of data to obtain the change rate corresponding to each pair of data; and calculating the similarity based on a first similarity algorithm and the change rate corresponding to each pair of data, wherein the first similarity algorithm comprises any one of standard deviation, Euclidean distance, cosine distance, Pearson correlation coefficient, modified cosine distance, Hamming distance and Manhattan distance.
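As a rough sketch of the single-dimensional case, the Python below pairs adjacent samples, takes the later value minus the earlier value as the reference difference, and divides it by the later value. Reading "comparing" as a division is an assumption, as are the function names and the choice of the population standard deviation as the first similarity algorithm.

```python
import statistics


def change_rates(samples):
    """Pair each sample with its predecessor; return (later - earlier) / later per pair."""
    rates = []
    for earlier, later in zip(samples, samples[1:]):
        reference_difference = later - earlier
        rates.append(reference_difference / later)  # assumes later != 0
    return rates


def similarity_std(history, current):
    """Population standard deviation of the change rates over history plus the current sample."""
    return statistics.pstdev(change_rates(list(history) + [current]))


history = [100, 200, 400]               # accumulation-type readings, oldest first
steady = similarity_std(history, 800)   # change rates 0.5, 0.5, 0.5
jump = similarity_std(history, 4000)    # change rates 0.5, 0.5, 0.9
print(steady, jump)
```

With these illustrative numbers, a reading that continues the historical growth pattern gives a standard deviation of zero, while a sudden jump gives a larger value, even though the raw accumulated value differs from the history in both cases.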
Optionally, the similarity calculating unit 620 is specifically configured to, if the data to be processed is multidimensional data, obtain historical acquisition data acquired in each of a plurality of historical acquisition cycles to obtain a plurality of historical acquisition data; acquiring the data to be processed and the data of each dimension position in the plurality of historical acquisition data; dividing the data of each dimension position into a plurality of groups based on the corresponding dimension position to obtain a plurality of groups of data, wherein the corresponding dimension positions of the same group of data are the same; acquiring a plurality of pairs of data in each group of data, wherein the corresponding acquisition periods of the data in each pair of data are adjacent; acquiring the difference between the data after the corresponding acquisition period and the data before the corresponding acquisition period in each pair of data as a reference difference value to obtain the reference difference value corresponding to each pair of data in each group of data; comparing the reference difference value corresponding to each pair of data with the corresponding acquisition period in each pair of data to obtain the corresponding change rate of each pair of data; dividing a plurality of pairs of data into a plurality of sets based on the sampling period of the included data, wherein the corresponding acquisition periods of each pair of data in the same set are the same; generating corresponding multidimensional data based on the change rate corresponding to each data in each set to obtain a plurality of multidimensional data, wherein the dimension position of the change rate corresponding to each pair of data in the correspondingly generated multidimensional data is the same as the dimension position of the data in the pair of data in the data to be processed or the historical acquisition data; and calculating similarity based on a second similarity algorithm and the multi-dimensional data, wherein the second similarity algorithm comprises any one of Euclidean distance, cosine distance, Pearson correlation coefficient, modified cosine distance, Hamming distance and Manhattan distance.
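The multidimensional steps can be sketched in Python as follows. This is only one possible reading: the change rate is taken as the reference difference divided by the length of the acquisition period, the second similarity algorithm is taken to be the Euclidean distance, and the final score is assumed to be the mean distance between the newest change-rate vector and each historical one. The function names and the aggregation by mean are hypothetical.

```python
import math


def rate_vectors(samples, period):
    """For each pair of adjacent samples, build one vector of per-dimension change rates."""
    vectors = []
    for earlier, later in zip(samples, samples[1:]):
        # Reference difference at each dimension position, divided by the period length.
        vectors.append([(b - a) / period for a, b in zip(earlier, later)])
    return vectors


def similarity_euclidean(history, current, period):
    """Mean Euclidean distance from the newest rate vector to each historical rate vector.

    Needs at least two historical samples so that one historical rate vector exists.
    """
    *past, newest = rate_vectors(list(history) + [current], period)
    return sum(math.dist(newest, v) for v in past) / len(past)


history = [(0, 0), (10, 20), (20, 40)]   # two-dimensional samples, oldest first
print(similarity_euclidean(history, (30, 60), period=10))   # growth rate unchanged
print(similarity_euclidean(history, (60, 80), period=10))   # growth rate changes
```

Each generated rate vector keeps the same dimension order as the original samples, which matches the requirement that a change rate occupies the dimension position of the data it was derived from.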
Optionally, the similarity calculating unit 620 is specifically configured to, if the data to be processed is single-dimensional data, obtain historical acquisition data acquired in each of a plurality of historical acquisition cycles to obtain a plurality of historical acquisition data; calculating similarity based on a first similarity algorithm, the data to be processed and the historical collected data, wherein the first similarity algorithm comprises any one of standard deviation, Euclidean distance, cosine distance, Pearson correlation coefficient, modified cosine distance, Hamming distance and Manhattan distance.
Optionally, the similarity calculating unit 620 is specifically configured to, if the data to be processed is multidimensional data, obtain historical acquisition data acquired in each of a plurality of historical acquisition cycles to obtain a plurality of historical acquisition data; and calculating similarity based on a second similarity algorithm, the data to be processed and the plurality of historical acquisition data, wherein the second similarity algorithm comprises any one of Euclidean distance, cosine distance, Pearson correlation coefficient, modified cosine distance, Hamming distance and Manhattan distance.
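For non-accumulative multidimensional data, the similarity is computed on the raw values rather than on change rates. The sketch below assumes the cosine distance as the second similarity algorithm and, as an illustrative choice not stated in the application, averages the distance between the current sample and each historical sample. The function names are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def similarity_direct(history, current):
    """Mean cosine distance between the current sample and each historical sample."""
    return sum(cosine_distance(current, h) for h in history) / len(history)


history = [(1, 0), (1, 0)]
print(similarity_direct(history, (2, 0)))   # same direction as the history
print(similarity_direct(history, (0, 1)))   # orthogonal to the history
```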
As another way, the similarity calculation unit 620 is specifically configured to calculate the similarity between the data to be processed and the historical acquisition data acquired in the historical acquisition period based on the dimension of the data to be processed.
As one mode, the storage unit 630 is specifically configured to store the to-be-processed data if the similarity does not satisfy the specified threshold condition and the acquisition time of the to-be-processed data matches a persistent storage period.
As another way, the storage unit 630 is specifically configured to collect and store data based on a persistent storage period, where the persistent storage period is greater than the sampling period.
Optionally, the persistent storage period is determined based on the type of the data to be processed.
Optionally, the specified threshold condition includes that the similarity is smaller than a first similarity threshold, or that a difference between the similarity and a similarity corresponding to a previous acquisition cycle is smaller than a second similarity threshold.
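The two alternatives of the specified threshold condition can be written as a small predicate. The function and parameter names are hypothetical, the threshold values are illustrative only, and the absolute value of the difference follows the wording of claim 11.

```python
def meets_threshold(similarity, previous_similarity, first_threshold, second_threshold):
    """Return True if either alternative of the specified threshold condition holds."""
    below_first = similarity < first_threshold
    # The second alternative needs a similarity from the previous acquisition cycle.
    small_change = (
        previous_similarity is not None
        and abs(similarity - previous_similarity) < second_threshold
    )
    return below_first or small_change


print(meets_threshold(0.2, None, first_threshold=0.5, second_threshold=0.05))
print(meets_threshold(0.8, 0.5, first_threshold=0.5, second_threshold=0.05))
```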
An electronic device provided by the present application will be described below with reference to fig. 13.
Referring to fig. 13, based on the data acquisition method and apparatus described above, an embodiment of the present application further provides an electronic device 1000 capable of executing the data acquisition method. The electronic device 1000 includes one or more processors 102 (only one shown) and a memory 104 coupled to each other. The memory 104 stores programs that can execute the content of the foregoing embodiments, and the processor 102 can execute the programs stored in the memory 104. The processor 102 may include one or more processing cores. The processor 102 connects the various components throughout the electronic device 1000 using various interfaces and circuitry, and performs the various functions of the electronic device 1000 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 104 and invoking data stored in the memory 104.
Alternatively, the processor 102 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 102 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. Wherein, the CPU mainly processes an operating system, a user interface, an application program and the like; the modem is used to handle wireless communications. It is understood that the modem may not be integrated into the processor 102, but may be implemented by a communication chip.
The memory 104 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 104 may be used to store instructions, programs, code sets, or instruction sets. The memory 104 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing various method embodiments described below, and the like. The stored data area may also store data created by the electronic device 1000 during use (e.g., phone book, audio-video data, chat log data), and the like.
Referring to fig. 14, a block diagram of a computer-readable storage medium according to an embodiment of the present application is shown. The computer-readable storage medium 800 has stored therein program code that can be called by a processor to execute the methods described in the above-described method embodiments.
The computer-readable storage medium 800 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. Alternatively, the computer-readable storage medium 800 includes a non-volatile computer-readable storage medium. The computer-readable storage medium 800 has storage space for program code 810 that performs any of the method steps of the method described above. The program code can be read from or written to one or more computer program products. The program code 810 may be compressed, for example, in a suitable form.
In summary, the data acquisition method, the data acquisition device, the electronic device and the storage medium provided by the application respond to the acquisition instruction, acquire the acquired data acquired in the current acquisition period, use the acquired data acquired in the current acquisition period as the data to be processed, calculate the similarity between the data to be processed and the historical acquired data acquired in the historical acquisition period, and store the data to be processed if the similarity meets the specified threshold condition. Therefore, by the method, the similarity between the data to be processed obtained through calculation and the historical acquisition data acquired in the historical acquisition period can be compared with the specified threshold condition, and the data to be processed is stored under the condition that the similarity meets the specified threshold condition, so that the acquired acquisition data can be screened to a certain extent according to the specified threshold condition to obtain the acquisition data meeting the requirement (the specified threshold condition) for storage, the acquired acquisition data obtained each time do not need to be directly stored, and the storage space is saved.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not necessarily depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.
Claims (15)
1. A method of data acquisition, the method comprising:
acquiring acquired data acquired in the current acquisition period as data to be processed in response to the acquisition instruction;
calculating the similarity between the data to be processed and historical acquisition data acquired in a historical acquisition period;
and if the similarity meets the condition of a specified threshold, storing the data to be processed.
2. The method of claim 1, wherein the calculating the similarity between the data to be processed and historical acquisition data acquired in a historical acquisition period comprises:
and calculating the similarity between the data to be processed and historical acquisition data acquired in a historical acquisition period based on the type of the data to be processed.
3. The method of claim 2, wherein the calculating the similarity of the data to be processed to historically acquired data acquired during a historical acquisition period based on the type of the data to be processed comprises:
if the data to be processed is accumulation type data, calculating similarity based on the data to be processed and the change rate of historical acquisition data acquired in a historical acquisition period;
and if the data to be processed is non-accumulative data, calculating similarity based on the data to be processed and historical acquisition data acquired in a historical acquisition period.
4. The method of claim 3, wherein calculating the similarity based on the pending data and a rate of change of historical acquisition data acquired over a historical acquisition period comprises:
if the data to be processed is single-dimensional data, acquiring historical acquisition data acquired by a plurality of historical acquisition cycles respectively to obtain a plurality of historical acquisition data;
acquiring a plurality of pairs of data from the data to be processed and the plurality of historical acquisition data, wherein the acquisition periods corresponding to the data in each pair of data are adjacent;
acquiring the difference between the data after the corresponding acquisition period and the data before the corresponding acquisition period in each pair of data as a reference difference value to obtain the reference difference value corresponding to each pair of data;
comparing the reference difference value corresponding to each pair of data with the corresponding acquisition period in each pair of data to obtain the corresponding change rate of each pair of data;
and calculating the similarity based on a first similarity algorithm and the change rate corresponding to each pair of data, wherein the first similarity algorithm comprises any one of standard deviation, Euclidean distance, cosine distance, Pearson correlation coefficient, modified cosine distance, Hamming distance and Manhattan distance.
5. The method of claim 3, wherein calculating the similarity based on the pending data and a rate of change of historical acquisition data acquired over a historical acquisition period comprises:
if the data to be processed is multidimensional data, acquiring historical acquisition data acquired by a plurality of historical acquisition cycles respectively to obtain a plurality of historical acquisition data;
acquiring the data to be processed and the data of each dimension position in the plurality of historical acquisition data;
dividing the data of each dimension position into a plurality of groups based on the corresponding dimension position to obtain a plurality of groups of data, wherein the corresponding dimension positions of the same group of data are the same;
acquiring a plurality of pairs of data in each group of data, wherein the corresponding acquisition periods of the data in each pair of data are adjacent;
acquiring the difference between the data after the corresponding acquisition period and the data before the corresponding acquisition period in each pair of data as a reference difference value to obtain the reference difference value corresponding to each pair of data in each group of data;
comparing the reference difference value corresponding to each pair of data with the corresponding acquisition period in each pair of data to obtain the corresponding change rate of each pair of data;
dividing a plurality of pairs of data into a plurality of sets based on the sampling period of the included data, wherein the corresponding acquisition periods of each pair of data in the same set are the same;
generating corresponding multidimensional data based on the change rate corresponding to each data in each set to obtain a plurality of multidimensional data, wherein the dimension position of the change rate corresponding to each pair of data in the correspondingly generated multidimensional data is the same as the dimension position of the data in the pair of data in the data to be processed or the historical acquisition data;
and calculating similarity based on a second similarity algorithm and the multi-dimensional data, wherein the second similarity algorithm comprises any one item of Euclidean distance, cosine distance, Pearson correlation coefficient, modified cosine distance, Hamming distance and Manhattan distance.
6. The method of claim 3, wherein the calculating similarities based on the data to be processed and historical acquisition data acquired during a historical acquisition period comprises:
if the data to be processed is single-dimensional data, acquiring historical acquisition data acquired by a plurality of historical acquisition cycles respectively to obtain a plurality of historical acquisition data;
calculating similarity based on a first similarity algorithm, the data to be processed and the historical collected data, wherein the first similarity algorithm comprises any one of standard deviation, Euclidean distance, cosine distance, Pearson correlation coefficient, modified cosine distance, Hamming distance and Manhattan distance.
7. The method of claim 3, wherein the calculating similarities based on the data to be processed and historical acquisition data acquired during a historical acquisition period comprises:
if the data to be processed is multidimensional data, acquiring historical acquisition data acquired by a plurality of historical acquisition cycles respectively to obtain a plurality of historical acquisition data;
and calculating similarity based on a second similarity algorithm, the data to be processed and the plurality of historical acquisition data, wherein the second similarity algorithm comprises any one of Euclidean distance, cosine distance, Pearson correlation coefficient, modified cosine distance, Hamming distance and Manhattan distance.
8. The method of claim 1, wherein the calculating the similarity of the data to be processed and the historical acquisition data acquired during the historical acquisition period comprises:
and calculating the similarity between the data to be processed and historical acquisition data acquired in a historical acquisition period based on the dimensionality of the data to be processed.
9. The method according to any one of claims 1-8, further comprising:
and if the similarity does not meet the specified threshold condition and the acquisition time of the data to be processed is matched with a persistent storage period, storing the data to be processed.
10. The method of claim 9, further comprising:
and determining a persistent storage period based on the type of the data to be processed.
11. The method of claim 1, wherein the specified threshold condition comprises the similarity being less than a first similarity threshold, or the absolute value of the difference between the similarity and the similarity corresponding to the previous acquisition cycle being smaller than a second similarity threshold.
12. The method according to any one of claims 1-8, further comprising:
data is collected and stored based on a persistent storage period, wherein the persistent storage period is greater than the sampling period.
13. A data acquisition device, the device comprising:
the to-be-processed data acquisition unit is used for responding to the acquisition instruction and acquiring the acquired data acquired in the current acquisition period as to-be-processed data;
the similarity calculation unit is used for calculating the similarity between the data to be processed and historical acquisition data acquired in a historical acquisition period;
and the storage unit is used for storing the data to be processed if the similarity meets a specified threshold condition.
14. An electronic device comprising one or more processors and memory;
one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to perform the method of any of claims 1-12.
15. A computer-readable storage medium, having program code stored therein, wherein the method of any of claims 1-12 is performed when the program code is run.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111498617.7A CN114385463A (en) | 2021-12-09 | 2021-12-09 | Data acquisition method and device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114385463A true CN114385463A (en) | 2022-04-22 |
Family
ID=81195268
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111498617.7A Pending CN114385463A (en) | 2021-12-09 | 2021-12-09 | Data acquisition method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114385463A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11847038B1 (en) * | 2022-07-15 | 2023-12-19 | Vmware, Inc. | System and method for automatically recommending logs for low-cost tier storage |
CN116320019A (en) * | 2023-05-16 | 2023-06-23 | 荣耀终端有限公司 | Data acquisition method, medium and electronic equipment |
CN116320019B (en) * | 2023-05-16 | 2023-10-27 | 荣耀终端有限公司 | Data acquisition method, medium and electronic equipment |
CN116886424A (en) * | 2023-08-15 | 2023-10-13 | 哈尔滨雷风恒科技开发有限公司 | Digital transmission security analysis system and method based on big data of computer |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114385463A (en) | Data acquisition method and device and electronic equipment | |
CN110413227B (en) | Method and system for predicting remaining service life of hard disk device on line | |
US9244790B1 (en) | System and method for predicting future disk failures | |
CN107656807B (en) | Automatic elastic expansion method and device for virtual resources | |
CN110096472B (en) | Selection of management nodes in a node cluster | |
US11210183B2 (en) | Memory health tracking for differentiated data recovery configurations | |
CN110147470B (en) | Cross-machine-room data comparison system and method | |
JPWO2008056682A1 (en) | Resource information collection device, resource information collection method, program, and collection schedule generation device | |
CN111464583A (en) | Computing resource allocation method, device, server and storage medium | |
CN110019017B (en) | High-energy physical file storage method based on access characteristics | |
US11936709B2 (en) | Generating key assignment data for message processing | |
CN117971488A (en) | Storage management method and related device for distributed database cluster | |
CN113885803A (en) | Data storage method and device, electronic equipment and storage medium | |
CN113409876A (en) | Method and system for positioning fault hard disk | |
CN115237334A (en) | Hard disk management method and device and electronic equipment | |
CN104102557A (en) | Cloud computing platform data backup method based on clustering | |
CN108769123B (en) | Data system and data processing method | |
CN109150792B (en) | Method and device for improving data storage security | |
CN115993932A (en) | Data processing method, device, storage medium and electronic equipment | |
CN113448747B (en) | Data transmission method, device, computer equipment and storage medium | |
CN115509853A (en) | Cluster data anomaly detection method and electronic equipment | |
CN112817987A (en) | Method, device, equipment and storage medium for accessing distributed storage cluster | |
CN117472267B (en) | Method, system, and non-transitory machine-readable storage medium for data reduction of storage volumes | |
CN111985651A (en) | Operation and maintenance method and device for business system | |
CN118349532B (en) | Filecoin scene adaptation method and system based on additional storage |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||