CN115858633B - Time sequence data analysis method and device based on data lake - Google Patents


Info

Publication number
CN115858633B
Authority
CN
China
Legal status
Active
Application number
CN202310166499.2A
Other languages
Chinese (zh)
Other versions
CN115858633A (en)
Inventor
李保平
杨建荣
谢超
麦新伟
欧再辉
Current Assignee
Guangzhou Huitong Guoxin Technology Co ltd
Original Assignee
Guangzhou Huitong Guoxin Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Huitong Guoxin Technology Co ltd
Priority to CN202310166499.2A
Publication of CN115858633A
Application granted
Publication of CN115858633B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a time sequence data analysis method and device based on a data lake. The method comprises the following steps: acquiring a target data set to be analyzed, wherein the target data set is acquired from a data lake; converting the data in the target data set by using a preset function to obtain time sequence data corresponding to the target data set; and generating corresponding visual data based on the time sequence data, and analyzing the visual data according to preset analysis conditions to obtain a data rule corresponding to the time sequence data. With the method provided by the embodiments of the application, the target data set is automatically converted into time sequence data, and the user can directly configure the corresponding analysis conditions and use them to analyze the time sequence data, so that the rules (patterns) of the time sequence data are summarized and extracted. Data analysis is thus carried out according to the user's needs without requiring professional analysts, which lowers the difficulty of data analysis for ordinary users.

Description

Time sequence data analysis method and device based on data lake
Technical Field
The application relates to the field of data analysis, in particular to a time sequence data analysis method and device based on a data lake.
Background
With the development of the Internet of Things, falling storage costs and growing technical teams, enterprises undergoing digital transformation increasingly move their data platforms from a traditional centralized data warehouse architecture to a more open data lake architecture. In this process, the increasing diversity of data puts great pressure on enterprise data management, and a large part of this data is time series data from Internet of Things devices.
At present, time series data analysis relies on targeted custom development; for example, a real-time chart analysis built for a stock exchange can only be used to analyze that stock exchange. Custom development usually goes through a complete project development life cycle, which seriously slows the digital transformation of enterprises. Moreover, time series data analysis generally has to be carried out by experienced professionals, which makes it difficult for ordinary users to analyze time series data.
Disclosure of Invention
To solve, or at least partially solve, the above technical problems, the application provides a time sequence data analysis method and device based on a data lake.
According to an aspect of the embodiment of the present application, there is provided a time series data analysis method based on a data lake, including:
acquiring a target data set to be analyzed, wherein the target data set is acquired from a data lake;
converting the data in the target data set by using a preset function to obtain time sequence data corresponding to the target data set;
generating corresponding visual data based on the time sequence data, and analyzing the visual data according to preset analysis conditions to obtain a data rule corresponding to the time sequence data.
Further, the acquiring the target data set to be analyzed includes:
acquiring metadata information which is input currently, wherein the metadata information comprises at least one piece of data description information;
querying an original data set matched with the data description information from the data lake;
and acquiring data analysis conditions, and constructing the target data set by utilizing the original data meeting the data analysis conditions in the original data set.
Further, the data analysis conditions at least comprise filtering conditions and sampling conditions;
the constructing the target data set by using the original data meeting the data analysis condition in the original data set includes:
determining the original data meeting the filtering condition in the original data set as candidate data;
sampling the candidate data according to granularity indicated by the sampling condition to obtain target data;
the target data set is constructed based on the target data.
Further, the generating the corresponding visualized data based on the time sequence data includes:
acquiring attribute information corresponding to the target data set, and querying at least one visualization type corresponding to the attribute information;
obtaining, from the at least one visualization type, a target visualization type matching a preset visualization requirement;
and generating the visual data from the time sequence data according to the target visualization type.
Further, the analyzing the visual data according to the preset analysis condition to obtain the data rule corresponding to the time sequence data includes:
detecting whether the visual data is stable or not, and obtaining a detection result;
and acquiring an analysis strategy corresponding to the detection result, and analyzing the visualized data by utilizing the analysis strategy to obtain the data rule.
Further, the analyzing the visual data by using the analysis strategy to obtain the data rule includes:
detecting whether white noise data exists in the visual data or not under the condition that the detection result is a first result, wherein the first result is used for indicating that the visual data is stable;
detecting a stable type corresponding to the visual data under the condition that white noise data exists in the visual data;
and determining the stable type as the data rule.
Further, the analyzing the visual data by using the analysis strategy to obtain the data rule includes:
under the condition that the detection result is a second result, performing a differential operation based on the visual data to obtain an operation result, wherein the second result is used for indicating that the visual data is not stable;
detecting the operation result according to preset dimensions to obtain trend information corresponding to each preset dimension, wherein the preset dimensions comprise: linear dimension, curvilinear dimension, and periodic dimension;
and determining trend information corresponding to the preset dimension as the data rule.
According to still another aspect of the embodiment of the present application, there is also provided a time series data analysis device based on a data lake, including:
the acquisition module is used for acquiring a target data set to be analyzed, wherein the target data set is acquired from a data lake;
the conversion module is used for converting the data in the target data set by utilizing a preset function to obtain time sequence data corresponding to the target data set;
the generation module is used for generating corresponding visual data based on the time sequence data, and analyzing the visual data according to preset analysis conditions to obtain a data rule corresponding to the time sequence data.
According to another aspect of the embodiments of the present application, there is also provided a storage medium including a stored program that performs the above steps when running.
According to another aspect of the embodiment of the present application, there is also provided an electronic device including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus; wherein: a memory for storing a computer program; and a processor for executing the steps of the method by running a program stored on the memory.
Embodiments of the present application also provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the steps of the above method.
Compared with the prior art, the technical solution provided by the embodiments of the application has the following advantages: the target data set is automatically converted into time sequence data, and the user can directly configure the corresponding analysis conditions and use them to analyze the time sequence data, so that the rules (patterns) of the time sequence data are summarized and extracted. Data analysis is thus carried out according to the user's needs without requiring professional analysts, which lowers the difficulty of data analysis for ordinary users.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flow chart of a time series data analysis method based on a data lake according to an embodiment of the present application;
FIG. 2 is a schematic diagram of metadata information according to an embodiment of the present application;
FIG. 3 is a schematic diagram of processing time series data according to an embodiment of the present application;
FIG. 4 is a block diagram of a time series data analysis device based on a data lake according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application; the illustrative embodiments and their descriptions are used to explain the present application and do not constitute undue limitations on it. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
It should be noted that in this document, relational terms such as "first" and "second" and the like are used solely to distinguish one entity or action from another similar entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The embodiment of the application provides a time sequence data analysis method and device based on a data lake. The method provided by the embodiment of the application can be applied to any suitable electronic device, such as a server or a terminal, which is not specifically limited here; for ease of description, it is referred to simply as the electronic device hereinafter.
According to an aspect of the embodiment of the application, a method embodiment of a time sequence data analysis method based on a data lake is provided. Fig. 1 is a flowchart of a time series data analysis method based on a data lake according to an embodiment of the present application, as shown in fig. 1, the method includes:
Step S11, acquiring a target data set to be analyzed, wherein the target data set is acquired from a data lake.
The method provided by the embodiment of the application is applied to data processing equipment, and the data processing equipment can be a smart phone, a computer, an iPad and the like. The data processing device is used for receiving a data analysis request of a user, selecting corresponding original data according to the data analysis request to generate time sequence data, and analyzing a data rule of the time sequence data.
Specifically, acquiring the target data set to be analyzed comprises the following steps A1 to A3:
Step A1, acquiring the currently input metadata information, wherein the metadata information comprises at least one piece of data description information.
In an embodiment of the present application, the data processing device may receive a data analysis request triggered by the user (for example, the user clicks an analysis button on the display interface), display a corresponding input interface based on the request, and obtain the metadata information that the user enters through this interface. As shown in FIG. 2, the data description information included in the metadata information may be a name, a data source, a time sequence table, a sub-table and the like, where the name may be a service name and the data source may be a database.
Step A2, querying the data lake for an original data set matching the data description information.
In the embodiment of the application, the original data set is queried using the metadata information as follows. First, the data source is located in the data lake, and the time sequence table corresponding to the service name is obtained from that data source. Second, it is checked whether the time sequence table has a plurality of sub-tables; if so, a screening condition is obtained, which specifies whether the data in all the sub-tables are to be used as the original data set. Finally, the corresponding sub-tables are selected from the time sequence table according to the screening condition, and the original data set is constructed from the data in those sub-tables.
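The three-step lookup above can be pictured with a small in-memory sketch. It assumes, purely for illustration, that the data lake catalog is a nested dictionary (data source → service name → sub-table → rows); `query_original_dataset` and its `selected_subtables` parameter are hypothetical names, and a real data lake would be queried through its own catalog or query engine.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a data lake catalog:
# data source -> time sequence table (keyed by service name) -> sub-table -> rows
Lake = Dict[str, Dict[str, Dict[str, List[dict]]]]


def query_original_dataset(lake: Lake, data_source: str, service_name: str,
                           selected_subtables: Optional[List[str]] = None) -> List[dict]:
    """Build the original data set from the metadata (data source + service name).

    selected_subtables=None models the screening condition "use the data in all
    sub-tables"; otherwise only the listed sub-tables are used.
    """
    # First: locate the data source and the time sequence table for the service name.
    table = lake[data_source][service_name]
    # Second: apply the screening condition when the table has several sub-tables.
    names = list(table) if selected_subtables is None else selected_subtables
    # Finally: collect the rows of the chosen sub-tables into one original data set.
    rows: List[dict] = []
    for name in names:
        rows.extend(table[name])
    return rows


# Illustrative catalog with made-up values.
lake: Lake = {"db1": {"meter_voltage": {
    "region_a": [{"ts": 0, "v": 220.1}],
    "region_b": [{"ts": 0, "v": 219.8}],
}}}
print(len(query_original_dataset(lake, "db1", "meter_voltage")))          # 2
print(query_original_dataset(lake, "db1", "meter_voltage", ["region_b"]))  # [{'ts': 0, 'v': 219.8}]
```

Passing `None` as the screening condition keeps the common case (all sub-tables) to a single call, while an explicit list narrows the original data set.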
Step A3, acquiring data analysis conditions, and constructing the target data set from the original data in the original data set that satisfy the data analysis conditions.
In the embodiment of the application, the data analysis conditions comprise at least filtering conditions and sampling conditions, and may be preset by the user on the data processing device.
In the embodiment of the application, constructing the target data set in step A3 from the original data that satisfy the data analysis conditions comprises the following steps B1 to B3:
Step B1, determining the original data in the original data set that satisfy the filtering conditions as candidate data.
In the embodiment of the application, the filtering conditions comprise a data filtering category, a time filtering range, tag filtering data and the like. The data filtering category may include a data category; the time filtering range may include a start time and an end time; and the tag filtering data may indicate the entries in which the one or more device tags to be acquired are located. A tag indicates attribute information of the monitored object, such as the manufacturer, model and date of a data collection point, which usually does not change over time. A tag consists of a tag ID (also called a tag name) and a tag value.
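As a rough illustration of step B1, the three kinds of filtering condition can be applied as one predicate over the original data. The `Record` and `FilterCondition` types are hypothetical; the sketch assumes the time filtering range is half-open (start inclusive, end exclusive) and that a record must carry every requested tag ID with the requested tag value.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Set


@dataclass
class Record:
    category: str                                        # data category
    ts: datetime                                         # collection time
    value: float
    tags: Dict[str, str] = field(default_factory=dict)   # tag ID -> tag value


@dataclass
class FilterCondition:
    categories: Optional[Set[str]] = None                # data filtering category
    start: Optional[datetime] = None                     # time filtering range, inclusive
    end: Optional[datetime] = None                       # time filtering range, exclusive
    tags: Dict[str, str] = field(default_factory=dict)   # tag filtering data


def matches(record: Record, cond: FilterCondition) -> bool:
    if cond.categories is not None and record.category not in cond.categories:
        return False
    if cond.start is not None and record.ts < cond.start:
        return False
    if cond.end is not None and record.ts >= cond.end:
        return False
    # Every requested tag ID must be present with the requested tag value.
    return all(record.tags.get(tag_id) == tag_value for tag_id, tag_value in cond.tags.items())


def select_candidates(records: List[Record], cond: FilterCondition) -> List[Record]:
    """Step B1: the original data that satisfy the filtering condition become candidates."""
    return [r for r in records if matches(r, cond)]
```

Leaving a field of `FilterCondition` at its default disables that filter, so the same predicate covers any combination of category, time and tag filtering.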
Step B2, sampling the candidate data at the granularity indicated by the sampling conditions to obtain target data.
Step B3, constructing the target data set based on the target data.
In an embodiment of the present application, the sampling conditions include a sampling time range, a sampling granularity and a sampling period, where the sampling granularity is the time interval between sampled data points. For example, the sampling time range is 8:00 to 20:00, the sampling period is 1 hour, and the sampling granularity is 10 minutes. In the sampling process, first, the candidate data falling within the sampling time range are obtained; second, these candidate data are divided by sampling period into a plurality of intervals; third, target data are collected from each interval at the sampling granularity. Finally, the target data set is constructed from the target data of all the intervals.
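A minimal sketch of steps B2 and B3 with the example settings (8:00 to 20:00, 1-hour period, 10-minute granularity). It assumes that "collecting at the sampling granularity" means keeping the first point that falls into each granularity-sized slot of every period; other choices, such as the point nearest each slot boundary, would be equally consistent with the description. `sample` is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Point = Tuple[datetime, float]


def sample(points: List[Point], start: datetime, end: datetime,
           period: timedelta, granularity: timedelta) -> List[Point]:
    """Sample candidate data by period and granularity (steps B2 and B3)."""
    inside = sorted(p for p in points if start <= p[0] < end)  # 1. sampling time range
    target: List[Point] = []
    i = 0
    interval_start = start
    while interval_start < end:                    # 2. one interval per sampling period
        interval_end = min(interval_start + period, end)
        slot = interval_start
        while slot < interval_end:                 # 3. one point per granularity slot
            slot_end = min(slot + granularity, interval_end)
            while i < len(inside) and inside[i][0] < slot:
                i += 1                             # skip points before this slot
            if i < len(inside) and inside[i][0] < slot_end:
                target.append(inside[i])           # first point inside the slot
            slot = slot_end
        interval_start = interval_end
    return target                                  # target data for the target data set
```

With one reading per minute, the example settings yield 12 intervals of 6 points each, i.e. 72 target data points from 8:00 to 19:50.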
Step S12, converting the data in the target data set by using a preset function to obtain time sequence data corresponding to the target data set.
In the embodiment of the application, the user can configure, on the data processing device, a preset function for converting data into time sequence data; each target datum in the target data set is then converted with this preset function to obtain the time sequence data corresponding to the target data set.
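The description leaves the preset function open. A minimal sketch, assuming the function maps one raw record to a `(timestamp, value)` pair and that the resulting points are then ordered by time; `to_time_series`, `mv_to_volts` and the record fields `time` and `mv` are hypothetical.

```python
from datetime import datetime
from typing import Callable, Dict, List, Tuple

Point = Tuple[datetime, float]
PresetFunction = Callable[[Dict], Point]


def to_time_series(records: List[Dict], preset: PresetFunction) -> List[Point]:
    """Apply the user-configured preset function to every target datum, then order by time."""
    return sorted(preset(record) for record in records)


# Hypothetical preset function: parse an ISO-8601 timestamp and convert millivolts to volts.
def mv_to_volts(record: Dict) -> Point:
    return datetime.fromisoformat(record["time"]), record["mv"] / 1000.0
```

Keeping the conversion as a plain callable means a user can swap in any parsing or unit-conversion rule without changing the rest of the pipeline.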
In the embodiment of the present application, after the time series data are obtained, it is further necessary to determine whether they meet the user's current analysis conditions, which include the granularity the user requires for the time series data, the downsampling algorithm and the like. Specifically, the granularity of the time series data is determined first. For example, if the interval between time series data points is 10 min and the granularity required by the user is 30 min, the granularity of the time series data is finer than the required granularity, so a downsampling operation is performed on the time series data. In addition, as shown in FIG. 3, if data are missing during downsampling, the user can set values for the missing data on the data processing device. The final time series data are obtained from the downsampled data together with the values set for the missing data.
Time series data are a series of measurements of monitoring indicators generated continuously over time at a certain frequency, for example the temperature or power of a monitored object collected every minute. Stock prices, air temperature changes, website access data, personal health data, industrial sensor data and system monitoring data of business servers can all be time series data. Downsampling aggregates multiple measurements of a monitoring indicator along the time dimension, for example by taking the average or maximum of the 60 temperature readings a sensor collects in one hour.
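The downsampling and missing-data handling described above can be sketched as follows, assuming fixed-width buckets aligned to a start time, a user-chosen aggregation (mean by default, or e.g. `max`), and a single user-set fill value for empty buckets. `downsample` and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta
from statistics import fmean
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Point = Tuple[datetime, float]


def downsample(points: List[Point], start: datetime, end: datetime, target: timedelta,
               agg: Callable[[Sequence[float]], float] = fmean,
               fill: Optional[float] = None) -> List[Tuple[datetime, Optional[float]]]:
    """Aggregate points into buckets of width `target`; empty buckets get `fill`."""
    buckets: Dict[int, List[float]] = {}
    for ts, value in points:
        if start <= ts < end:
            idx = (ts - start) // target  # timedelta // timedelta gives an int
            buckets.setdefault(idx, []).append(value)
    n_buckets = -((start - end) // target)  # ceiling division, so a partial last bucket is kept
    result = []
    for i in range(n_buckets):
        values = buckets.get(i)
        # Missing bucket: use the value the user configured (None = leave the gap).
        result.append((start + i * target, agg(values) if values else fill))
    return result
```

For 10-minute data and a 30-minute target, each bucket aggregates up to three points; a bucket with no points is where the user-set value for missing data is applied.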
Step S13, generating corresponding visual data based on the time sequence data, and analyzing the visual data according to preset analysis conditions to obtain a data rule corresponding to the time sequence data.
In the embodiment of the application, generating the corresponding visual data based on the time sequence data comprises the following steps C1 to C3:
Step C1, acquiring attribute information corresponding to the target data set, and querying at least one visualization type corresponding to the attribute information.
Step C2, obtaining, from the at least one visualization type, a target visualization type matching a preset visualization requirement.
Step C3, generating the visual data from the time sequence data according to the target visualization type.
In the embodiment of the application, to help the user intuitively understand how the time series data change, different visualization types are set for different attribute information. For example, when the target data are hierarchically structured (e.g., sales data of different product categories), the visualization type may be a clustered column chart, a bar chart or the like; when the target data are ordinary numerical values (e.g., website browsing data), the visualization type may be a clustered column chart, a line chart or the like.
In the embodiment of the application, the data processing device displays the visualization types obtained for the attribute information of the target data set, and can also obtain the preset visualization requirement entered by the user, such as changes in quantity (increase or decrease) or comparison between data. When the preset visualization requirement is the increase or decrease of the data, the matching target visualization type is a line chart; when the preset visualization requirement is a categorized comparison, the matching target visualization type is a clustered column chart.
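Steps C1 to C3 amount to two lookups followed by building a chart specification. The sketch below hard-codes lookup tables modelled on the examples above; the table contents, the fallback to the first candidate, and the chart-library-neutral output format are all assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup tables modelled on the examples in the description.
TYPES_BY_ATTRIBUTE: Dict[str, List[str]] = {
    "hierarchical": ["clustered column chart", "bar chart"],   # e.g. sales by product category
    "numeric": ["clustered column chart", "line chart"],       # e.g. website browsing data
}
TYPE_BY_REQUIREMENT: Dict[str, str] = {
    "increase/decrease": "line chart",
    "categorized comparison": "clustered column chart",
}


def choose_visualization(attribute: str, requirement: str) -> str:
    candidates = TYPES_BY_ATTRIBUTE.get(attribute, [])   # step C1: types for the attribute
    if not candidates:
        raise ValueError(f"no visualization type registered for {attribute!r}")
    wanted = TYPE_BY_REQUIREMENT.get(requirement)         # step C2: match the requirement
    # Fall back to the first candidate when nothing matches (an assumption).
    return wanted if wanted in candidates else candidates[0]


def build_visual_data(series: List[Tuple[str, float]], chart_type: str) -> Dict[str, object]:
    """Step C3: a chart-library-neutral specification of the visual data."""
    return {"type": chart_type, "x": [t for t, _ in series], "y": [v for _, v in series]}
```

The requirement only narrows the choice within the types allowed for the attribute, so a line chart is never proposed for data whose attribute does not support it.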
In the embodiment of the application, analyzing the visual data according to the preset analysis conditions to obtain the data rule corresponding to the time sequence data comprises the following steps D1 and D2:
Step D1, detecting whether the visual data is stable, to obtain a detection result.
In the embodiment of the application, whether the visual data is stable (stationary) is detected as follows: two arbitrary data points p and q are randomly selected from the visual data, and the following formula is used to judge whether the visual data is stable, where x is a random variable and t denotes time. p and q are substituted into the formula; if the formula holds, the visual data is stable and the detection result is the first result. Conversely, if the formula does not hold, the visual data is not stable and the detection result is the second result.
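The formula referred to here is not reproduced in this text, so the sketch below swaps in a common heuristic for weak stationarity rather than the application's own test: windows starting at two randomly chosen positions p and q must have roughly the same mean and variance. The window width, the tolerance `tol`, and all function names are assumptions.

```python
import random
from statistics import fmean, pvariance
from typing import Optional, Sequence, Tuple


def window_stats(x: Sequence[float], start: int, width: int) -> Tuple[float, float]:
    window = x[start:start + width]
    return fmean(window), pvariance(window)


def detect_stationarity(x: Sequence[float], p: int, q: int,
                        width: int = 20, tol: float = 0.5) -> str:
    """Return "first" (stable) or "second" (not stable).

    Heuristic substitute for the application's formula: the windows starting at
    positions p and q must have approximately the same mean and variance.
    """
    mean_p, var_p = window_stats(x, p, width)
    mean_q, var_q = window_stats(x, q, width)
    spread = max(var_p, var_q) ** 0.5
    same_mean = abs(mean_p - mean_q) <= tol * spread if spread > 0 else mean_p == mean_q
    same_var = abs(var_p - var_q) <= tol * max(var_p, var_q)
    return "first" if same_mean and same_var else "second"


def detect_random(x: Sequence[float], width: int = 20, tol: float = 0.5,
                  rng: Optional[random.Random] = None) -> str:
    """Pick the two positions p and q at random, as in the description."""
    rng = rng or random.Random()
    p, q = rng.sample(range(len(x) - width + 1), 2)
    return detect_stationarity(x, p, q, width, tol)
```

A single pair of windows is a weak test; repeating `detect_random` for several pairs, or using a formal unit-root test, would make the decision more robust.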
Step D2, acquiring an analysis strategy corresponding to the detection result, and analyzing the visual data using the analysis strategy to obtain the data rule.
In the embodiment of the application, analyzing the visual data using the analysis strategy to obtain the data rule comprises: when the detection result is the first result, which indicates that the visual data is stable, detecting whether white noise data exists in the visual data; when white noise data exists, detecting the stationarity type of the visual data, wherein the stationarity type comprises strict stationarity and wide-sense stationarity; and finally, determining the stationarity type as the data rule.
It should be noted that, in this application, strict stationarity means that the mean, variance and autocorrelation coefficients of the time series data do not change over time, whereas wide-sense stationarity means that the mean of the time series data may change over time while the variance and autocorrelation coefficients do not. The method provided by the embodiment of the application can distinguish the two through visualization: the curve of a strictly stationary series takes the form of a straight line, whereas the curve of a wide-sense stationary series shows some fluctuation.
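The description does not say how white noise is detected. A common approach, swapped in here as an assumption, is to check that the sample autocorrelations at lags 1 to K all lie inside the approximate 95% band ±1.96/√n; the Ljung-Box test is a more formal alternative. The function names and the default `max_lag` are hypothetical.

```python
from math import sqrt
from typing import Sequence


def autocorrelation(x: Sequence[float], k: int) -> float:
    """Sample autocorrelation at lag k (biased estimator)."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    if denom == 0:
        return 0.0
    return sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / denom


def is_white_noise(x: Sequence[float], max_lag: int = 10, z: float = 1.96) -> bool:
    """True when |r_k| <= z / sqrt(n) for every lag k = 1..max_lag."""
    n = len(x)
    if n <= max_lag or all(v == x[0] for v in x):
        return False  # too short, or a constant series
    bound = z / sqrt(n)
    return all(abs(autocorrelation(x, k)) <= bound for k in range(1, max_lag + 1))
```

Because each lag is checked at roughly the 5% level, a genuine white-noise series will occasionally fail one lag by chance; this is the trade-off the portmanteau-style tests are designed to address.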
In the embodiment of the application, analyzing the visual data using the analysis strategy to obtain the data rule further comprises the following steps. First, when the detection result is the second result, which indicates that the visual data is not stable, a difference operation is performed on the visual data to obtain an operation result. The difference operation subtracts the value at time t-1 from the value at time t, measuring how the data at time t have changed relative to time t-1.
Secondly, the operation result is examined along each preset dimension to obtain the trend information corresponding to that dimension, wherein the preset dimensions comprise a linear dimension, a curvilinear dimension and a periodic dimension; the trend information corresponding to the preset dimensions is then determined as the data rule.
In the embodiment of the application, the trend information for each preset dimension is obtained as follows: first, a graph is generated from the time series data; second, the trend of the curve in the graph is observed. For example, if the data show a linear trend, the trend can be characterized by a straight line; if they are periodic, the trend can be characterized by a sine wave or another periodic function. Finally, the differences (errors) between the data points in the graph and the fitted function are used to assess whether the time series data actually has that specific trend or periodicity.
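A minimal sketch of the non-stationary branch under stated assumptions: the linear dimension uses a least-squares line with an R² threshold; a curvilinear trend is flagged when the first difference itself has a clear linear trend (roughly quadratic growth); and, instead of fitting a sine wave, the periodic dimension takes the highest local peak of the autocorrelation of the detrended series. Thresholds (`r2_min`, `threshold`, `eps`) and function names are illustrative, not taken from the application.

```python
from statistics import pvariance
from typing import Dict, List, Optional, Sequence, Tuple


def difference(x: Sequence[float]) -> List[float]:
    """First-order difference: value at time t minus value at time t-1."""
    return [b - a for a, b in zip(x, x[1:])]


def fit_line(x: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares line over t = 0..n-1; returns (slope, intercept, R^2)."""
    n = len(x)
    mean_t, mean_x = (n - 1) / 2, sum(x) / n
    s_tt = sum((t - mean_t) ** 2 for t in range(n))
    slope = sum((t - mean_t) * (v - mean_x) for t, v in enumerate(x)) / s_tt
    intercept = mean_x - slope * mean_t
    ss_tot = sum((v - mean_x) ** 2 for v in x)
    ss_res = sum((v - intercept - slope * t) ** 2 for t, v in enumerate(x))
    return slope, intercept, (1.0 if ss_tot == 0 else 1 - ss_res / ss_tot)


def autocorrelation(x: Sequence[float], k: int) -> float:
    n, m = len(x), sum(x) / len(x)
    denom = sum((v - m) ** 2 for v in x)
    if denom == 0:
        return 0.0
    return sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / denom


def dominant_period(x: Sequence[float], threshold: float = 0.5) -> Optional[int]:
    """Lag of the highest local autocorrelation peak at or above threshold, if any."""
    max_lag = len(x) // 2
    r = [autocorrelation(x, k) for k in range(max_lag + 1)]
    peaks = [k for k in range(2, max_lag)
             if r[k] > r[k - 1] and r[k] >= r[k + 1] and r[k] >= threshold]
    return max(peaks, key=lambda k: r[k]) if peaks else None


def trend_labels(x: Sequence[float], r2_min: float = 0.9, eps: float = 1e-9) -> Dict[str, object]:
    d_slope, _, d_r2 = fit_line(difference(x))
    curve = abs(d_slope) > eps and d_r2 >= r2_min      # trending differences => curved growth
    slope, intercept, r2 = fit_line(x)
    linear = not curve and abs(slope) > eps and r2 >= r2_min
    residual = [v - intercept - slope * t for t, v in enumerate(x)]
    period = None
    if pvariance(residual) > 1e-9 * max(pvariance(x), 1e-12):  # skip near-perfect line fits
        period = dominant_period(residual)
    return {"linear": linear, "curve": curve, "period": period}
```

For example, `2t + 1` is labelled linear, `t²` curvilinear, and a repeating six-sample pattern periodic with period 6; the R² values play the role of the "differences and errors" between the data and the fitted function.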
In the embodiment of the application, the time series is labeled automatically and asynchronously, with the computer observing the characteristics of the sequence. The machine first checks whether the time series is stable and labels it accordingly. If it is stable, the machine further checks whether it is white noise and, if so, labels it as white noise, while also checking whether it is strictly stationary or wide-sense stationary and applying the corresponding label. If the sequence is non-stationary, a difference operation is performed, the linear, curvilinear and periodic trends of the sequence are detected, and the corresponding labels are applied.
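The labeling flow above can be sketched as a small decision function run asynchronously over several series. The detectors are deliberately passed in as hooks (the `Checks` fields are hypothetical names), and running the labeling in worker threads via `asyncio.to_thread` (Python 3.9+) is an assumption about what "asynchronous" means here.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Series = Sequence[float]


@dataclass
class Checks:
    """Pluggable detectors; the concrete tests are left open (hypothetical hooks)."""
    is_stationary: Callable[[Series], bool]
    is_white_noise: Callable[[Series], bool]
    stationarity_type: Callable[[Series], str]    # e.g. "strict" or "wide"
    trend_labels: Callable[[Series], List[str]]   # e.g. ["linear", "periodic"]


def label_series(x: Series, checks: Checks) -> List[str]:
    """Stationary branch vs. differencing branch, as in the labeling flow."""
    if checks.is_stationary(x):
        labels = ["stationary"]
        if checks.is_white_noise(x):
            labels += ["white noise", checks.stationarity_type(x)]
        return labels
    differenced = [b - a for a, b in zip(x, x[1:])]
    return ["non-stationary"] + checks.trend_labels(differenced)


async def label_all(series: Dict[str, Series], checks: Checks) -> Dict[str, List[str]]:
    """Label several series concurrently, each in a worker thread."""
    names = list(series)
    results = await asyncio.gather(
        *(asyncio.to_thread(label_series, series[name], checks) for name in names))
    return dict(zip(names, results))
```

Running each series in its own thread keeps a slow detector on one series from blocking the labeling of the others; `asyncio.gather` returns results in input order, which is why they can be zipped back onto the names.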
Taking the voltage sampling values of a smart electricity meter as an example: the voltage of ordinary household appliances is set according to the 220 V national standard, but the voltage at the appliance end cannot be controlled precisely and fluctuates within a range. We first locate where the voltage data are stored in the data lake, and then sample meter data from several different regions for comparison. If the raw data are sampled once every 15 minutes, we can downsample them, for example to hourly data. By comparing and repeatedly observing the visualized data curves, we can gain a preliminary understanding of the approximate voltage conditions in these regions, such as the upper and lower extreme values and periodic patterns.
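A minimal sketch of the smart-meter walkthrough, assuming readings arrive as `(region, timestamp, volts)` tuples every 15 minutes: each reading is assigned to its clock hour, the hourly mean is taken as the downsampled value, and the lower and upper extremes of the hourly series are reported per region. The function names and tuple layout are hypothetical, and the numbers in the usage lines are made-up inputs, not measurements.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import fmean
from typing import Dict, List, Tuple

Reading = Tuple[str, datetime, float]  # (region, timestamp, voltage in V)


def hourly_means(readings: List[Reading]) -> Dict[str, Dict[datetime, float]]:
    """Downsample 15-minute readings to one mean value per region and clock hour."""
    buckets: Dict[str, Dict[datetime, List[float]]] = defaultdict(lambda: defaultdict(list))
    for region, ts, volts in readings:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[region][hour].append(volts)
    return {region: {hour: fmean(vals) for hour, vals in sorted(hours.items())}
            for region, hours in buckets.items()}


def extremes(hourly: Dict[str, Dict[datetime, float]]) -> Dict[str, Tuple[float, float]]:
    """Lower and upper extreme of the hourly voltage in each region."""
    return {region: (min(h.values()), max(h.values())) for region, h in hourly.items()}


# Illustrative input only: two hours of 15-minute readings for one region.
start = datetime(2023, 1, 1, 8)
volts = [219.0, 220.0, 221.0, 222.0, 224.0, 224.0, 224.0, 224.0]
readings = [("A", start + timedelta(minutes=15 * i), v) for i, v in enumerate(volts)]
print(extremes(hourly_means(readings)))  # {'A': (220.5, 224.0)}
```

Note that averaging before taking extremes smooths out short spikes; taking `max` and `min` per hour instead would preserve them if the raw extremes matter.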
With the method provided by the embodiment of the application, the target data set is automatically converted into time sequence data, and the user can directly configure the corresponding analysis conditions and use them to analyze the time sequence data, so that the rules (patterns) of the time sequence data are summarized and extracted. Data analysis is thus carried out according to the user's needs without requiring professional analysts, which lowers the difficulty of data analysis for ordinary users.
Fig. 4 is a block diagram of a time series data analysis device based on a data lake according to an embodiment of the present application, where the device may be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in fig. 4, the apparatus includes:
an obtaining module 41, configured to obtain a target data set to be analyzed, where the target data set is obtained from a data lake;
the conversion module 42 is configured to convert data in the target data set by using a preset function to obtain time sequence data corresponding to the target data set;
the generating module 43 is configured to generate corresponding visual data based on the time sequence data, and analyze the visual data according to a preset analysis condition to obtain a data rule corresponding to the time sequence data.
In the embodiment of the present application, the obtaining module 41 is configured to obtain the currently input metadata information, where the metadata information includes at least one piece of data description information; query the data lake for an original data set matching the data description information; and acquire data analysis conditions and construct the target data set from the original data in the original data set that satisfy the data analysis conditions.
In the embodiment of the application, the data analysis conditions at least comprise filtering conditions and sampling conditions.
In the embodiment of the present application, the obtaining module 41 is specifically configured to determine, as candidate data, original data that satisfies a filtering condition in the original data set; sampling the candidate data according to granularity indicated by the sampling condition to obtain target data; a target data set is constructed based on the target data.
In the embodiment of the present application, the generating module 43 is configured to obtain attribute information corresponding to the target data set and query at least one visualization type corresponding to the attribute information; obtain, from the at least one visualization type, a target visualization type matching a preset visualization requirement; and generate the visual data from the time sequence data according to the target visualization type.
In the embodiment of the present application, the generating module 43 is configured to detect whether the visualized data is stable, so as to obtain a detection result; and obtaining an analysis strategy corresponding to the detection result, and analyzing the visualized data by utilizing the analysis strategy to obtain a data rule.
In the embodiment of the present application, the generating module 43 is configured to detect whether white noise data exists in the visual data when the detection result is a first result, where the first result indicates that the visual data is stable; detect the stable type corresponding to the visual data when white noise data exists in the visual data; and determine the stable type as the data rule.
In the embodiment of the present application, the generating module 43 is configured to perform a differential operation based on the visual data to obtain an operation result when the detection result is a second result, where the second result is used to indicate that the visual data is stable; obtaining trend information corresponding to each preset dimension according to a detection operation result of the preset dimension, wherein the preset dimension comprises: linear dimension, curvilinear dimension, and periodic dimension; and determining trend information corresponding to the preset dimension as a data rule.
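The application names the three trend dimensions but not how each is measured. The sketch below assumes one simple interpretation on the differenced series: the mean first difference gives a linear slope, the mean second difference indicates curvature, and near-zero changes in the first difference across an assumed `period` indicate periodicity. The function names, the `period` argument, and the `periodic_ratio` threshold are hypothetical; seasonal decomposition (for example STL) would be a fuller alternative.

```python
from statistics import fmean, pvariance


def difference(xs, lag=1):
    """Lag-`lag` difference: d[i] = xs[i] - xs[i - lag]."""
    return [xs[i] - xs[i - lag] for i in range(lag, len(xs))]


def trend_info(xs, period, periodic_ratio=0.1):
    """Summarize a series along linear, curvilinear, and periodic dimensions via differencing."""
    if len(xs) < period + 3:
        raise ValueError("series too short for the assumed period")
    d1 = difference(xs)                # operation result of the differencing step
    d2 = difference(d1)                # second difference captures curvature
    seasonal = difference(d1, period)  # change in d1 across one assumed period
    var_d1 = pvariance(d1)
    return {
        "linear": fmean(d1),           # average step, i.e. slope of a linear trend
        "curvilinear": fmean(d2),      # nonzero mean suggests an accelerating trend
        # periodic when d1 repeats itself after `period` steps (small seasonal changes)
        "periodic": var_d1 > 0 and fmean([s * s for s in seasonal]) < periodic_ratio * var_d1,
    }
```

For example, `trend_info([2 * t + 1 for t in range(20)], period=4)` reports a linear slope of 2 with no curvature and no periodicity, while a quadratic series reports a constant second difference.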
An embodiment of the application further provides an electronic device. As shown in Fig. 5, the electronic device may include a processor 1501, a communication interface 1502, a memory 1503, and a communication bus 1504, where the processor 1501, the communication interface 1502, and the memory 1503 communicate with one another through the communication bus 1504.
A memory 1503 for storing a computer program;
the processor 1501, when executing the computer program stored in the memory 1503, implements the steps of the above embodiments.
The communication bus mentioned for the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the terminal and other devices.
The memory may include random access memory (RAM) or non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present application, a computer-readable storage medium is provided, in which instructions are stored that, when run on a computer, cause the computer to perform the data-lake-based time series data analysis method of any one of the above embodiments.
In yet another embodiment of the present application, a computer program product containing instructions is also provided; when the product runs on a computer, it causes the computer to perform the data-lake-based time series data analysis method of any one of the above embodiments.
The above embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center over a wired connection (e.g., coaxial cable, optical fiber, digital subscriber line) or a wireless connection (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state disk (SSD)).
The foregoing descriptions are merely preferred embodiments of the present application and are not intended to limit its scope. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present application shall fall within the protection scope of the present application.
The foregoing is only a specific embodiment of the application to enable those skilled in the art to understand or practice the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (7)

1. A method for analyzing time series data based on a data lake, comprising the steps of:
acquiring a target data set to be analyzed, wherein the target data set is acquired from a data lake;
converting the data in the target data set by using a preset function to obtain time series data corresponding to the target data set;
generating corresponding visualization data based on the time series data, and analyzing the visualization data according to preset analysis conditions to obtain a data rule corresponding to the time series data;
wherein analyzing the visualization data according to the preset analysis conditions to obtain the data rule corresponding to the time series data comprises:
detecting whether the visualization data is stationary to obtain a detection result;
acquiring an analysis strategy corresponding to the detection result, and analyzing the visualization data by using the analysis strategy to obtain the data rule;
wherein detecting whether the visualization data is stationary to obtain the detection result comprises:
randomly selecting two arbitrary values p and q from the visualization data, and judging whether the visualization data is stationary by using a preset formula,
the preset formula is: $[x_{1+p}(t), x_{2+p}(t), \ldots, x_{t+p}(t)]_{q} = [x_{1}(t), x_{2}(t), \ldots, x_{t}(t)]$, where x is a random variable and t is time;
inputting p and q into the preset formula: if the preset formula holds, the visualization data is stationary; if it does not hold, the visualization data is not stationary;
wherein analyzing the visualization data by using the analysis strategy to obtain the data rule comprises:
when the detection result is a first result, detecting whether white noise data exists in the visualization data, wherein the first result indicates that the visualization data is stationary;
when white noise data exists in the visualization data, detecting a stationary type corresponding to the visualization data;
determining the stationary type as the data rule;
and wherein analyzing the visualization data by using the analysis strategy to obtain the data rule further comprises:
when the detection result is a second result, performing a differencing operation on the visualization data to obtain an operation result, wherein the second result indicates that the visualization data is not stationary;
examining the operation result along preset dimensions to obtain trend information corresponding to each preset dimension, wherein the preset dimensions comprise a linear dimension, a curvilinear dimension, and a periodic dimension;
and determining the trend information corresponding to the preset dimensions as the data rule.
2. The method of claim 1, wherein acquiring the target data set to be analyzed comprises:
acquiring currently input metadata information, wherein the metadata information comprises at least one piece of data description information;
querying the data lake for an original data set matching the data description information;
and acquiring data analysis conditions, and constructing the target data set from the original data in the original data set that satisfy the data analysis conditions.
3. The method of claim 2, wherein the data analysis conditions include at least a filtering condition and a sampling condition;
wherein constructing the target data set from the original data in the original data set that satisfy the data analysis conditions comprises:
determining the original data in the original data set that satisfy the filtering condition as candidate data;
sampling the candidate data at the granularity indicated by the sampling condition to obtain target data;
and constructing the target data set based on the target data.
4. The method of claim 1, wherein generating the corresponding visualization data based on the time series data comprises:
acquiring attribute information corresponding to the target data set, and querying at least one visualization type corresponding to the attribute information;
selecting, from the at least one visualization type, a target visualization type matching a preset visualization requirement;
and generating the visualization data from the time series data according to the target visualization type.
5. A time series data analysis device based on a data lake, comprising:
the acquisition module is used for acquiring a target data set to be analyzed, wherein the target data set is acquired from a data lake;
the conversion module is used for converting the data in the target data set by using a preset function to obtain time series data corresponding to the target data set;
the generation module is used for generating corresponding visualization data based on the time series data, and analyzing the visualization data according to preset analysis conditions to obtain a data rule corresponding to the time series data;
wherein the generation module is used for detecting whether the visualization data is stationary to obtain a detection result, acquiring an analysis strategy corresponding to the detection result, and analyzing the visualization data by using the analysis strategy to obtain the data rule;
wherein the generation module is specifically configured to randomly select two arbitrary values p and q from the visualization data and judge whether the visualization data is stationary by using a preset formula,
the preset formula is: $[x_{1+p}(t), x_{2+p}(t), \ldots, x_{t+p}(t)]_{q} = [x_{1}(t), x_{2}(t), \ldots, x_{t}(t)]$, where x is a random variable and t is time;
input p and q into the preset formula: if the preset formula holds, the visualization data is stationary; if it does not hold, the visualization data is not stationary;
the generation module is specifically configured to: when the detection result is a first result, which indicates that the visualization data is stationary, detect whether white noise data exists in the visualization data; when white noise data exists in the visualization data, detect a stationary type corresponding to the visualization data; and determine the stationary type as the data rule;
the generation module is specifically configured to: when the detection result is a second result, which indicates that the visualization data is not stationary, perform a differencing operation on the visualization data to obtain an operation result; examine the operation result along preset dimensions to obtain trend information corresponding to each preset dimension, wherein the preset dimensions comprise a linear dimension, a curvilinear dimension, and a periodic dimension; and determine the trend information corresponding to the preset dimensions as the data rule.
6. A storage medium comprising a stored program, wherein the program when run performs the method of any one of claims 1 to 4.
7. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus; wherein:
the memory is configured to store a computer program;
the processor is configured to perform the method of any one of claims 1 to 4 by running the program stored in the memory.
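Claims 1 and 5 convert the target data set into time series data "by using a preset function" without defining that function. A minimal sketch, assuming each record carries an ISO-8601 timestamp string and a numeric value: parse the timestamps, aggregate values that share a timestamp (summing by default), and sort by time. The function name, key names, and aggregation rule are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def to_time_series(rows, ts_key="ts", value_key="value", agg=sum):
    """Hypothetical preset function: parse ISO-8601 timestamps, aggregate values
    that share a timestamp with `agg`, and return (timestamp, value) pairs in time order."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[datetime.fromisoformat(row[ts_key])].append(row[value_key])
    return [(ts, agg(grouped[ts])) for ts in sorted(grouped)]


rows = [
    {"ts": "2023-02-27T10:00:00", "value": 2},
    {"ts": "2023-02-27T09:00:00", "value": 1},
    {"ts": "2023-02-27T10:00:00", "value": 3},
]
series = to_time_series(rows)
```

Passing a different `agg` (for example `max` or `statistics.fmean`) changes how duplicate timestamps are merged without changing the rest of the conversion.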

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310166499.2A CN115858633B (en) 2023-02-27 2023-02-27 Time sequence data analysis method and device based on data lake

Publications (2)

Publication Number Publication Date
CN115858633A CN115858633A (en) 2023-03-28
CN115858633B CN115858633B (en) 2023-10-20

Family

ID=85658994

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310166499.2A Active CN115858633B (en) 2023-02-27 2023-02-27 Time sequence data analysis method and device based on data lake

Country Status (1)

Country Link
CN (1) CN115858633B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116804993B (en) * 2023-08-22 2023-12-08 北京龙德缘电力科技发展有限公司 Visual expression method with time sequence data characteristics

Citations (5)

Publication number Priority date Publication date Assignee Title
CN112084056A (en) * 2020-08-25 2020-12-15 腾讯科技(深圳)有限公司 Abnormality detection method, apparatus, device and storage medium
CN113722383A (en) * 2021-09-13 2021-11-30 福韵数据服务有限公司 Investigation device and method based on time sequence information
WO2022083684A1 (en) * 2020-10-23 2022-04-28 北京千方科技股份有限公司 Road network operation management method and device, storage medium, and terminal
CN114969191A (en) * 2021-11-24 2022-08-30 广州城建职业学院 Data analysis method, system and device based on big data and storage medium
CN115544183A (en) * 2022-11-28 2022-12-30 深圳高灯计算机科技有限公司 Data visualization method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN115858633A (en) 2023-03-28

Similar Documents

Publication Publication Date Title
US8751436B2 (en) Analyzing data quality
CN106021337A (en) A big data analysis-based intelligent recommendation method and system
CN115858633B (en) Time sequence data analysis method and device based on data lake
WO2019023982A1 (en) Multi-dimensional industrial knowledge graph
CN116431931B (en) Real-time incremental data statistical analysis method
CN112148733A (en) Method, device, electronic device and computer readable medium for determining fault type
CN114461644A (en) Data acquisition method and device, electronic equipment and storage medium
CN111914192A (en) Method and device for displaying equipment data
CN110046235B (en) Knowledge base assessment method, device and equipment
US20240013280A1 (en) Product recommendation method and apparatus, computer storage medium, and system
CN102360484B (en) Group buying websites sales data verity detection method and device
CN113781106A (en) Commodity operation data analysis method, device, equipment and computer readable medium
US11308104B2 (en) Knowledge graph-based lineage tracking
CN112631889A (en) Portrayal method, device and equipment for application system and readable storage medium
CN113077321A (en) Article recommendation method and device, electronic equipment and storage medium
CA3051919C (en) Machine learning (ml) based expansion of a data set
CN107256254A (en) A kind of Industrial Cycle index acquisition methods, storage device and terminal
US20090132522A1 (en) Systems and methods for organizing innovation documents
CN115641191A (en) Data pushing method based on data analysis and AI system
CN111767938B (en) Abnormal data detection method and device and electronic equipment
CN116108086B (en) Time sequence data evaluation method and device, electronic equipment and storage medium
CN112598185A (en) Agricultural public opinion analysis method, device, equipment and storage medium
CN113568950A (en) Index detection method, device, equipment and medium
JP5665501B2 (en) Information processing apparatus and program
Lai et al. Reduction of control-chart signal variablity for high-quality processes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant