CN115408390A - A data processing method, device and electronic equipment
- Publication number
- CN115408390A (CN202211021925.5A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2255—Hash tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2474—Sequence data queries, e.g. querying versioned data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Description
Technical field
The present application relates to data processing technology, and in particular to a data processing method, apparatus, and electronic device.
Background
For industrial equipment that requires real-time monitoring, the data generated and recorded during operation is highly valuable for condition monitoring, online diagnosis, and offline analysis.
Commonly used data storage methods currently include relational databases (such as MySQL), non-relational databases (such as MongoDB), time series databases (such as TSDB), text files (such as CSV files), and so on.
However, as industrial control technology develops, the number of variables monitored during equipment operation and the data sampling frequency both keep increasing, which poses a severe challenge for storing massive amounts of data. For example, for high-frequency data with a sampling period at the millisecond level or shorter, a single variable can generate tens of thousands to millions of data records per day, and the volume generated per year can reach the hundreds of millions.
These traditional storage methods cannot meet production requirements in large-data-volume scenarios in terms of either storage efficiency or access speed.
Summary of the invention
Embodiments of the present application provide a data processing method, apparatus, and electronic device to solve the problem that existing data storage solutions have difficulty storing high-frequency variables.
In a first aspect, an embodiment of the present application provides a data processing method, including:
acquiring target data of a target variable for a target time period and a target period;
determining, through a target hash algorithm, a hash identifier and a storage path of a target file based at least on the target time period, the target period, and the variable identifier of the target variable;
creating the target file under the storage path according to the hash identifier, and writing the target data into the target file.
In a possible implementation, the target data of the target variable for the target time period and the target period includes:
target sampling data of the target variable for the target time period and a target sampling period, and/or target statistical data of the target variable for the target time period and a target statistical period;
wherein the target statistical period is longer than the target sampling period, and the target statistical data is obtained by computing statistics over sampling data or other statistical data based on the target statistical period.
In a possible implementation, determining the hash identifier and the storage path of the target file through the target hash algorithm based at least on the target time period, the target period, and the variable identifier of the target variable includes:
determining the hash identifier and the storage path of the target file through the target hash algorithm based on the data type of the target data, the target time period, the target period, and the variable identifier of the target variable.
In a possible implementation, determining the hash identifier and the storage path of the target file through the target hash algorithm based at least on the target time period, the target period, and the variable identifier of the target variable includes:
determining, through the target hash algorithm, a hash value matching the target data based at least on the target time period, the target period, and the variable identifier of the target variable;
selecting a preset number of characters from the hash value as the hash identifier of the target file;
dividing the characters of the hash identifier, looking up the storage location that matches the division result in a preset multi-level file directory, and thereby determining the storage path of the target file.
In a possible implementation, writing the target data into the target file includes:
writing the data points of the target data into the target file sequentially, in chronological order and in binary format, wherein each data point in the target file occupies a fixed number of bits;
if a faulty data point with missing data exists, filling the position of that data point to the fixed number of bits, using either an interpolation algorithm or a preset special value.
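The fixed-width binary layout described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes 4-byte little-endian single-precision floats, and it assumes NaN as the preset special value for missing points. The function names `encode_points` and `read_point` are hypothetical.

```python
import math
import struct

POINT_SIZE = 4  # assumption: each data point is one float32 (4 bytes)

def encode_points(values):
    """Pack samples in time order; None (missing) is stored as NaN (assumed placeholder)."""
    out = bytearray()
    for v in values:
        out += struct.pack("<f", math.nan if v is None else v)
    return bytes(out)

def read_point(blob, index):
    """Read the index-th data point directly from its fixed byte offset."""
    return struct.unpack_from("<f", blob, index * POINT_SIZE)[0]

blob = encode_points([1.0, None, 2.5])
```

Because every point has the same width and points are equally spaced in time, the byte offset of a sample follows directly from its position in the time period, so no per-record timestamp or ID needs to be stored.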
In a possible implementation, the method further includes:
when a query instruction for querying data of the target variable is received, determining, through the target hash algorithm, the hash identifiers and storage paths of one or more files to be queried based on the target query time period, the target query period, and the variable identifier carried in the query instruction;
locating the one or more files to be queried based on the hash identifiers and the storage paths, and obtaining the data to be queried for the target variable in the target query time period and the target query period.
In a possible implementation, determining the hash identifiers and storage paths of the one or more files to be queried through the target hash algorithm based on the target query time period, the target query period, and the variable identifier carried in the query instruction includes:
determining one or more actual query time periods from the target query time period carried in the query instruction, based on a time-period division rule that matches the target query period;
determining, through the target hash algorithm, the hash identifiers and storage paths of the one or more files to be queried based on the one or more actual query time periods together with the target query period and the variable identifier carried in the query instruction, wherein each actual query time period matches the target time period of one file to be queried.
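As a rough sketch of this step, the code below splits a query range into the per-file time periods and derives one hash identifier per period. The one-file-per-hour division rule, the `var#period#interval` key format, and the names `hourly_periods` and `files_to_query` are assumptions made for illustration; the patent leaves the division rule open.

```python
from datetime import datetime, timedelta
import hashlib

def hourly_periods(start, end):
    """Return hour labels (YYYYMMDDHH) of every hourly file overlapping [start, end)."""
    cur = start.replace(minute=0, second=0, microsecond=0)
    labels = []
    while cur < end:
        labels.append(cur.strftime("%Y%m%d%H"))
        cur += timedelta(hours=1)
    return labels

def files_to_query(var_id, start, end, period):
    """Compute one hash identifier per actual query time period."""
    ids = []
    for label in hourly_periods(start, end):
        key = f"{var_id}#{label}#{period}"  # assumed key format
        ids.append(hashlib.md5(key.encode("utf-8")).hexdigest().upper())
    return ids

labels = hourly_periods(datetime(2021, 9, 10, 10, 30), datetime(2021, 9, 10, 12, 15))
```

A query from 10:30 to 12:15 therefore touches three hourly files, and only those files need to be opened.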
In a possible implementation, the target file and the file to be queried further contain file header information that makes the data self-describing; the file header information contains at least one of the variable identifier, the target time period, the target period, the encryption algorithm type, and the compression algorithm type. Check information for verifying the data of the target file is stored on an external device.
After the one or more files to be queried are located based on the hash identifiers and the storage paths, the method further includes:
obtaining the check information from the external device, and determining whether the check information matches the file header information in the file to be queried;
if they match, determining that the file to be queried has no data anomaly, and obtaining the data to be queried for the target variable in the target query time period and the target query period;
if they do not match, determining that the file to be queried has a data anomaly, and returning an error message.
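The check described above might look like the following sketch. The representation of the header and check information as dictionaries, the field names, and the functions `header_matches` and `read_verified` are all hypothetical; the patent does not specify the header encoding or how the external check information is stored.

```python
def header_matches(header, check_info):
    """True if every field in the external check info equals the file header field."""
    return all(header.get(key) == value for key, value in check_info.items())

def read_verified(header, payload, check_info):
    """Return the payload only if the header passes verification, else raise."""
    if not header_matches(header, check_info):
        raise ValueError("data anomaly: file header does not match check information")
    return payload

# Hypothetical header fields for one hourly file of variable 0001
header = {"var_id": "0001", "time_period": "2021091010", "period": "20ms"}
```

Raising an exception stands in for the "return an error message" branch; a real system might return an error code to the caller instead.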
In a second aspect, an embodiment of the present application provides a data processing apparatus, including:
a data acquisition unit, configured to acquire target data of a target variable for a target time period and a target period;
a hash determination unit, configured to determine, through a target hash algorithm, a hash identifier and a storage path of a target file based at least on the target time period, the target period, and the variable identifier of the target variable;
a file creation unit, configured to create the target file under the storage path according to the hash identifier, and to write the target data into the target file.
In a third aspect, an embodiment of the present application further provides an electronic device, including a processor and a machine-readable storage medium;
the machine-readable storage medium stores machine-executable instructions that can be executed by the processor;
the processor is configured to execute the machine-executable instructions to implement the method steps disclosed above.
As can be seen from the above technical solutions, this embodiment uses a hash digest algorithm to determine a hash identifier and a storage path from the variable identifier of the target variable, the target period, and the target time period corresponding to the target data, then creates the target file under that path and writes the target data into it, thereby storing the data. Because the hash algorithm places the data for different variables, time periods, and periods in separate files, the scheme exploits two characteristics of high-frequency data: it is collected and queried at equal time intervals, and queries target specific time periods. This improves storage space utilization and allows later queries by variable, period, and time period to read a higher proportion of relevant data. Compared with traditional storage methods, it can effectively meet equipment data storage needs in large-data-volume scenarios.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its principles.
Fig. 1 is a flowchart of the method provided by an embodiment of the present application;
Fig. 2 is a schematic diagram of the directory storage structure provided by an embodiment of the present application;
Fig. 3 is a schematic diagram of the data file structure provided by an embodiment of the present application;
Fig. 4 is a schematic diagram comparing data storage performance;
Fig. 5 is a schematic diagram comparing data query performance;
Fig. 6 is a structural diagram of the apparatus provided by an embodiment of the present application;
Fig. 7 is a structural diagram of the electronic device provided by an embodiment of the present application.
Detailed description
Exemplary embodiments are described in detail here, with examples shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present application as detailed in the appended claims.
The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to limit the application. The singular forms "a", "said", and "the" used in the present application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise.
To help those skilled in the art better understand the technical solutions provided by the embodiments of the present application, and to make the above objects, features, and advantages of the embodiments clearer, the technical solutions are described in further detail below with reference to the accompanying drawings.
In the field of industrial control, data on the variables of industrial equipment must be collected during operation for purposes such as real-time condition monitoring, online diagnosis, and offline analysis. The industrial equipment here includes wind turbines, chemical reactors, transformers, and any other equipment whose operating data needs to be monitored or collected. The variables to be collected can be wind speed, rotational speed, oil temperature, hydraulic pressure, power, or any other variable needed for industrial production control or equipment inspection.
Data with a sampling period at the millisecond level or shorter is usually called high-frequency data, and variables that require high-frequency data collection are called high-frequency variables. Take a wind turbine as an example: a single turbine may need to collect more than 1,000 high-frequency variable points at 20 ms resolution. The number of data points to store per day is 50 points/second * 86,400 seconds/day * 1,000 points/turbine = 4.32 billion per day per turbine. If the retention cycle for high-frequency data is 180 days, a single turbine must store 4.32 billion/day * 180 days = 777.6 billion data points within one retention cycle. At 4 bytes per data point (single-precision floating point), the total storage reaches 777.6 billion * 4 = 3.11 trillion bytes, that is, 3.11 TB.
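The arithmetic in this example can be reproduced directly. The variable names below are illustrative only; the figures (20 ms resolution, 1,000 points, 180 days, 4 bytes per value) come from the example above.

```python
sample_period_ms = 20
samples_per_second = 1000 // sample_period_ms   # 50 samples per second per variable
points_per_turbine = 1000                       # high-frequency variable points
seconds_per_day = 86_400
retention_days = 180
bytes_per_value = 4                             # single-precision float

per_day = samples_per_second * seconds_per_day * points_per_turbine
per_cycle = per_day * retention_days
total_bytes = per_cycle * bytes_per_value

print(per_day, per_cycle, total_bytes / 1e12)
```

The result is about 3.11 TB in decimal units (10^12 bytes) for the raw values alone, before any per-record IDs, timestamps, or indexes are added.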
The commonly used data processing methods each have shortcomings in such large-volume storage scenarios. For example:
Relational databases: the storage structure struggles to support effective storage of and access to trillions of records, and each record must carry extra fields such as an ID and a timestamp, so the total storage including indexes far exceeds the size of the data itself.
Non-relational databases: memory is the main caching medium, so performance drops sharply once the data volume (for example, TB level) exceeds available memory (for example, GB level); the extra fields likewise lead to a low proportion of useful data.
Time series databases: general-purpose optimizations for time series data can improve performance considerably, but these databases are not tailored to the specific characteristics of high-frequency data or to business requirements, so storage efficiency and access performance still fall short, and hardware requirements are high.
Text files: they have the advantage of being human-readable, but at the cost of storage efficiency and access performance far below the three types of database above, so they are suitable only for data exchange at small data volumes.
To address these problems, an embodiment of the present application provides a data processing method. See Fig. 1, which is a flowchart of the method provided by an embodiment of the present application. The sampling period of the target variable involved in this process may be at the millisecond level or at any longer or shorter level; this embodiment places no limitation on it.
In this embodiment, the data involved in the data processing method may come from any industrial production equipment that requires data collection and storage, including wind turbines and chemical reactors, or from other equipment with data storage needs. The collected data may be stored on any electronic device, such as a personal computer, a server, an embedded industrial control device, or a mobile phone; this embodiment places no limitation on it.
As shown in Fig. 1, the process may include the following steps:
Step 101: acquire target data of a target variable for a target time period and a target period.
In this embodiment, data storage is designed around two characteristics of high-frequency data: it can usually be queried independently by variable, time period, and interval (that is, sampling period), and it is generally collected and queried at equal time intervals. The variable, time period, and period of the acquired target data must therefore be determined first. In other words, target data of the target variable for the target time period and the target period is acquired and later used to generate the target file in which the data is stored.
The target time period may be the time range covered by the target data, that is, from the time of the first data point in the target data to the time of the last data point, such as 10:00 to 11:00 or 12:05:00 to 12:05:59 on a given date. The target period may be the time interval between two adjacent data points in the target data; within one set of target data this interval may be a fixed value, such as 20 milliseconds, 1 minute, or 1 hour.
In a preferred embodiment, the target data of the target variable for the target time period and the target period may specifically be target sampling data of the target variable for the target time period and a target sampling period, and/or target statistical data of the target variable for the target time period and a target statistical period.
Sampling data may be the raw data obtained directly by sampling at a given sampling period. Statistical data may be obtained by computing statistics over sampling data or other statistical data based on a statistical period; the target statistical period is longer than the target sampling period.
For example, if a wind turbine is sampled with a target sampling period of 20 ms, target sampling data with a 20 ms period is obtained. Statistics can then be computed on top of it. For instance, 3,000 consecutive data points can be selected from the target sampling data and the maximum of every 50 adjacent points taken, yielding 60 data points that serve as target statistical data with a target statistical period of 20 ms * 50 = 1 s. In the same way, the maximum can be taken over these 60 data points to obtain target statistical data with a target statistical period of 1 s * 60 = 1 min, and so on. The maximum can equally be replaced by any other statistic, such as the minimum, mean, peak-to-valley difference, standard deviation, or final value; this embodiment places no limitation on it.
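The two-stage aggregation in this example can be sketched as below. The `downsample` helper is a hypothetical name, and the synthetic ramp of 3,000 values merely stands in for real 20 ms samples.

```python
def downsample(points, factor, stat=max):
    """Apply `stat` to each full group of `factor` adjacent points."""
    return [
        stat(points[i:i + factor])
        for i in range(0, len(points) - factor + 1, factor)
    ]

samples_20ms = list(range(3000))            # placeholder for 3,000 raw samples (60 s of data)
per_second = downsample(samples_20ms, 50)   # 20 ms * 50 = 1 s statistical period
per_minute = downsample(per_second, 60)     # 1 s * 60 = 1 min statistical period
```

Passing `stat=min` or `stat=statistics.mean` would produce the other statistic types mentioned above; each would then be stored as its own statistical series.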
Queries on high-frequency data usually target a specified time interval. For example, querying coarse-grained minute-level statistics does not require access to the millisecond-level raw samples, and querying hour-level statistics does not require access to minute-level statistics. Storing statistical data for different periods alongside the raw data at write time therefore helps improve later query efficiency, lower hardware requirements, and improve the user experience.
Step 102: determine, through a target hash algorithm, a hash identifier and a storage path of a target file based at least on the target time period, the target period, and the variable identifier of the target variable.
In this embodiment, once the target data has been acquired, a preset hash algorithm is applied to at least the time period, period, and variable identifier of the target data to generate a hash identifier corresponding to the target data. The hash identifier is then used to determine the storage path, the name of the stored file, and so on.
Specifically, a hash value matching the target data can be determined through the target hash algorithm, and a preset number of characters can be selected from the hash value as the hash identifier of the target file. The characters of the hash identifier are then divided, the storage location matching the division result is looked up in a preset multi-level file directory, and the storage path of the target file is determined. The multi-level file directory is described below together with Fig. 2.
For example, suppose the target time period of some target data is 2020-08-14 17:00:00 to 2020-08-14 18:00:00, the target period is 50 ms, the variable identifier is 1008, and the preset hash algorithm is MD5. The hash identifier can then be computed as: MD5("1008#2020081417#50ms") = CBC0F1300ECA444B8C30764463E368D9.
Here "1008", "2020081417", and "50ms" correspond to the variable identifier, target time period, and target period of the target variable, respectively. The format, separator, and ordering can be adjusted freely; this embodiment places no limitation on them. The full hash result can be used directly as the hash identifier, or a specified subset of its characters (for example, the first ten: CBC0F1300E) can be used instead; this embodiment places no limitation on this either. The preset hash algorithm can be any hash digest algorithm, such as MD5, SHA-1, SHA-2, or RIPEMD-160, provided that the hash algorithm used to generate the hash identifier is the same target hash algorithm used later when querying the stored data files.
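A minimal sketch of this computation using Python's standard `hashlib` is shown below. The function name `hash_id`, the UTF-8 encoding of the key, and the uppercase hexadecimal output are assumptions; the digest printed by this code is not claimed to reproduce the value quoted in the example, since that depends on the exact byte encoding the applicant used.

```python
import hashlib

def hash_id(var_id, period_label, interval, digits=None):
    """Hash 'var#period#interval' with MD5 and optionally keep the first `digits` characters."""
    key = f"{var_id}#{period_label}#{interval}"   # assumed separator and field order
    digest = hashlib.md5(key.encode("utf-8")).hexdigest().upper()
    return digest if digits is None else digest[:digits]

full = hash_id("1008", "2020081417", "50ms")
short = hash_id("1008", "2020081417", "50ms", digits=10)
print(full, short)
```

Because the writer and the reader derive the identifier from the same three fields with the same algorithm, a query can locate its file without any index lookup.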
In this embodiment, once the hash identifier has been determined, the storage path of the target file to be generated can be determined from it. In an optional embodiment, a multi-level file directory structure can be generated in advance from combinations of a given number of hash characters (0, 1, 2, ..., F), and the directory location matching the hash identifier within this multi-level directory is used as the storage path.
In an optional embodiment, the directory structure of the multi-level file directory can be implemented as shown in Fig. 2. The root directory in the figure can be a designated root directory that holds all data files, such as C:\data. Each level of subdirectory under the root can be named with several hash characters. For example, with 2 hash characters the root contains 16 * 16 = 256 first-level subdirectories: 00, 01, 02, ..., FF. The first-level subdirectories can be configured not to hold files directly but to contain second-level subdirectories. If the second-level subdirectories are also named with two hash characters, each first-level subdirectory likewise contains 256 second-level subdirectories, for a total of 256 * 256 = 65,536 second-level subdirectories under the root, and all generated data files are stored in the second-level subdirectories. This embodiment places no limitation on the total number of subdirectory levels or on the number of subdirectories per level. Two levels of subdirectories with two hash characters each are usually enough to meet typical capacity needs; if they are not, the structure can be extended to a third level or deeper.
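The directory names at each level can be enumerated as follows. This is only an illustration of the counting argument above; `level_names` is a hypothetical helper, and actually creating the directories (for example with `os.makedirs`) is left out.

```python
from itertools import product

HEX_CHARS = "0123456789ABCDEF"

def level_names(width=2):
    """All directory names made of `width` hexadecimal characters, in sorted order."""
    return ["".join(chars) for chars in product(HEX_CHARS, repeat=width)]

first_level = level_names(2)                  # "00", "01", ..., "FF"
second_level_total = len(first_level) ** 2    # second-level directories under the root
print(len(first_level), second_level_total)
```

Since a good hash spreads identifiers roughly evenly, files end up distributed across these directories rather than piling up in a single folder.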
As a specific embodiment based on the directory structure of FIG. 2, consider target data with variable identifier 0001, target time period 2021-09-10 10:00:00.000 to 10:59:59.999 and target cycle 20ms, and suppose its hash identifier is HASH("0001#2021091010#20ms") = "006E8D78A940". Its first-level subdirectory is then "00" (characters 1 to 2 of the hash identifier) and its second-level subdirectory is "6E" (characters 3 to 4), so the storage path of the target file to be generated for this data can be C:\data\00\6E. Similarly, for target statistical data with variable identifier 0002, a target time period of the whole year 2021 and a target cycle of 1 hour, suppose the hash identifier is HASH("0002#2021#1h") = "009D1129E3BF"; its storage path can then be C:\data\00\9D.
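The mapping from hash identifier to storage path can be sketched as follows. This is a minimal illustration that assumes two levels of two characters each; the function names `subdirectories` and `storage_path` are hypothetical.

```python
from pathlib import PureWindowsPath


def subdirectories(hash_id, levels=2, width=2):
    """Split the leading characters of a hash identifier into directory names."""
    if len(hash_id) < levels * width:
        raise ValueError("hash identifier is too short for the directory depth")
    return [hash_id[i * width:(i + 1) * width] for i in range(levels)]


def storage_path(root, hash_id, levels=2, width=2):
    """Join the root directory with the subdirectories derived from the hash."""
    return PureWindowsPath(root).joinpath(*subdirectories(hash_id, levels, width))
```

For the two identifiers in the example, this yields the directories C:\data\00\6E and C:\data\00\9D. Adding a third level only requires passing `levels=3`.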
As a preferred embodiment, when the target data is target statistical data obtained by statistics, the hash identifier is additionally determined from the data type of the target data. The data type corresponds to the specific kind of statistical value: for example, the data type is 01 when the target statistical data is a maximum value, 02 when it is a minimum value, and so on. Adding the data type to the input of the hash calculation allows statistical data of different kinds for the same variable, the same time period and the same statistical cycle to be stored in separate data files.
As another preferred embodiment, when the target data is target statistical data obtained by statistics, statistical data of different kinds for the same variable, the same time period and the same statistical cycle can instead be stored in the same data file. For example, maximum, minimum and average values can be stored in one file, in which data points 1, 4, 7, ..., 3*n-2 are maximum values, data points 2, 5, 8, ..., 3*n-1 are minimum values, and data points 3, 6, 9, ..., 3*n are average values. Different kinds of statistical value may be stored with different byte lengths, but all values of the same kind must have the same byte length, so that the data for a specified time can be located by byte length in subsequent queries.
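The interleaved layout can be illustrated with a small sketch that separates an already decoded sequence back into its kinds. It assumes the order maximum, minimum, average given above; the name `split_interleaved` is hypothetical.

```python
STAT_KINDS = ("max", "min", "avg")


def split_interleaved(values, kinds=STAT_KINDS):
    """Group interleaved points by kind: point 3*n-2 is max, 3*n-1 min, 3*n avg."""
    step = len(kinds)
    if len(values) % step != 0:
        raise ValueError("sequence does not contain whole groups of statistics")
    # Slicing with a stride picks every `step`-th point starting at each kind's slot.
    return {kind: list(values[slot::step]) for slot, kind in enumerate(kinds)}
```

For instance, the sequence 9, 1, 5, 8, 2, 4 holds two statistical intervals, with maxima 9 and 8, minima 1 and 2, and averages 5 and 4.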
With the above directory structure and path determination method, a large number of data files are distributed statistically evenly across the subdirectories. Compared with the traditional approach of building storage directories by year, month or day, this avoids the drop in file access efficiency caused by any single directory holding too many data files.
It should be noted that one root directory may store only the data of one variable from one device, or it may store the data of different variables from different devices; this embodiment does not limit this. If each root directory stores only one specific variable and different variables are placed under different root directories, that is, if the data of different variables are isolated and stored independently, the computing resources consumed by subsequent data queries can be effectively reduced.
In addition, storing the data of the same variable for different target time periods independently greatly reduces the total amount of data that must be searched in subsequent queries and thus improves query efficiency. Likewise, storing the data of the same variable at different target cycles independently, that is, pre-processing the data at time granularities such as seconds, minutes, hours and days, slightly increases the total storage space compared with storing only the raw data, but greatly reduces the volume of data to be searched at query time and gains the performance benefit of data locality.
Step 103: create the target file under the storage path according to the hash identifier, and write the target data into the target file.
In this embodiment, the target data, the hash identifier and the storage path have been determined in steps 101 and 102, so the target file can be created at the storage path based at least on the hash identifier and used to store the target data. The file name of the target file may be identical to the hash identifier, or may consist of a preset number of characters taken from the hash identifier; it may also contain additional fields added according to preset rules. This embodiment does not limit any of these choices.
As an optional embodiment, writing the target data into the target file may specifically consist of writing the data points of the target data into the target file one after another, in chronological order and in binary format.
The target file may be a binary file, such as a file in bin (binary) format. Every data point in the target file is stored with a fixed number of binary digits. If a faulty data point with missing data exists, it is filled to that fixed width using either an interpolation algorithm or a preset special value, so that data points can be located by byte length in subsequent queries. For example, a substitute value for the faulty position can be interpolated from the adjacent non-missing data points, or the position can be filled with 0.
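The fixed-width write can be sketched as follows. This is a minimal illustration under stated assumptions: 4-byte single-precision floats, little-endian byte order (the text does not specify one), `None` as the in-memory marker for a missing point, and a preset fill value in place of interpolation. The name `encode_points` is hypothetical.

```python
import struct


def encode_points(points, fill_value=0.0):
    """Pack points as little-endian 4-byte floats; None marks a missing point."""
    # Every point, missing or not, occupies exactly 4 bytes in the output.
    filled = [fill_value if point is None else point for point in points]
    return struct.pack(f"<{len(filled)}f", *filled)
```

Because a missing point still occupies its slot, the byte position of the k-th point is always 4*(k-1), which is what makes offset-based lookup possible.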
As an optional embodiment, the structure of the target file can be implemented as shown in the data file structure diagram of FIG. 3. As shown in FIG. 3, each data file contains N data points arranged in chronological order. Each data point is stored in binary, and its storage length can be configured as needed from 1/8 byte (1 bit) to 8 bytes (a double-precision floating-point number). The N data points of one data file may be written all at once or in several writes; this embodiment does not limit this.
As a preferred embodiment, when a query instruction for querying data of the target variable is received, the hash identifiers and storage paths of one or more files to be queried are determined through the target hash algorithm from the target query time period, the target query cycle and the variable identifier carried in the query instruction. The one or more files to be queried are then looked up using these hash identifiers and storage paths, and the data to be queried for the target variable in the target query time period and at the target query cycle is obtained.
In this embodiment, the stored data of the target variable can be queried according to the received query instruction. The target query time period is the time range of the data to be retrieved for the target variable, and the target query cycle is the sampling or statistical cycle of that data.
Further, as a preferred embodiment, the query process uses a time period division rule that matches the target query cycle to determine one or more actual query time periods from the target query time period carried in the query instruction. The hash identifiers and storage paths of the one or more files to be queried are then determined through the target hash algorithm from these actual query time periods together with the target query cycle and the variable identifier carried in the query instruction. Each actual query time period matches the target time period of one file to be queried.
In this embodiment, the target query time period carried in the query instruction may not coincide exactly with any stored target time period, so it must first be converted into one or more actual query time periods according to the preset time period division rule.
For example, suppose the preset division rule specifies that, for a target cycle of 20ms, each data file holds 50*60*60=180000 data points, that is, one hour of data. The target time periods of the target variable at this cycle are then whole-hour periods such as 10:00 to 11:00 and 11:00 to 12:00. If the target query time period is not aligned to the hour, for example 10:30 to 11:30, no target time period is identical to it. In that case the division rule (for a target cycle of 20ms, periods are divided by whole hours) gives two actual query time periods matching 10:30 to 11:30, namely 10:30 to 11:00 and 11:00 to 11:30. These are used to determine the hash identifiers and storage paths of the matching files to be queried, which are the two data files of the target variable with a target cycle of 20ms and target time periods 10:00 to 11:00 and 11:00 to 12:00 respectively. Optionally, the actual query time periods may instead be taken as 10:00 to 11:00 and 11:00 to 12:00; this embodiment does not limit this.
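The whole-hour division rule in this example can be sketched as below. The sketch assumes a half-open query interval [start, end) and returns, for each file, the hour that identifies it together with the part of the query that falls inside it; the name `split_by_hour` is hypothetical.

```python
from datetime import datetime, timedelta


def split_by_hour(start, end):
    """Split [start, end) into segments that each lie within one whole hour."""
    if end <= start:
        raise ValueError("query time period is empty")
    segments = []
    hour = start.replace(minute=0, second=0, microsecond=0)
    while hour < end:
        next_hour = hour + timedelta(hours=1)
        # (hour identifying the file, segment start, segment end)
        segments.append((hour, max(start, hour), min(end, next_hour)))
        hour = next_hour
    return segments
```

A query for 10:30 to 11:30 yields two segments, one inside the 10:00 file and one inside the 11:00 file; a query that lies within a single hour yields one segment.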
Similarly, the division rule may specify that periods for a target cycle of 1 minute are divided by whole days or whole months, that periods for a target cycle of 1 hour are divided by whole months or whole years, and so on. The target query time period may also be identical to a target time period, or be completely contained in one target time period; in those cases a single file, not several files, needs to be queried.
As an optional embodiment, the file may also contain file header information for data self-description or verification. The file header information may include one or more of the variable identifier, target time period, target cycle, byte length, data length, encryption algorithm type and compression algorithm type. The file header information is also stored as verification information on at least one external device independent of this device. During a subsequent query, the self-describing file header information in the retrieved data file can be compared with the verification information stored on the external device to determine whether the retrieved data file contains a data anomaly, which provides redundant verification.
Specifically, after the one or more files to be queried have been looked up by hash identifier and storage path, the verification information can be obtained from the external device and compared with the file header information in the file to be queried. If they match, the file to be queried is determined to have no data anomaly, and the data to be queried for the target variable in the target query time period and at the target query cycle is obtained. If they do not match, the file to be queried is determined to have a data anomaly, and an error message is returned.
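The match check can be sketched as a plain field-by-field comparison. The header fields chosen here are a subset of those listed above, and the names `FileHeader`, `DataAnomalyError` and `check_header` are hypothetical; how the verification information is fetched from the external device is outside the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHeader:
    variable_id: str
    time_period: str
    cycle: str
    byte_length: int


class DataAnomalyError(Exception):
    """Raised when a file header differs from the stored verification information."""


def check_header(file_header, reference_header):
    """Return True if the headers match, otherwise raise DataAnomalyError."""
    # Dataclass equality compares all fields.
    if file_header != reference_header:
        raise DataAnomalyError("file header does not match verification information")
    return True
```

A caller would read the data only after `check_header` succeeds and would turn the exception into the error message returned to the user.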
The encryption algorithm type and compression algorithm type are used to encrypt and/or compress the data file after the data has been written, to further improve data security and protect data privacy. If encryption and/or compression is applied during storage, the file header information should contain the corresponding encryption algorithm type and/or compression algorithm type, so that during a subsequent query the file to be queried, once identified, can be decrypted and/or decompressed with the same algorithm type used during storage and the data to be queried can be read from it.
This completes the process shown in FIG. 1.
As the process of FIG. 1 shows, this embodiment uses a hash digest algorithm to determine the hash identifier and storage path from the variable identifier, target cycle and target time period of the target variable that corresponds to the target data, then creates the target file under that path and writes the target data into it, thereby storing the data. Data of different variables, different time periods and different cycles are stored independently in separate files according to the hash algorithm. This makes full use of the fact that high-frequency data is acquired and queried at equal time intervals and for specific time periods, improves storage space utilization, and allows subsequent queries by variable, cycle and time period to retrieve a higher proportion of relevant data. Compared with traditional data storage methods, it effectively meets device data storage requirements in large-data-volume scenarios.
For data files stored in this way, the name and storage address of the file to be queried can be determined directly by applying the same hash digest algorithm to the query condition "variable identifier + time period + cycle". Access time is therefore essentially constant, on the order of O(1), which effectively improves retrieval speed in large-data-volume scenarios compared with traditional full-table traversal or field indexing.
In addition, because the file name is determined from the hash identifier, it does not directly reveal the variable name, sampling cycle or sampling time period. Even if data is leaked, whoever obtains it cannot work backwards from the file name to the meaning of the data in the file and obtains only a string of meaningless binary codes, which effectively improves data security. Furthermore, because the time interval between adjacent data points in a file is fixed and the data length is fixed, time stamps and delimiters need not be stored, so nearly 100% of the file content is effective data, which further improves storage space utilization and query efficiency.
To help those skilled in the art better understand the technical solutions provided by the embodiments of this application, the data processing method is described below with reference to specific embodiments. As an optional embodiment, assume that prototype unit A at a wind farm in Inner Mongolia has 1020 high-frequency variables sampled at 20ms whose data must be collected and stored. Following the storage method above, a two-level subdirectory structure is created under the designated root directory /home/agent/store, and each data point is stored with a fixed length of 4 bytes (a single-precision floating-point number). The target data to be stored includes target sampling data with a target sampling cycle of 20ms and target statistical data with target statistical cycles of 1 second, 1 minute, 1 hour and 1 day. The target statistical data contains five kinds of statistical value: maximum, minimum, average, peak-to-valley difference and standard deviation. The target hash algorithm is MD5.
By way of example, the data storage system can run on a configuration with an Intel J1900 low-power CPU, 2 GB of memory and a 1 TB mechanical hard disk. Actual tests under these conditions show that historical curve queries on the stored data through the front-end interface are displayed smoothly without noticeable stutter, and that all queries, including decompression of the data files, complete within 3 seconds. Average memory usage is about 800 MB, average CPU utilization is below 10%, the uncompressed data volume is about 18 GB per day and the compressed data volume is about 3 GB per day. This achieves a data storage method with higher storage space utilization, higher access efficiency and lower hardware requirements than the traditional database storage method described earlier.
As a specific embodiment, assume the raw data of the average wind speed must be stored, with variable identifier 1012, target time period 2021-09-10 10:00:00 (that is, 10:00 to 11:00) and target cycle 20ms. The hash calculation is MD5("1012#2021091010#20ms") = A3D35596E3A9B99D99D1EF89D0493AC2. The complete result is used as the hash identifier, and the file "A3D35596E3A9B99D99D1EF89D0493AC2.bin" is created under the storage path /home/agent/store/A3/D3/ as the target file.
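The derivation of the file location from the key can be sketched with Python's `hashlib`. The key layout follows the example ("variable#period#cycle"), but the character encoding of the key and the uppercase rendering of the digest are assumptions of this sketch, so the digest it produces is not claimed to equal the value quoted in the text. The function names are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def hash_identifier(variable_id, time_period, cycle):
    """Build the key 'variable#period#cycle' and return its MD5 digest in hex."""
    key = f"{variable_id}#{time_period}#{cycle}"
    return hashlib.md5(key.encode("utf-8")).hexdigest().upper()


def target_file(root, hash_id):
    """Place the file under two subdirectories taken from the hash identifier."""
    return PurePosixPath(root) / hash_id[0:2] / hash_id[2:4] / f"{hash_id}.bin"


hash_id = hash_identifier("1012", "2021091010", "20ms")
path = target_file("/home/agent/store", hash_id)
```

The same two functions serve both storage and query, since the query side only needs to rebuild the key to find the file.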
Every 10 seconds, 500 raw samples taken at 20ms intervals are written to this file in one batch. Each sample is a 4-byte single-precision floating-point number, so each write is 2000 bytes. If data is missing, a special value such as -1.0 or 0 is written to mark the missing data.
Caching the real-time data for a period and then writing it in a batch, instead of writing continuously, solves the problem that the number of simultaneously open files exceeds the limit when the number of data files being written grows in proportion to the number of measuring points. For example, if data for all variables is written in real time, at least 1020 files must be handled at the same time, which places a very heavy load on the hardware. With batch writing after caching, the data of each variable can be processed in turn within one cache cycle, which reduces the number of files that must be handled simultaneously and lightens the computing load. The cache duration before writing can also be tuned to balance write efficiency against loss in case of failure. Because statistical data can be quickly rebuilt from the high-frequency data, only the allowable loss time of the high-frequency data needs to be considered. In practice, the original data packet files can also be saved to eliminate the loss caused by batch writing, so that data lost in a failure does not become unrecoverable.
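The cache-then-flush pattern described above can be sketched as follows. This is an illustration only: the class name `BatchWriter`, the in-memory buffering and the little-endian 4-byte float encoding are assumptions, and a real system would trigger `flush` from a timer matching the cache duration.

```python
import os
import struct
from collections import defaultdict


class BatchWriter:
    """Buffer points per file in memory and append them in batches."""

    def __init__(self):
        self._buffers = defaultdict(list)

    def add(self, path, value):
        """Cache one data point destined for the file at `path`."""
        self._buffers[path].append(value)

    def flush(self):
        """Append each buffer to its file, keeping only one file open at a time."""
        for path, values in self._buffers.items():
            os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
            with open(path, "ab") as handle:
                handle.write(struct.pack(f"<{len(values)}f", *values))
        self._buffers.clear()
```

With 500 buffered points per variable, each flush appends 2000 bytes to that variable's file, and at no point are the files for all variables open simultaneously.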
In this embodiment, the target file can be left uncompressed while data is being written and compressed as a whole once all writes are complete. This avoids repeated compress-decompress-compress cycles during writing and improves write efficiency. When the target file is accessed, it can be read directly if it is uncompressed; if it is compressed, it is first decompressed into a preset temporary directory, such as a memory-based file system, and then read. After a preset timeout the decompressed file is deleted from the temporary directory to free storage space.
In addition, the total number of data points in a single data file can be controlled by choosing an appropriate time span for each kind of data file. This balances the efficiency gained from batch storage against the access efficiency gained from fast decompression, so that once data files are compressed, decompression time at query does not noticeably affect the user experience. In practice the decompression time generally needs to be kept at the millisecond level, which corresponds to keeping the number of data points in a single data file on the order of 100,000.
As another specific embodiment, assume minute-level statistical data of the average wind speed must be stored, with variable identifier 1012 and target cycle 1min. Under the division rule used in this example, the target time period is the whole month containing 2021-09-10, namely 2021-09. The hash calculation is MD5("1012#202109#1m") = 7C9BA434540CC928A9A9F0F7D1DD4603. The complete result is used as the hash identifier, and the file "7C9BA434540CC928A9A9F0F7D1DD4603.bin" is created under the storage path /home/agent/store/7C/9B/ as the target file.
Every hour, 300 statistical values with a statistical interval of 1 minute are written to this file in one batch. The values are written in the fixed order maximum, minimum, average, peak-to-valley difference, standard deviation, and each kind contributes 60 data points covering that hour. Assuming every statistical value is a 4-byte single-precision floating-point number, each write is 1200 bytes.
Based on the storage process above, as a specific embodiment, assume the stored raw data of the average wind speed is to be queried. The variable identifier is 1012, the target query cycle is 20ms, and the target query time period is 2021-09-10 10:55 to 11:05, that is, 10 minutes of high-frequency raw samples. According to the preset division rule, the corresponding actual query time periods are 2021-09-10 10:00:00 (10:00 to 11:00) and 2021-09-10 11:00:00 (11:00 to 12:00). The hash identifiers for these two periods are MD5("1012#2021091010#20ms") = A3D35596E3A9B99D99D1EF89D0493AC2 and MD5("1012#2021091011#20ms") = DD6A61096E6E118F2B73BE7491E4D89B.
Using these hash identifiers, the file "A3D35596E3A9B99D99D1EF89D0493AC2.bin" is fetched from /home/agent/store/A3/D3, decompressed and opened. The data offset is 55*60*50*4=660000 bytes, so 5*60*50*4=60000 bytes, or 15000 floating-point numbers, are read consecutively starting from byte 660001. The file "DD6A61096E6E118F2B73BE7491E4D89B.bin" is then fetched from /home/agent/store/DD/6A, decompressed and opened. There is no data offset, so 5*60*50*4=60000 bytes, or 15000 floating-point numbers, are read consecutively starting from byte 1. The combined 30000 floating-point numbers are the 10 minutes of high-frequency raw samples for 2021-09-10 10:55 to 11:05, which are returned to the user.
The data offset is determined from the byte length and from the relationship between the target query time period and the actual query time period, and indicates where in the file to be queried the requested data begins. When the file contains file header information, the data offset must be increased by the number of bytes occupied by the header.
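The offset arithmetic of this query can be sketched as below. The sketch assumes 50 points per second (20ms sampling), 4-byte little-endian floats and a header of known length; `byte_range` and `read_points` are hypothetical names, and offsets are zero-based, so a zero-based offset of 660000 corresponds to "byte 660001" in the one-based wording above.

```python
import io
import struct
from datetime import datetime

POINTS_PER_SECOND = 50  # one point every 20 ms
BYTES_PER_POINT = 4     # single-precision float


def byte_range(file_start, seg_start, seg_end, header_len=0):
    """Return (offset, length) in bytes of a time segment inside one data file."""
    skipped = round((seg_start - file_start).total_seconds() * POINTS_PER_SECOND)
    wanted = round((seg_end - seg_start).total_seconds() * POINTS_PER_SECOND)
    return header_len + skipped * BYTES_PER_POINT, wanted * BYTES_PER_POINT


def read_points(stream, offset, length):
    """Seek to `offset` and decode `length` bytes as consecutive floats."""
    stream.seek(offset)
    raw = stream.read(length)
    return struct.unpack(f"<{len(raw) // BYTES_PER_POINT}f", raw)
```

For the 10:00 file the segment 10:55 to 11:00 gives an offset of 660000 bytes and a length of 60000 bytes; for the 11:00 file the segment 11:00 to 11:05 gives an offset of 0 and the same length.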
As another specific embodiment, assume the stored minute-level statistical data of the average wind speed is to be queried, specifically the average values among the five kinds of statistical value. The variable identifier is 1012, the target query time period is 2021-09-10 10:00 to 12:00 (two hours), and the target query cycle is 1min. According to the preset division rule, the corresponding actual query time period is 2021-09 (the month of September), so the hash calculation is MD5("1012#202109#1m") = 7C9BA434540CC928A9A9F0F7D1DD4603.
Using this hash identifier, the file "7C9BA434540CC928A9A9F0F7D1DD4603.bin" is fetched from /home/agent/store/7C/9B, decompressed and opened. The data offset is calculated as (1440*9+60*10)*4=54240 bytes. Starting from byte 54241, the third of every five floating-point numbers is read, until 120 floating-point numbers have been read. These 120 values are the average-value statistics at a target query cycle of 1min for the two hours 2021-09-10 10:00 to 12:00, and they are returned to the user.
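The strided read can be sketched as follows. The sketch assumes that each minute is stored as one record of five consecutive 4-byte little-endian floats in the order maximum, minimum, average, peak-to-valley difference, standard deviation, with no file header. Under that assumed layout one minute occupies 20 bytes, so the byte position of a minute is its index times 20; this record size is an assumption of the sketch and not the figure quoted in the text. All names are hypothetical.

```python
import struct
from datetime import datetime

KINDS = ("max", "min", "avg", "peak_valley", "std")
BYTES_PER_VALUE = 4
RECORD_BYTES = len(KINDS) * BYTES_PER_VALUE  # one minute of statistics


def minute_index(moment):
    """Zero-based index of the minute containing `moment` within its month."""
    month_start = moment.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    return int((moment - month_start).total_seconds() // 60)


def read_kind(data, first_minute, minutes, kind):
    """Pick one kind of statistic out of `minutes` consecutive records."""
    slot = KINDS.index(kind)
    values = []
    for minute in range(first_minute, first_minute + minutes):
        offset = minute * RECORD_BYTES + slot * BYTES_PER_VALUE
        values.append(struct.unpack_from("<f", data, offset)[0])
    return values
```

For 2021-09-10 10:00 the minute index is 1440*9+60*10 = 13560, and reading 120 records from there with `kind="avg"` returns the two hours of average values.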
Compared with a traditional database, the storage performance advantage of the data processing method provided by this embodiment is illustrated by the storage performance comparison shown in FIG. 4, and the query performance advantage by the query performance comparison shown in FIG. 5. Actual tests show that, in high-frequency data scenarios, the method occupies about 1/6 to 1/8 of the storage space of a traditional database and queries about 100 to 1000 times faster. This reduces the storage space required for high-frequency data from industrial equipment and greatly improves query efficiency, which in turn lowers the requirements on hardware storage space and computing speed and ultimately achieves the best economic benefit while still meeting business needs.
This completes the description of the method flow with reference to specific embodiments.
The method provided by the embodiments of the present application has been described above. The apparatus provided by the embodiments of the present application is described below.
Referring to FIG. 6, FIG. 6 is a structural diagram of the apparatus provided by an embodiment of the present application. As shown in FIG. 6, the apparatus may include:
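a data acquisition unit 601, configured to acquire target data of a target variable in a target period and at a target cycle;
a hash determination unit 602, configured to determine a hash identifier and a storage path of a target file through a target hash algorithm, based at least on the target period, the target cycle, and the variable identifier of the target variable;
a file creation unit 603, configured to create the target file under the storage path according to the hash identifier, and to write the target data into the target file.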
In a possible implementation, when acquiring the target data of the target variable in the target period and at the target cycle, the data acquisition unit 601 is specifically configured to:
acquire target sampling data of the target variable in the target period and at a target sampling cycle, and/or acquire target statistical data of the target variable in the target period and at a target statistical cycle;
wherein the target statistical cycle is longer than the target sampling cycle, and the target statistical data is obtained by computing statistics over sampling data or other statistical data based on the target statistical cycle.
In a possible implementation, when determining the hash identifier and the storage path of the target file through the target hash algorithm based at least on the target period, the target cycle, and the variable identifier of the target variable, the hash determination unit 602 is specifically configured to:
determine the hash identifier and the storage path of the target file through the target hash algorithm based on the data type of the target data, the target period, the target cycle, and the variable identifier of the target variable.
In a possible implementation, when determining the hash identifier and the storage path of the target file through the target hash algorithm based at least on the target period, the target cycle, and the variable identifier of the target variable, the hash determination unit 602 is specifically configured to:
determine, through the target hash algorithm, a hash value matching the target data, based at least on the target period, the target cycle, and the variable identifier of the target variable;
select a preset number of characters from the hash value as the hash identifier of the target file;
divide the characters of the hash identifier into groups, look up the storage location matching the division result in a preset multi-level file directory, and thereby determine the storage path of the target file.
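The mapping from hash identifier to storage path can be sketched as below. The root directory, the two-level layout, the two-character group width, and the ".bin" extension are taken from the worked example in the method description (/home/agent/store/7C/9B); the function name and its parameters are hypothetical, and the sketch assumes the full digest is used as the hash identifier.

```python
from pathlib import PurePosixPath


def storage_path(hash_id: str, root: str = "/home/agent/store",
                 levels: int = 2, width: int = 2) -> PurePosixPath:
    """Derive a multi-level directory path from the leading characters of hash_id."""
    # Split the first levels*width characters into one directory name per level.
    parts = [hash_id[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *parts, f"{hash_id}.bin")


print(storage_path("7C9BA434540CC928A9A9F0F7D1DD4603"))
# /home/agent/store/7C/9B/7C9BA434540CC928A9A9F0F7D1DD4603.bin
```

Spreading files across directories keyed by hash prefix keeps each directory small, which is the usual motivation for this kind of layout.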
In a possible implementation, when writing the target data into the target file, the file creation unit 603 is specifically configured to:
write the data points of the target data into the target file sequentially, in chronological order and in binary format, wherein each data point in the target file is stored in a fixed number of binary bits;
if there is a faulty data point whose data is missing, fill the position of that faulty data point with the fixed number of bits, using either an interpolation algorithm or a preset special value.
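A minimal Python sketch of fixed-width writing with placeholders follows. It assumes 4-byte little-endian floats (the 4-byte width matches the factor 4 in the offset example of the method description) and uses NaN as the preset special value; the function name, the byte order, and the choice of NaN are illustrative assumptions, and interpolation is not shown.

```python
import math
import struct
from typing import Iterable, Optional


def encode_points(points: Iterable[Optional[float]],
                  placeholder: float = math.nan) -> bytes:
    """Pack time-ordered points as 4-byte floats; None marks a missing point."""
    out = bytearray()
    for value in points:
        # A missing point still occupies 4 bytes so later offsets stay valid.
        out += struct.pack("<f", placeholder if value is None else value)
    return bytes(out)


blob = encode_points([3.5, None, 4.25])
print(len(blob))  # 12
```

Keeping the slot for a missing point is what allows the read side to locate any timestamp by arithmetic alone.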
In a possible implementation, the file creation unit 603 is further configured to:
when a query instruction for querying data of the target variable is received, determine the hash identifiers and storage paths of one or more files to be queried through the target hash algorithm, according to the target query period, the target query cycle, and the variable identifier carried in the query instruction;
look up the one or more files to be queried based on the hash identifiers and the storage paths, and obtain the data to be queried of the target variable in the target query period and at the target query cycle.
In a possible implementation, when determining the hash identifiers and storage paths of the one or more files to be queried through the target hash algorithm according to the target query period, the target query cycle, and the variable identifier carried in the query instruction, the file creation unit 603 is specifically configured to:
determine one or more actual query periods from the target query period carried in the query instruction, based on the period division rule that matches the target query cycle;
determine the hash identifiers and storage paths of the one or more files to be queried through the target hash algorithm, based on the one or more actual query periods and on the target query cycle and variable identifier carried in the query instruction, wherein each actual query period matches the target period of one file to be queried.
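The period division step can be sketched as follows. The sketch assumes the rule implied by the worked example in the method description, namely that data at a 1 min cycle is stored in one file per calendar month labelled "YYYYMM"; rules for other cycles are not modelled, and the function name is hypothetical.

```python
from datetime import datetime
from typing import List


def monthly_periods(start: datetime, end: datetime) -> List[str]:
    """List every 'YYYYMM' period touched by the inclusive range [start, end]."""
    periods = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        periods.append(f"{year:04d}{month:02d}")
        # Advance one calendar month, rolling over at December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return periods


print(monthly_periods(datetime(2021, 9, 10, 10), datetime(2021, 9, 10, 12)))
# ['202109']
print(monthly_periods(datetime(2021, 12, 31), datetime(2022, 1, 1)))
# ['202112', '202201']
```

Each returned period would then be combined with the variable identifier and the query cycle to form one hash key, so a query spanning two months resolves to two files.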
In a possible implementation, the target file and the file to be queried further contain file header information used for data self-description. The file header information includes at least one of: the variable identifier, the target period, the target cycle, the byte length, the encryption algorithm type, and the compression algorithm type. An external device stores verification information used to verify the data of the target file.
After looking up the one or more files to be queried based on the storage paths and the hash identifiers, the file creation unit 603 is further specifically configured to:
obtain the verification information from the external device, and determine whether the verification information matches the file header information in the file to be queried;
if they match, determine that the file to be queried has no data anomaly, and obtain the data to be queried of the target variable in the target query period and at the target query cycle;
if they do not match, determine that the file to be queried has a data anomaly, and return an error message.
This completes the structural description of the apparatus shown in FIG. 6.
An embodiment of the present application further provides a hardware structure for the apparatus shown in FIG. 6. Referring to FIG. 7, FIG. 7 is a structural diagram of the electronic device provided by an embodiment of the present application. As shown in FIG. 7, the hardware structure may include a processor and a machine-readable storage medium. The machine-readable storage medium stores machine-executable instructions that can be executed by the processor, and the processor is configured to execute the machine-executable instructions to implement the method disclosed in the above examples of the present application.
Based on the same inventive concept as the above method, an embodiment of the present application further provides a machine-readable storage medium storing a number of computer instructions which, when executed by a processor, implement the method disclosed in the above examples of the present application.
By way of example, the machine-readable storage medium may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data. For example, the machine-readable storage medium may be RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (such as a hard disk drive), a solid-state drive, any type of storage disc (such as an optical disc or DVD), a similar storage medium, or a combination thereof.
The systems, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementing device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smart phone, personal digital assistant, media player, navigation device, e-mail device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described as a set of units divided by function. When implementing the present application, the functions of the units may of course be implemented in one or more pieces of software and/or hardware.
Those skilled in the art will understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above are merely embodiments of the present application and are not intended to limit it. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211021925.5A CN115408390A (en) | 2022-08-24 | 2022-08-24 | A data processing method, device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115408390A true CN115408390A (en) | 2022-11-29 |
Family
ID=84160690
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211021925.5A Pending CN115408390A (en) | 2022-08-24 | 2022-08-24 | A data processing method, device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115408390A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116932470A (en) * | 2023-09-18 | 2023-10-24 | 江苏正泰泰杰赛智能科技有限公司 | Method, system and storage medium capable of calculating and storing time sequence data of Internet of things |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104881481A (en) * | 2015-06-03 | 2015-09-02 | 安科智慧城市技术(中国)有限公司 | Method and device for accessing mass time sequence data |
CN111209252A (en) * | 2018-11-22 | 2020-05-29 | 杭州海康威视系统技术有限公司 | File metadata storage method and device and electronic equipment |
CN111309677A (en) * | 2020-02-11 | 2020-06-19 | 西安奥卡云数据科技有限公司 | File management method and device of distributed file system |
WO2022088807A1 (en) * | 2020-10-30 | 2022-05-05 | 深圳壹账通智能科技有限公司 | Distributed file storage method and system based on blockchain, and server and client |
WO2022143540A1 (en) * | 2020-12-31 | 2022-07-07 | 杭州趣链科技有限公司 | Block chain index storage method and apparatus, computer device and medium |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116932470A (en) * | 2023-09-18 | 2023-10-24 | 江苏正泰泰杰赛智能科技有限公司 | Method, system and storage medium capable of calculating and storing time sequence data of Internet of things |
CN116932470B (en) * | 2023-09-18 | 2024-01-05 | 江苏正泰泰杰赛智能科技有限公司 | Method, system and storage medium capable of calculating and storing time sequence data of Internet of things |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111125089B (en) | Time sequence data storage method, device, server and storage medium | |
US10176208B2 (en) | Processing time series data from multiple sensors | |
US8631052B1 (en) | Efficient content meta-data collection and trace generation from deduplicated storage | |
JP7524440B2 (en) | Apparatus for storing data to be stored | |
CN104751055B (en) | A kind of distributed malicious code detecting method, apparatus and system based on texture | |
CN106682077B (en) | Mass time sequence data storage implementation method based on Hadoop technology | |
CN107766529B (en) | Mass data storage method for sewage treatment industry | |
US8667032B1 (en) | Efficient content meta-data collection and trace generation from deduplicated storage | |
WO2021226922A1 (en) | Data compression method, apparatus and device, and readable storage medium | |
CN107656971A (en) | A kind of intelligent grid collection Monitoring Data storage method based on Redis | |
CN102497450B (en) | Two-stage-system-based distributed data compression processing method | |
CN112632568A (en) | Temperature data storage and acquisition method, system, electronic equipment and storage medium | |
CN115408390A (en) | A data processing method, device and electronic equipment | |
CN104731716B (en) | A kind of date storage method | |
CN119415035A (en) | Data storage method, device and storage medium for power Internet of Things | |
US12229083B2 (en) | Long term and short term data management of a file based time series database populated with data collected by an energy sensor for a power generating device or from another data source | |
CN110399396B (en) | Efficient data processing | |
CN108647243B (en) | Industrial big data storage method based on time series | |
EP4124967A1 (en) | A method for adaptive data storage optimization | |
CN115061637A (en) | Disk data indexing method and device, computer equipment and storage medium | |
JP7404734B2 (en) | Data compression device, history information management system, data compression method and data compression program | |
CN113626439A (en) | A data processing method, device, data processing equipment and storage medium | |
CN114691675A (en) | A data acquisition and storage system based on big data | |
CN113220775A (en) | Method for writing mass power measurement data into Hbase in batch | |
Wang et al. | Research on storage and retrieval method of mass data for high-speed train |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | Inventor after: Zhou Jinghui, Cheng Dong, Ma Hanzheng. Inventor before: Zhou Jinghui, Cheng Dong, Ma Hanzheng |