CN112486948B - Real-time data processing method - Google Patents

Real-time data processing method

Info

Publication number
CN112486948B
Authority
CN
China
Prior art keywords
data
cache
time
real
period
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011337050.0A
Other languages
Chinese (zh)
Other versions
CN112486948A (en)
Inventor
陈湘
陈鋆垠
陈辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Digital Fujian Cloud Computing Operation Co ltd
Original Assignee
Fujian Digital Fujian Cloud Computing Operation Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian Digital Fujian Cloud Computing Operation Co ltd
Priority to CN202011337050.0A
Publication of CN112486948A
Application granted
Publication of CN112486948B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1464 Management of the backup or restore process for networked environments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2379 Updates performed during online database operations; commit processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a real-time data processing method that uses a multi-level cache. For data acquired from devices, the configured data acquisition period is the basis for dividing the cache into levels. Each cache level combines persistent storage with real-time storage, which improves both real-time performance and stability. Data is cached through a task queue, and a maximum cache period can be set; data older than the maximum cache period is deleted, so the method provides both data caching and data management.

Description

Real-time data processing method
Technical Field
The invention relates to the technical field of industrial automation control, in particular to a real-time data processing method.
Background
With the rapid development of industrial internet technology, the amount of data generated by network edge devices is increasing rapidly. This data includes operation data, device status data, production process data, quality inspection data, and the like, collected from products and production equipment such as numerical control machines, PLCs, and industrial robots. Data uploaded to the cloud over the industrial internet must be transmitted quickly and stably, must work across a variety of network environments, must switch seamlessly between communication networks, and must avoid problems such as packet loss and data distortion. Once a large amount of industrial data is connected to the industrial internet, it can block the data interfaces at the cloud and degrade application performance. How to avoid this effectively is a major problem that the industrial internet must solve.
Disclosure of Invention
The invention aims to provide a real-time data processing method that relieves the heavy pressure on network bandwidth caused by transmitting to the cloud the large amount of data that edge devices generate in real time in the industrial internet.
The technical scheme adopted by the invention is as follows:
A real-time data processing method, comprising the following steps:
Step 1, reading the system configuration file to check whether the data backup period has been reached; if so, adding a data backup task to the tail of the in-memory task queue; otherwise, waiting for the backup period to arrive;
Step 2, executing the data backup tasks in order from the head of the task queue, and obtaining the data acquisition periods of all data nodes corresponding to the current data backup task;
Step 3, constructing a cache level for each distinct data acquisition period, and assigning data points with the same data acquisition period to the same cache level;
Step 4, each cache level acquiring the data to be stored in real time and caching it in memory, according to a corresponding data structure, as a direct data source;
Step 5, writing the data cached in memory to a hard disk file as a snapshot for persistent storage;
Step 6, obtaining the maximum cache period of the database allowed by the data backup task, and judging whether the interval between the database time and the current time exceeds the maximum cache period; if so, deleting the data files that fall outside the time range and storing the data in a new data file; otherwise, storing the new data file in the database;
Step 7, packaging together, according to their time characteristics, all the data that each cache level has cached and backed up, and uploading the package to the upper-level node;
Step 8, the upper-level node checking whether any of the acquired data has been lost; if so, notifying each cache level to perform data compensation; otherwise, completing the data backup and collection for the current node.
Further, in step 4, key data is cached by timed loading, in which disk data is loaded into the real-time cache region at regular intervals; common data is handled in a supplementary mode, in which data is refreshed into the common cache region at a longer period.
Further, in step 5, snapshot persistence of the data is performed automatically according to a set policy.
Further, the policy is set so that a snapshot is persisted when 100 data updates have occurred within 60 seconds.
Further, in step 8, during data compensation, the data stored at each cache level is used as the data source, and compensation is carried out by asynchronous messages.
Further, the data compensation comprises the following specific steps:
8-1, the upper-level node notifies the cloud to open a data compensation channel;
8-2, the upper-level node sends a compensation instruction to the lower-level cache level;
8-3, the lower-level cache level indexes downwards along the source path of the lost data to determine the node where the data is located;
8-4, the node where the lost data is located obtains the lost data from the buffer or the database, repackages it, and uploads it to complete the data compensation.
With the above technical scheme, the invention has the following characteristics: (1) the method uses a hierarchically divided multi-level cache, and for data acquired from devices the configured data acquisition period is the basis for dividing the cache levels; (2) each cache level combines persistent storage with real-time storage, which improves real-time performance and stability; (3) data is cached through a task queue, a maximum cache period can be set, and data older than the maximum cache period is deleted, so the method provides both data caching and data management.
Drawings
The invention is described in further detail below with reference to the accompanying drawings and the detailed description.
FIG. 1 is a schematic flow chart of the real-time data processing method according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
The invention provides a real-time data processing method that relieves the heavy pressure on network bandwidth caused by transmitting to the cloud the large amount of data that edge devices generate in real time in the industrial internet. It thereby balances the difference between the data acquisition period at the edge layer and the real-time requirements of big data processing at the platform layer, and ensures the integrity and timeliness of the data. The principle of the multi-level caching technique resembles that of a container terminal: a data storage yard is established on the edge side and in the cloud database to serve as a cache area, and data on the edge side and data uploaded to the cloud are cached separately. The technique can therefore be called a data terminal technique. The data terminal comprises two major cache types, the real-time data cache and the historical data cache. The real-time data cache holds real-time data in memory according to a set data structure and serves as the most direct data source for computation. The historical data cache stores data in a database according to a defined data format and saves it to the hard disk as files, so that data can be traced and compensated when data is lost, which improves the stability and reliability of data transmission.
As shown in FIG. 1, the invention discloses a real-time data processing method comprising the following steps:
Step 1, reading the system configuration file to check whether the data backup period has been reached; if so, adding a data backup task to the tail of the in-memory task queue; otherwise, waiting for the backup period to arrive;
Step 2, executing the data backup tasks in order from the head of the task queue, and obtaining the data acquisition periods of all data nodes corresponding to the current data backup task;
Step 3, constructing a cache level for each distinct data acquisition period, and assigning data points with the same data acquisition period to the same cache level;
Step 4, each cache level acquiring the data to be stored in real time and caching it in memory, according to a corresponding data structure, as a direct data source;
Step 5, writing the data cached in memory to a hard disk file as a snapshot for persistent storage;
Step 6, obtaining the maximum cache period of the database allowed by the data backup task, and judging whether the interval between the database time and the current time exceeds the maximum cache period; if so, deleting the data files that fall outside the time range and storing the data in a new data file; otherwise, storing the new data file in the database;
Step 7, packaging together, according to their time characteristics, all the data that each cache level has cached and backed up, and uploading the package to the upper-level node;
Step 8, the upper-level node checking whether any of the acquired data has been lost; if so, notifying each cache level to perform data compensation; otherwise, completing the data backup and collection for the current node.
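As an illustration only, steps 1 to 3 above could be sketched in Python as follows. The names `DataPoint`, `enqueue_if_due` and `build_cache_levels`, the use of seconds as the time unit, and the example data points are all assumptions made for this sketch; the patent does not specify an implementation.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPoint:
    name: str
    period_s: int  # data acquisition period in seconds (unit is an assumption)


def enqueue_if_due(queue, task, now, last_backup, backup_period_s):
    """Step 1: append the task to the tail of the queue once the backup period has elapsed."""
    if now - last_backup >= backup_period_s:
        queue.append(task)
        return True
    return False


def build_cache_levels(points):
    """Steps 2-3: one cache level per distinct acquisition period."""
    levels = defaultdict(list)
    for point in points:
        levels[point.period_s].append(point.name)
    return dict(levels)


# Hypothetical backup task: the data points it covers.
task = [
    DataPoint("spindle_speed", 1),
    DataPoint("motor_temp", 5),
    DataPoint("feed_rate", 1),
]
queue = deque()
enqueue_if_due(queue, task, now=120.0, last_backup=0.0, backup_period_s=60.0)
levels = build_cache_levels(queue.popleft())  # tasks are executed from the head
print(levels)
```

In this sketch the two data points sampled every second land in one level and the point sampled every five seconds in another, so each level can later be packaged and uploaded as a unit.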
Further, in step 4, key data is cached by timed loading, in which disk data is loaded into the real-time cache region at regular intervals; common data is handled in a supplementary mode, in which data is refreshed into the common cache region at a longer period.
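A minimal sketch of this two-speed refresh, assuming each cache region simply reloads from disk once its own period has elapsed. The class name `CacheRegion`, the periods of 1 s and 30 s, and the `load_from_disk` callback are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CacheRegion:
    refresh_period_s: float
    last_refresh: float = float("-inf")  # never refreshed yet
    data: dict = field(default_factory=dict)

    def refresh_if_due(self, now, load_from_disk):
        """Reload the region from disk when its refresh period has elapsed."""
        if now - self.last_refresh >= self.refresh_period_s:
            self.data = dict(load_from_disk())
            self.last_refresh = now
            return True
        return False


# Assumed periods: key data reloads every second, common data every 30 seconds.
realtime_region = CacheRegion(refresh_period_s=1.0)
common_region = CacheRegion(refresh_period_s=30.0)
```

Calling `refresh_if_due` on both regions from the same timer loop gives key data a short staleness bound while common data is refreshed far less often, which is the trade-off the paragraph describes.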
Further, in step 5, the snapshot persistence of the data is automatically performed according to the set policy.
Further, the policy is set so that a snapshot is persisted when 100 data updates have occurred within 60 seconds.
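One possible reading of this policy is sketched below: a snapshot is written once at least 100 updates have accumulated and at least 60 seconds have passed since the previous snapshot. This interpretation, the class name `SnapshotCache`, the use of `pickle` as the binary format, and the write-then-rename step are assumptions for illustration, not details taken from the patent.

```python
import os
import pickle


class SnapshotCache:
    def __init__(self, path, min_updates=100, interval_s=60.0, start=0.0):
        self.path = path
        self.min_updates = min_updates
        self.interval_s = interval_s
        self.data = {}             # real-time storage (memory)
        self.dirty = 0             # updates since the last snapshot
        self.last_snapshot = start

    def put(self, key, value, now):
        """Update memory first, then persist if the policy allows it."""
        self.data[key] = value
        self.dirty += 1
        return self.maybe_snapshot(now)

    def maybe_snapshot(self, now):
        if self.dirty < self.min_updates:
            return False
        if now - self.last_snapshot < self.interval_s:
            return False
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "wb") as handle:
            pickle.dump(self.data, handle)
        os.replace(tmp_path, self.path)  # swap the finished snapshot into place
        self.dirty = 0
        self.last_snapshot = now
        return True
```

Passing `now` explicitly keeps the sketch deterministic; a real node would more likely read a monotonic clock.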
Further, in step 8, during data compensation, the data stored at each cache level is used as the data source, and compensation is carried out by asynchronous messages.
Further, the data compensation comprises the following specific steps:
8-1, the upper-level node notifies the cloud to open a data compensation channel;
8-2, the upper-level node sends a compensation instruction to the lower-level cache level;
8-3, the lower-level cache level indexes downwards along the source path of the lost data to determine the node where the data is located;
8-4, the node where the lost data is located obtains the lost data from the buffer or the database, repackages it, and uploads it to complete the data compensation.
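The lookup in steps 8-2 to 8-4 could be sketched as a walk down a tree of nodes, as below. The `Node` class, the representation of the source path as a list of child names, and the package layout are hypothetical; the sketch is also synchronous, whereas the method itself describes asynchronous messages, and it omits the cloud channel of step 8-1.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.children = {}
        self.buffer = {}    # real-time cache held in memory
        self.database = {}  # stand-in for persistent storage

    def add_child(self, child):
        self.children[child.name] = child
        return child

    def locate(self, source_path):
        """Step 8-3: index downwards along the source path."""
        node = self
        for part in source_path:
            node = node.children[part]
        return node

    def fetch_lost(self, keys):
        """Step 8-4: read from the buffer first, then the database, and repackage."""
        records = {}
        for key in keys:
            if key in self.buffer:
                records[key] = self.buffer[key]
            elif key in self.database:
                records[key] = self.database[key]
        return {"source": self.name, "records": records}


def compensate(upper_node, source_path, lost_keys):
    """Steps 8-2 to 8-4: route the instruction down the tree and return the package."""
    return upper_node.locate(source_path).fetch_lost(lost_keys)
```

Keys that are found in neither the buffer nor the database are simply left out of the package in this sketch; how such gaps are reported is not specified in the text.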
The specific principles of the invention are described in detail below. The invention has the following features:
1) The cache function resides mainly in the computing nodes of the industrial internet. Each node receives data from lower-level nodes, aggregates and classifies it, caches it at the corresponding level, and forwards it to the upper-level node.
2) Cache level division. The data acquisition period is the main basis for dividing cache levels: data points with the same acquisition period are assigned to the same level for caching. The advantage is that data required at the same time can be packaged and uploaded together, which avoids the time otherwise spent filtering data when data of different periods is mixed. Dividing by time characteristic also makes unified computation and processing of the data more convenient.
3) The caching modes within each level are real-time storage and persistent storage. Real-time storage caches data in memory according to a defined data structure; this data is the most direct data source and is provided to persistent storage and other services. Persistent storage writes data to a database according to a defined data structure and saves it to the hard disk as files. Its purpose is to prevent data loss and to allow data to be traced and compensated when loss occurs. Persistence uses snapshots, that is, the data cached in memory is written to a binary file as a snapshot. Snapshots can be persisted automatically according to configured policy settings, for example by initiating a snapshot save when 100 data items have been updated within 60 s. To improve IO efficiency during storage and avoid data accumulation, persistent storage is combined with real-time caching, which improves storage stability.
4) The storage mode of each level weighs the advantages and disadvantages of persistent storage and real-time storage and combines the two. Because neither the duration of the storage process nor the volume of stored data can grow without limit, performance optimization of the storage process is crucial, in addition to regular management of the stored volume. The data flows from real-time storage to persistent storage and from persistent storage to real-time storage are strongly affected by IO efficiency, so the caching mode must be designed to reduce this effect.
A. From real-time storage to persistent storage: storage efficiency and the reliability of the caching process can be improved by designing the data structure sensibly, exploiting the features of the database, optimizing database statements, handling data exceptions, and the like.
B. From persistent storage to real-time storage: the caching mechanism may differ for different data. Key data is cached by timed loading, with disk data loaded into the real-time buffer at regular intervals, so its real-time performance is strong. Common data is handled in a supplementary mode, with data refreshed into the buffer at a longer period, so its real-time performance is weaker.
5) Data compensation is needed between the levels of the multi-level cache when data is lost. During compensation, the data stored at each level is used as the data source for retransmission. When data is lost, the data compensation channel of the cloud is opened and a compensation instruction is sent to the lower-level node; the lower-level node indexes downwards along the data source path to find the node where the data is located, and the lost data is uploaded again. The compensation process uses the cache mechanism and a tree-shaped index mechanism, which greatly improves indexing efficiency and data compensation capability.
6) To ensure that 100% of the data is stored in the database and that the time point of each record is accurate, the invention uses a task queue design to separate database operations from task recording. That is, data follows the path data -> memory -> hard disk; this occupies a small amount of memory, but the cached data is more accurate and stable.
7) To ensure the real-time performance of the multi-level cache, management of the cached data is also an important part of the invention: the database compares data timestamps with the current time and deletes data outside the time range, according to the maximum cache period given in the configuration information.
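The retention rule described in the last item, deleting data older than the maximum cache period, might look like the following sketch. The function name `prune_expired`, the timestamp-keyed dictionary, and the choice to keep records whose age exactly equals the limit are assumptions for illustration.

```python
def prune_expired(records, now, max_cache_period_s):
    """Split timestamped records into those kept and those past the maximum cache period."""
    kept, expired = {}, {}
    for timestamp, value in records.items():
        if now - timestamp > max_cache_period_s:
            expired[timestamp] = value  # outside the time range: to be deleted
        else:
            kept[timestamp] = value
    return kept, expired
```

Returning the expired records instead of discarding them silently lets the caller log or archive them before deletion, which is a design choice of this sketch and not something the text requires.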
With the above technical scheme, the invention has the following characteristics: (1) the method uses a hierarchically divided multi-level cache, and for data acquired from devices the configured data acquisition period is the basis for dividing the cache levels; (2) each cache level combines persistent storage with real-time storage, which improves real-time performance and stability; (3) data is cached through a task queue, a maximum cache period can be set, and data older than the maximum cache period is deleted, so the method provides both data caching and data management.
It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. The embodiments and features of the embodiments in the present application may be combined with each other without conflict. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the detailed description of the embodiments of the present application is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Claims (5)

1. A real-time data processing method, characterized in that it comprises the following steps:
Step 1, reading the system configuration file to check whether the data backup period has been reached; if so, adding a data backup task to the tail of the in-memory task queue; otherwise, waiting for the backup period to arrive;
Step 2, executing the data backup tasks in order from the head of the task queue, and obtaining the data acquisition periods of all data nodes corresponding to the current data backup task;
Step 3, constructing a cache level for each distinct data acquisition period, and assigning data points with the same data acquisition period to the same cache level;
Step 4, each cache level acquiring the data to be stored in real time and caching it in memory, according to a corresponding data structure, as a direct data source;
Step 5, writing the data cached in memory to a hard disk file as a snapshot for persistent storage;
Step 6, obtaining the maximum cache period of the database allowed by the data backup task, and judging whether the interval between the database time and the current time exceeds the maximum cache period; if so, deleting the data files that fall outside the time range and storing the data in a new data file; otherwise, storing the new data file in the database;
Step 7, packaging together, according to their time characteristics, all the data that each cache level has cached and backed up, and uploading the package to the upper-level node;
Step 8, the upper-level node checking whether any of the acquired data has been lost; if so, notifying each cache level to perform data compensation; otherwise, completing the data backup and acquisition for the current node; the data compensation comprising the following specific steps:
8-1, the upper-level node notifies the cloud to open a data compensation channel;
8-2, the upper-level node sends a compensation instruction to the lower-level cache level;
8-3, the lower-level cache level indexes downwards along the source path of the lost data to determine the node where the data is located;
8-4, the node where the lost data is located obtains the lost data from the buffer or the database, packages it, and uploads it to complete the data compensation.
2. A real-time data processing method according to claim 1, characterized in that: in step 4, key data is cached by timed loading, in which disk data is loaded into the real-time cache region at regular intervals; common data is handled in a supplementary mode, in which data is refreshed into the common cache region at a longer period.
3. A real-time data processing method according to claim 1, characterized in that: in step 5, snapshot persistence of the data is performed automatically according to a set policy.
4. A real-time data processing method according to claim 3, characterized in that: the policy is set so that a snapshot is persisted when 100 data updates have occurred within 60 seconds.
5. A real-time data processing method according to claim 1, characterized in that: in step 8, during data compensation, the data stored at each cache level is used as the data source, and compensation is carried out by asynchronous messages.
CN202011337050.0A 2020-11-25 2020-11-25 Real-time data processing method Active CN112486948B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011337050.0A CN112486948B (en) 2020-11-25 2020-11-25 Real-time data processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011337050.0A CN112486948B (en) 2020-11-25 2020-11-25 Real-time data processing method

Publications (2)

Publication Number Publication Date
CN112486948A CN112486948A (en) 2021-03-12
CN112486948B true CN112486948B (en) 2022-05-13

Family

ID=74934533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011337050.0A Active CN112486948B (en) 2020-11-25 2020-11-25 Real-time data processing method

Country Status (1)

Country Link
CN (1) CN112486948B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113742381B (en) * 2021-08-30 2023-07-25 欧电云信息科技(江苏)有限公司 Cache acquisition method, device and computer readable medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN206559396U (en) * 2016-11-23 2017-10-13 成都阜特科技股份有限公司 A kind of industrial real-time data system
CN109800260A (en) * 2018-12-14 2019-05-24 深圳壹账通智能科技有限公司 High concurrent date storage method, device, computer equipment and storage medium
CN111291083A (en) * 2020-01-22 2020-06-16 奇安信科技集团股份有限公司 Webpage source code data processing method and device and computer equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130205088A1 (en) * 2012-02-06 2013-08-08 International Business Machines Corporation Multi-stage cache directory and variable cache-line size for tiered storage architectures

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
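基于多级缓存的海量感知数据检索优化的研究 (Research on retrieval optimization of massive sensing data based on multi-level cache); 张建静 (Zhang Jianjing); 《中国优秀博硕士学位论文全文数据库(硕士)》 (China Excellent Doctoral and Master's Theses Full-text Database, Master's); 20140915; full text *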

Also Published As

Publication number Publication date
CN112486948A (en) 2021-03-12

Similar Documents

Publication Publication Date Title
CN109951463A (en) A kind of Internet of Things big data analysis method stored based on stream calculation and novel column
CN108833503A (en) A kind of Redis cluster method based on ZooKeeper
CN103440244A (en) Large-data storage and optimization method
CN106357463B (en) The access link tracking implementation method and its system of non-invasive
CN104679772A (en) Method, device, equipment and system for deleting files in distributed data warehouse
CN103246616A (en) Global shared cache replacement method for realizing long-short cycle access frequency
CN107800808A (en) A kind of data-storage system based on Hadoop framework
CN109299056B (en) A kind of method of data synchronization and device based on distributed file system
CN107368608A (en) The HDFS small documents buffer memory management methods of algorithm are replaced based on ARC
CN104734915A (en) Composite multiprocess and multithread multi-network concurrence dynamic simulation method
CN112486948B (en) Real-time data processing method
CN112118283B (en) Data processing method and system based on multi-level cache
CN111159176A (en) Method and system for storing and reading mass stream data
CN109710668A (en) A kind of multi-source heterogeneous data access middleware construction method
CN107180082A (en) A kind of data update system and method based on multi-level buffer mechanism
CN102497450A (en) Two-stage-system-based distributed data compression processing method
US20240045869A1 (en) A method and device of data transmission
CN102098170B (en) Data acquisition optimization method and system
CN105407044A (en) Method for implementing cloud storage gateway system based on network file system (NFS)
CN106790705A (en) A kind of Distributed Application local cache realizes system and implementation method
CN105760398A (en) Log recording system and log record operating method
US20220391411A1 (en) Dynamic adaptive partition splitting
CN101102176A (en) A data backup method
CN104156327A (en) Method for recognizing object power failure in write back mode in distributed file system
CN116805940A (en) Data acquisition system and method based on extensible edge calculation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Chen Xiang

Inventor after: Chen Junken

Inventor after: Chen Hui

Inventor before: Chen Xiang

Inventor before: Chen Junken

Inventor before: Chen Hui