CN112486948B - Real-time data processing method - Google Patents
- Publication number
- CN112486948B (application CN202011337050.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- cache
- time
- real
- period
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
- G06F11/1464—Management of the backup or restore process for networked environments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2379—Updates performed during online database operations; commit processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24552—Database cache management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Quality & Reliability (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a real-time data processing method that divides a multi-level cache into tiers, taking the period configured for data acquisition as the basis for assigning device-collected data to cache tiers. Each cache tier combines persistent storage with real-time storage, improving both real-time performance and stability. Data is cached through a task queue; a maximum caching period can be set, and data exceeding that period is deleted, so the method realizes both data caching and data management.
Description
Technical Field
The invention relates to the technical field of industrial automation control, in particular to a real-time data processing method.
Background
With the rapid development of industrial internet technology, the volume of data generated by network edge devices is growing rapidly. This data includes operating data, device status data, production process data, and quality inspection data collected from products and production equipment such as CNC machine tools, PLCs, and industrial robots. Data uploaded to the cloud must arrive quickly and reliably across a variety of network environments, with seamless switching between communication networks and without packet loss or data corruption. Preventing a large influx of industrial data from blocking the cloud's data interfaces and degrading application performance is therefore a major problem the industrial internet must solve.
Disclosure of Invention
The invention aims to provide a real-time data processing method that relieves the heavy pressure placed on network bandwidth when large volumes of data generated in real time by edge devices are transmitted to the cloud in the industrial internet.
The technical scheme adopted by the invention is as follows:
a real-time data processing method, comprising the steps of:
step 1, acquiring a system configuration file to check whether a data backup period is reached; if so, adding the data backup task to the end of the memory task queue; otherwise, waiting for the backup period to arrive;
step 2, executing data backup tasks from the head of the task queue in sequence, and acquiring data acquisition cycles of all data nodes corresponding to the current data backup task;
step 3, respectively constructing corresponding cache levels according to different data acquisition periods, and classifying data points of the same data acquisition period into the same cache level;
step 4, each cache level acquires data to be stored in real time, and caches the data to an internal memory as a direct data source according to a corresponding data structure;
step 5, writing the data cached in the memory into a hard disk file in snapshot form for persistent storage;
step 6, acquiring the maximum cache period of the database allowed by the data backup task, and judging whether the interval between the database time and the current time exceeds the maximum cache period; if so, deleting the data file beyond the time range and storing the data file into a new data file; otherwise, storing the new data file into a database;
step 7, uniformly packaging all the data which are subjected to cache backup by each cache level according to the time characteristic and uploading the data to an upper node;
step 8, the superior node checks whether the acquired data is lost; if so, notifying each cache level to perform data compensation; otherwise, finishing the data backup and collection of the current node.
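The queue handling and tier construction of steps 1-3 can be sketched as follows; this is a minimal illustration, and the field names (`period_s`, `name`) and the point names are assumptions rather than details taken from the patent:

```python
from collections import defaultdict, deque

def build_cache_levels(data_points):
    """Step 3 sketch: group data points by acquisition period so that
    points sharing a period land in the same cache level."""
    levels = defaultdict(list)
    for point in data_points:
        levels[point["period_s"]].append(point["name"])
    return dict(levels)

# Steps 1-2: backup tasks are appended to the tail of an in-memory
# queue and executed from the head, in order.
task_queue = deque()
task_queue.append({"task": "backup", "points": [
    {"name": "spindle_speed", "period_s": 1},
    {"name": "motor_temp", "period_s": 1},
    {"name": "batch_count", "period_s": 60},
]})

current = task_queue.popleft()
print(build_cache_levels(current["points"]))
# {1: ['spindle_speed', 'motor_temp'], 60: ['batch_count']}
```

Grouping by period up front means a whole tier can later be packed and uploaded in one operation, which is the stated motivation for the hierarchy.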
Further, in step 4, key data is handled by timed loading: disk data is periodically cached into the real-time cache region. Common data is handled in a supplementary mode: it is refreshed into the common cache region on a longer period.
Further, in step 5, the snapshot persistence of the data is automatically performed according to the set policy.
Further, the policy is set to perform snapshot persistence of the data when 100 data updates have accumulated within 60 seconds.
Further, in the step 8, during data compensation, the stored data of each cache hierarchy is used as a data source to perform data compensation in an asynchronous message manner.
Further, the data compensation comprises the following specific steps:
8-1, the superior node informs the cloud end of opening a data compensation channel;
8-2, the superior node sends a compensation instruction to the subordinate cache level;
8-3, the lower-level cache level indexes downwards according to the source path of the lost data to determine the node where the data is located;
and 8-4, the node where the lost data resides retrieves the data from the buffer area or the database, then repacks and uploads it to complete the data compensation.
The invention adopts the above technical scheme and has the following features: (1) the method divides the multi-level cache into tiers, taking the period set for data acquisition as the basis for assigning device-collected data to cache tiers; (2) each cache tier combines persistent storage with real-time storage, improving both real-time performance and stability; (3) data is cached through a task queue, a maximum caching period can be set, and data exceeding that period is deleted, realizing both data caching and data management.
Drawings
The invention is described in further detail below with reference to the accompanying drawings and the specific embodiments.
fig. 1 is a schematic flow chart of a real-time data processing method according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
The invention provides a real-time data processing method that relieves the heavy pressure placed on network bandwidth when large volumes of data generated in real time by edge devices are transmitted to the cloud in the industrial internet. It balances the gap between the data acquisition period of the edge layer and the real-time big-data processing of the platform layer while guaranteeing the integrity and timeliness of the data. The multi-level caching technique works much like a container terminal: data "storage yards" are established both at the edge side and in the cloud database to serve as cache regions, and data is cached separately at the edge and on upload to the cloud, so the technique may be called a data terminal technique. The data terminal comprises two major cache types: real-time data caching and historical data caching. The real-time cache holds data in memory in a configured data structure and serves as the most direct computational data source. The historical cache stores data in a database in a fixed format and persists it to the hard disk as files, so that data can be traced and compensated after a loss, improving the stability and reliability of the data transmission process.
As shown in fig. 1, the present invention discloses a real-time data processing method, which comprises the following steps:
step 1, acquiring a system configuration file to check whether a data backup period is reached; if so, adding the data backup task to the end of the memory task queue; otherwise, waiting for the backup period to arrive;
step 2, executing data backup tasks from the head of the task queue in sequence, and acquiring data acquisition cycles of all data nodes corresponding to the current data backup task;
step 3, respectively constructing corresponding cache levels according to different data acquisition periods, and classifying data points of the same data acquisition period into the same cache level;
step 4, each cache level acquires data to be stored in real time, and caches the data to the memory as a direct data source according to a corresponding data structure;
step 5, writing the data cached in the memory into a hard disk file in snapshot form for persistent storage;
step 6, acquiring the maximum cache period of the database allowed by the data backup task, and judging whether the interval between the database time and the current time exceeds the maximum cache period; if so, deleting the data file beyond the time range and storing the data file into a new data file; otherwise, storing the new data file into a database;
step 7, uniformly packaging all the data which are subjected to cache backup by each cache level according to the time characteristic and uploading the data to an upper node;
step 8, the superior node checks whether the acquired data is lost; if so, notifying each cache level to perform data compensation; otherwise, finishing the data backup and collection of the current node.
Further, in step 4, key data is handled by timed loading: disk data is periodically cached into the real-time cache region. Common data is handled in a supplementary mode: it is refreshed into the common cache region on a longer period.
Further, in step 5, the snapshot persistence of the data is automatically performed according to the set policy.
Further, the policy is set to perform snapshot persistence of the data when 100 data updates have accumulated within 60 seconds.
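A minimal sketch of such a policy; the exact trigger semantics (persist once at least 100 updates have accumulated and the 60-second window has elapsed) are an interpretation of the wording above, not a definitive reading:

```python
import time

class SnapshotPolicy:
    """Trigger a snapshot when >= min_updates updates have accumulated
    and the window has elapsed; otherwise the window resets empty."""
    def __init__(self, window_s=60, min_updates=100, now=time.monotonic):
        self.window_s = window_s
        self.min_updates = min_updates
        self._now = now
        self._updates = 0
        self._window_start = now()

    def record_update(self):
        self._updates += 1

    def should_snapshot(self):
        if self._now() - self._window_start < self.window_s:
            return False                  # window still open
        due = self._updates >= self.min_updates
        self._updates = 0                 # start a fresh window either way
        self._window_start = self._now()
        return due

# Deterministic demonstration with a fake clock
clock = [0.0]
policy = SnapshotPolicy(now=lambda: clock[0])
for _ in range(100):
    policy.record_update()
clock[0] = 61.0
print(policy.should_snapshot())  # True
```

The injectable clock keeps the policy testable; a real deployment would simply use the default `time.monotonic`.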
Further, in the step 8, during data compensation, the stored data of each cache hierarchy is used as a data source to perform data compensation in an asynchronous message manner.
Further, the data compensation comprises the following specific steps:
8-1, the superior node informs the cloud end of opening a data compensation channel;
8-2, the superior node sends a compensation instruction to the subordinate cache level;
8-3, the lower-level cache level indexes downwards according to the source path of the lost data to determine the node where the data is located;
and 8-4, the node where the lost data resides retrieves the data from the buffer area or the database, then repacks and uploads it to complete the data compensation.
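The compensation flow of steps 8-1 to 8-4 can be sketched as follows; the `Node` and `CacheLevel` classes, the source-path keys, and the packet layout are all illustrative assumptions (steps 8-1 and 8-2, the channel opening and the instruction, are modeled by the call itself):

```python
class Node:
    """A node that still holds the original data in its buffer/database."""
    def __init__(self, name, store):
        self.name, self.store = name, store

    def fetch(self, key):
        return self.store[key]              # 8-4: from buffer area or database

    def repack(self, key, value):
        return {"node": self.name, "key": key, "data": value}

class CacheLevel:
    """A cache level with a source-path index over its subordinate nodes."""
    def __init__(self, index):
        self.index = index                  # {source_path: Node}

    def index_down(self, source_path):
        return self.index.get(source_path)  # 8-3: locate the owning node

def compensate(upload, levels, lost_paths):
    """For each lost source path: index down to the owning node,
    re-fetch the data, repack it, and upload it again."""
    recovered = []
    for path in lost_paths:
        for level in levels:
            node = level.index_down(path)
            if node is not None:
                recovered.append(upload(node.repack(path, node.fetch(path))))
                break
    return recovered

level = CacheLevel({"plc-3/temp": Node("plc-3", {"plc-3/temp": 42.5})})
print(compensate(lambda pkt: pkt, [level], ["plc-3/temp"]))
# [{'node': 'plc-3', 'key': 'plc-3/temp', 'data': 42.5}]
```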
The specific principles of the invention are described in detail below; the method has the following features:
1) Cache placement: the cache function resides mainly in each computing node of the industrial internet. A node receives data from its subordinate nodes, aggregates and classifies it, caches it in the corresponding tier, and forwards it to its superior node.
2) Cache hierarchy division. The data acquisition period is the key criterion for dividing the cache hierarchy: data points with the same acquisition period are assigned to the same tier. The advantage of this division is that data with the same timing requirements can be packed and uploaded together, avoiding the time otherwise spent filtering data when samples with different timings are mixed. Dividing by time characteristic also makes unified computation and processing of the data more convenient.
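The unified packing by time characteristic can be sketched as one packet per cache tier per timestamp; the JSON packet layout here is an assumption, not a format defined by the patent:

```python
import json

def pack_level(period_s, samples, timestamp):
    """Bundle all cache-backed samples from one cache tier that share a
    time characteristic into a single packet for upload to the superior
    node (step 7)."""
    return json.dumps({
        "period_s": period_s,      # the tier's acquisition period
        "timestamp": timestamp,    # shared time characteristic
        "samples": samples,        # {point_name: value}
    }, sort_keys=True)

packet = pack_level(1, {"motor_temp": 61.3, "spindle_speed": 1200}, 1700000000)
print(packet)
```

Because every sample in the packet shares one period and one timestamp, the receiver needs no per-sample screening before computing over the tier.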
3) Caching modes within each tier: real-time storage and persistent storage. Real-time storage caches data in memory in a defined data structure; this is the most direct data source, feeding persistent storage and other services. Persistent storage writes data to a database in a defined structure and saves it to the hard disk in file form, achieving durable storage. Its purpose is to prevent data loss and to enable tracing and compensation when loss occurs. Persistence uses snapshots: the data cached in memory is written to a binary file in snapshot form. Snapshots may be persisted automatically under a configured policy, for example initiating a snapshot save when 100 data updates occur within 60 seconds. To improve IO efficiency during storage and avoid data accumulation, persistent storage is combined with the real-time caching technique to improve storage stability.
4) Storage mode of each tier. Each tier combines persistent storage and real-time storage so that the strengths of one offset the weaknesses of the other. Because neither the duration of the storage process nor the volume of stored data can grow without bound, performance optimization of the storage process is crucial in addition to regular management of the storage volume. Data flow from real-time storage to persistent storage, and back again, is strongly affected by IO efficiency, so the data caching scheme must be designed to reduce this influence.
A. Real-time storage to persistent storage: storage efficiency can be improved and the reliability of the caching process guaranteed by designing the data structures sensibly, exploiting database features, optimizing database statements, and handling data exceptions.
B. Persistent storage to real-time storage: the caching mechanism can differ by data class. Key data is loaded on a timer — disk data is cached into the real-time buffer at regular, short intervals, so its real-time performance is strong. Common data is refreshed into the buffer in a supplementary mode on a longer period, so its real-time performance is weaker.
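The two refresh cadences in B can be sketched as follows; the 1 s and 60 s periods and the `tick`-based scheduling are illustrative assumptions:

```python
class TieredRefresher:
    """Key data is loaded from disk into the real-time buffer on a short
    timer; common data is refreshed on a much longer period."""
    def __init__(self, key_period_s=1, common_period_s=60):
        self.periods = {"key": key_period_s, "common": common_period_s}
        self.buffers = {"key": {}, "common": {}}
        self._last = {"key": None, "common": None}

    def tick(self, now_s, disk):
        """Call periodically; `disk` maps each point name to a
        ("key"|"common", value) pair read from persistent storage."""
        refreshed = []
        for cls, period in self.periods.items():
            if self._last[cls] is None or now_s - self._last[cls] >= period:
                self._last[cls] = now_s
                for name, (data_cls, value) in disk.items():
                    if data_cls == cls:
                        self.buffers[cls][name] = value
                refreshed.append(cls)
        return refreshed

disk = {"spindle": ("key", 1200), "batch": ("common", 7)}
r = TieredRefresher()
print(r.tick(0, disk))   # ['key', 'common']  (first tick loads both)
print(r.tick(1, disk))   # ['key']            (common not due until 60 s)
print(r.tick(61, disk))  # ['key', 'common']
```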
5) Data compensation is required between the tiers of the multi-level cache when data loss occurs; the data stored in each tier therefore serves as the data source for retransmission during compensation. On loss, the cloud's data compensation channel is opened and a compensation instruction is sent to the subordinate node, which indexes downward along the data's source path to find the node holding the data and re-uploads the lost data. The compensation process exploits the cache mechanism and a tree-shaped index, greatly improving indexing efficiency and compensation capability.
6) To ensure that 100% of the data can be stored in the database and that the timestamp of each record is accurate, the invention separates database operations from the recording of tasks through a task-queue design. The flow is data -> memory -> hard disk: a small amount of memory is consumed, but the cached data is more accurate and stable.
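The data -> memory -> hard disk separation can be sketched with an in-memory queue and a separate drain step; the JSON-lines file format is an assumption, not something the patent specifies:

```python
import json
from collections import deque

class RecordingPipeline:
    """Samples enter an in-memory queue immediately, so recording is
    never delayed by disk or database work; a separate drain step
    appends the queued records to a file on disk in arrival order."""
    def __init__(self, path):
        self.path = path
        self.queue = deque()

    def record(self, ts, name, value):
        # data -> memory: the timestamp is captured at recording time,
        # so it stays accurate even if the disk write happens later
        self.queue.append({"ts": ts, "name": name, "value": value})

    def drain(self):
        # memory -> hard disk: flush everything queued so far
        n = 0
        with open(self.path, "a", encoding="utf-8") as f:
            while self.queue:
                f.write(json.dumps(self.queue.popleft()) + "\n")
                n += 1
        return n
```

Decoupling `record` from `drain` is what lets the timestamp reflect the moment of acquisition rather than the moment of storage.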
7) To keep the multi-level cache real-time, managing the cached data is also an important part of the invention: the database compares the current time against the maximum cache period given in the configuration information and deletes data that falls outside that time range.
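The retention rule can be sketched as a simple cutoff against the maximum cache period; the record layout (`ts` field, seconds as the unit) is assumed:

```python
def prune_expired(records, now_s, max_cache_period_s):
    """Keep only records whose timestamp lies within the maximum cache
    period of now_s; everything older is dropped (step 6 / item 7)."""
    cutoff = now_s - max_cache_period_s
    kept = [r for r in records if r["ts"] >= cutoff]
    return kept, len(records) - len(kept)

kept, dropped = prune_expired(
    [{"ts": 0}, {"ts": 50}, {"ts": 100}], now_s=100, max_cache_period_s=60)
print(kept, dropped)  # [{'ts': 50}, {'ts': 100}] 1
```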
The invention adopts the above technical scheme and has the following features: (1) the method divides the multi-level cache into tiers, taking the period set for data acquisition as the basis for assigning device-collected data to cache tiers; (2) each cache tier combines persistent storage with real-time storage, improving both real-time performance and stability; (3) data is cached through a task queue, a maximum caching period can be set, and data exceeding that period is deleted, realizing both data caching and data management.
It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. The embodiments and features of the embodiments in the present application may be combined with each other without conflict. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the detailed description of the embodiments of the present application is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Claims (5)
1. A real-time data processing method, characterized by: which comprises the following steps:
step 1, acquiring a system configuration file to check whether a data backup period is reached; if yes, adding the data backup task to the tail of the memory task queue; otherwise, waiting for the backup period to arrive;
step 2, executing data backup tasks from the head of the task queue in sequence, and acquiring data acquisition cycles of all data nodes corresponding to the current data backup task;
step 3, respectively constructing corresponding cache levels according to different data acquisition periods, and classifying data points of the same data acquisition period into the same cache level;
step 4, each cache level acquires data to be stored in real time, and caches the data to the memory as a direct data source according to a corresponding data structure;
step 5, writing the data cached in the memory into the hard disk file in a snapshot mode for persistent storage;
step 6, acquiring the maximum cache period of the database allowed by the data backup task, and judging whether the interval between the database time and the current time exceeds the maximum cache period; if so, deleting the data file beyond the time range and storing the data file into a new data file; otherwise, storing the new data file into a database;
step 7, uniformly packaging all the data which are subjected to cache backup by each cache level according to the time characteristic and uploading the data to an upper node;
step 8, the superior node checks whether the acquired data is lost; if so, notifying each cache level to perform data compensation; otherwise, finishing the data backup and acquisition of the current node; the data compensation method comprises the following specific steps:
8-1, the superior node informs the cloud end of opening a data compensation channel;
8-2, the superior node sends a compensation instruction to the subordinate cache level;
8-3, the lower-level cache level indexes downwards according to the source path of the lost data to determine the node where the data is located;
and 8-4, acquiring the lost data from the buffer area or the database by the node where the lost data is located, and packaging and uploading the lost data to finish data compensation.
2. A real-time data processing method according to claim 1, characterized in that: step 4, periodically caching the disk data to a real-time cache region by adopting a timing loading mode aiming at the key data; and refreshing the data to the common cache region according to a long period by adopting a supplementary mode aiming at the common data.
3. A real-time data processing method according to claim 1, characterized in that: and 5, automatically performing snapshot persistence of the data according to a set strategy.
4. A real-time data processing method according to claim 3, characterized in that: the policy is set to perform snapshot persistence of the data when 100 data updates have accumulated within 60 seconds.
5. A real-time data processing method according to claim 1, characterized in that: and 8, during data compensation, using the stored data of each cache level as a data source to perform data compensation in an asynchronous message mode.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011337050.0A CN112486948B (en) | 2020-11-25 | 2020-11-25 | Real-time data processing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011337050.0A CN112486948B (en) | 2020-11-25 | 2020-11-25 | Real-time data processing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112486948A CN112486948A (en) | 2021-03-12 |
CN112486948B true CN112486948B (en) | 2022-05-13 |
Family
ID=74934533
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011337050.0A Active CN112486948B (en) | 2020-11-25 | 2020-11-25 | Real-time data processing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112486948B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113742381B (en) * | 2021-08-30 | 2023-07-25 | 欧电云信息科技(江苏)有限公司 | Cache acquisition method, device and computer readable medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN206559396U (en) * | 2016-11-23 | 2017-10-13 | 成都阜特科技股份有限公司 | A kind of industrial real-time data system |
CN109800260A (en) * | 2018-12-14 | 2019-05-24 | 深圳壹账通智能科技有限公司 | High concurrent date storage method, device, computer equipment and storage medium |
CN111291083A (en) * | 2020-01-22 | 2020-06-16 | 奇安信科技集团股份有限公司 | Webpage source code data processing method and device and computer equipment |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130205088A1 (en) * | 2012-02-06 | 2013-08-08 | International Business Machines Corporation | Multi-stage cache directory and variable cache-line size for tiered storage architectures |
- 2020-11-25: application CN202011337050.0A filed (CN); granted as patent CN112486948B — status Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN206559396U (en) * | 2016-11-23 | 2017-10-13 | 成都阜特科技股份有限公司 | A kind of industrial real-time data system |
CN109800260A (en) * | 2018-12-14 | 2019-05-24 | 深圳壹账通智能科技有限公司 | High concurrent date storage method, device, computer equipment and storage medium |
CN111291083A (en) * | 2020-01-22 | 2020-06-16 | 奇安信科技集团股份有限公司 | Webpage source code data processing method and device and computer equipment |
Non-Patent Citations (1)
Title |
---|
Research on Retrieval Optimization of Massive Sensing Data Based on Multi-level Caching; Zhang Jianjing; China Master's Theses Full-text Database; 2014-09-15; full text *
Also Published As
Publication number | Publication date |
---|---|
CN112486948A (en) | 2021-03-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109951463A (en) | A kind of Internet of Things big data analysis method stored based on stream calculation and novel column | |
CN108833503A (en) | A kind of Redis cluster method based on ZooKeeper | |
CN103440244A (en) | Large-data storage and optimization method | |
CN106357463B (en) | The access link tracking implementation method and its system of non-invasive | |
CN104679772A (en) | Method, device, equipment and system for deleting files in distributed data warehouse | |
CN103246616A (en) | Global shared cache replacement method for realizing long-short cycle access frequency | |
CN107800808A (en) | A kind of data-storage system based on Hadoop framework | |
CN109299056B (en) | A kind of method of data synchronization and device based on distributed file system | |
CN107368608A (en) | The HDFS small documents buffer memory management methods of algorithm are replaced based on ARC | |
CN104734915A (en) | Composite multiprocess and multithread multi-network concurrence dynamic simulation method | |
CN112486948B (en) | Real-time data processing method | |
CN112118283B (en) | Data processing method and system based on multi-level cache | |
CN111159176A (en) | Method and system for storing and reading mass stream data | |
CN109710668A (en) | A kind of multi-source heterogeneous data access middleware construction method | |
CN107180082A (en) | A kind of data update system and method based on multi-level buffer mechanism | |
CN102497450A (en) | Two-stage-system-based distributed data compression processing method | |
US20240045869A1 (en) | A method and device of data transmission | |
CN102098170B (en) | Data acquisition optimization method and system | |
CN105407044A (en) | Method for implementing cloud storage gateway system based on network file system (NFS) | |
CN106790705A (en) | A kind of Distributed Application local cache realizes system and implementation method | |
CN105760398A (en) | Log recording system and log record operating method | |
US20220391411A1 (en) | Dynamic adaptive partition splitting | |
CN101102176A (en) | A data backup method | |
CN104156327A (en) | Method for recognizing object power failure in write back mode in distributed file system | |
CN116805940A (en) | Data acquisition system and method based on extensible edge calculation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CB03 | Change of inventor or designer information | ||
CB03 | Change of inventor or designer information |
| Inventor after: Chen Xiang; Chen Junken; Chen Hui. Inventor before: Chen Xiang; Chen Junken; Chen Hui |