CN112486948B - Real-time data processing method - Google Patents
- Publication number
- CN112486948B (application CN202011337050.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- cache
- time
- real
- period
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
- G06F11/1464—Management of the backup or restore process for networked environments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2379—Updates performed during online database operations; commit processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24552—Database cache management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Quality & Reliability (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a real-time data processing method that divides a multi-level cache into tiers, taking the period configured for data acquisition as the basis for assigning device-collected data to cache tiers. Each cache tier combines persistent storage with real-time storage, improving both real-time performance and stability. Data is cached through a task queue; a maximum caching period can be set, and data exceeding that period is deleted, so the method realizes both data caching and data management.
Description
Technical Field
The invention relates to the technical field of industrial automation control, in particular to a real-time data processing method.
Background
With the rapid development of industrial internet technology, the volume of data generated by network edge devices is growing rapidly. This data includes operating data, device status data, production process data, and quality inspection data collected from products and production equipment such as CNC machine tools, PLCs, and industrial robots. Data uploaded to the cloud must arrive quickly and reliably across a variety of network environments, with seamless switching between communication networks and without packet loss or data corruption. Preventing a large influx of industrial data from blocking the cloud's data interfaces and degrading application performance is therefore a major problem the industrial internet must solve.
Disclosure of Invention
The invention aims to provide a real-time data processing method that relieves the heavy pressure placed on network bandwidth when large volumes of data generated in real time by edge devices are transmitted to the cloud in the industrial internet.
The technical scheme adopted by the invention is as follows:
a real-time data processing method, comprising the steps of:
step 1, acquiring a system configuration file to check whether a data backup period is reached; if so, adding the data backup task to the end of the memory task queue; otherwise, waiting for the backup period to arrive;
step 2, executing data backup tasks from the head of the task queue in sequence, and acquiring data acquisition cycles of all data nodes corresponding to the current data backup task;
step 3, respectively constructing corresponding cache levels according to different data acquisition periods, and classifying data points of the same data acquisition period into the same cache level;
step 4, each cache level acquires data to be stored in real time, and caches the data to an internal memory as a direct data source according to a corresponding data structure;
step 5, writing the data cached in the memory into a hard disk file in snapshot form for persistent storage;
step 6, acquiring the maximum cache period of the database allowed by the data backup task, and judging whether the interval between the database time and the current time exceeds the maximum cache period; if so, deleting the data file beyond the time range and storing the data file into a new data file; otherwise, storing the new data file into a database;
step 7, uniformly packaging all the data which are subjected to cache backup by each cache level according to the time characteristic and uploading the data to an upper node;
step 8, the superior node checks whether the acquired data is lost; if so, notifying each cache level to perform data compensation; otherwise, finishing the data backup and collection of the current node.
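The queue handling and tier construction of steps 1-3 can be sketched as follows; this is a minimal illustration, and the field names (`period_s`, `name`) and the point names are assumptions rather than details taken from the patent:

```python
from collections import defaultdict, deque

def build_cache_levels(data_points):
    """Step 3 sketch: group data points by acquisition period so that
    points sharing a period land in the same cache level."""
    levels = defaultdict(list)
    for point in data_points:
        levels[point["period_s"]].append(point["name"])
    return dict(levels)

# Steps 1-2: backup tasks are appended to the tail of an in-memory
# queue and executed from the head, in order.
task_queue = deque()
task_queue.append({"task": "backup", "points": [
    {"name": "spindle_speed", "period_s": 1},
    {"name": "motor_temp", "period_s": 1},
    {"name": "batch_count", "period_s": 60},
]})

current = task_queue.popleft()
print(build_cache_levels(current["points"]))
# {1: ['spindle_speed', 'motor_temp'], 60: ['batch_count']}
```

Grouping by period up front means a whole tier can later be packed and uploaded in one operation, which is the stated motivation for the hierarchy.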
Further, in step 4, key data is handled by timed loading: disk data is periodically cached into the real-time cache region. Common data is handled in a supplementary mode: it is refreshed into the common cache region on a longer period.
Further, in step 5, the snapshot persistence of the data is automatically performed according to the set policy.
Further, the policy is set to perform snapshot persistence of the data when 100 data updates have accumulated within 60 seconds.
Further, in the step 8, during data compensation, the stored data of each cache hierarchy is used as a data source to perform data compensation in an asynchronous message manner.
Further, the data compensation comprises the following specific steps:
8-1, the superior node informs the cloud end of opening a data compensation channel;
8-2, the superior node sends a compensation instruction to the subordinate cache level;
8-3, the lower-level cache level indexes downwards according to the source path of the lost data to determine the node where the data is located;
and 8-4, the node where the lost data resides retrieves the data from the buffer area or the database, then repacks and uploads it to complete the data compensation.
The invention adopts the above technical scheme and has the following features: (1) the method divides the multi-level cache into tiers, taking the period set for data acquisition as the basis for assigning device-collected data to cache tiers; (2) each cache tier combines persistent storage with real-time storage, improving both real-time performance and stability; (3) data is cached through a task queue, a maximum caching period can be set, and data exceeding that period is deleted, realizing both data caching and data management.
Drawings
The invention is described in further detail below with reference to the accompanying drawings and the specific embodiments.
fig. 1 is a schematic flow chart of a real-time data processing method according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
The invention provides a real-time data processing method that relieves the heavy pressure placed on network bandwidth when large volumes of data generated in real time by edge devices are transmitted to the cloud in the industrial internet. It balances the gap between the data acquisition period of the edge layer and the real-time big-data processing of the platform layer while guaranteeing the integrity and timeliness of the data. The multi-level caching technique works much like a container terminal: data "storage yards" are established both at the edge side and in the cloud database to serve as cache regions, and data is cached separately at the edge and on upload to the cloud, so the technique may be called a data terminal technique. The data terminal comprises two major cache types: real-time data caching and historical data caching. The real-time cache holds data in memory in a configured data structure and serves as the most direct computational data source. The historical cache stores data in a database in a fixed format and persists it to the hard disk as files, so that data can be traced and compensated after a loss, improving the stability and reliability of the data transmission process.
As shown in fig. 1, the present invention discloses a real-time data processing method, which comprises the following steps:
step 1, acquiring a system configuration file to check whether a data backup period is reached; if so, adding the data backup task to the end of the memory task queue; otherwise, waiting for the backup period to arrive;
step 2, executing data backup tasks from the head of the task queue in sequence, and acquiring data acquisition cycles of all data nodes corresponding to the current data backup task;
step 3, respectively constructing corresponding cache levels according to different data acquisition periods, and classifying data points of the same data acquisition period into the same cache level;
step 4, each cache level acquires data to be stored in real time, and caches the data to the memory as a direct data source according to a corresponding data structure;
step 5, writing the data cached in the memory into a hard disk file in snapshot form for persistent storage;
step 6, acquiring the maximum cache period of the database allowed by the data backup task, and judging whether the interval between the database time and the current time exceeds the maximum cache period; if so, deleting the data file beyond the time range and storing the data file into a new data file; otherwise, storing the new data file into a database;
step 7, uniformly packaging all the data which are subjected to cache backup by each cache level according to the time characteristic and uploading the data to an upper node;
step 8, the superior node checks whether the acquired data is lost; if so, notifying each cache level to perform data compensation; otherwise, finishing the data backup and collection of the current node.
Further, in step 4, key data is handled by timed loading: disk data is periodically cached into the real-time cache region. Common data is handled in a supplementary mode: it is refreshed into the common cache region on a longer period.
Further, in step 5, the snapshot persistence of the data is automatically performed according to the set policy.
Further, the policy is set to perform snapshot persistence of the data when 100 data updates have accumulated within 60 seconds.
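A minimal sketch of such a policy; the exact trigger semantics (persist once at least 100 updates have accumulated and the 60-second window has elapsed) are an interpretation of the wording above, not a definitive reading:

```python
import time

class SnapshotPolicy:
    """Trigger a snapshot when >= min_updates updates have accumulated
    and the window has elapsed; otherwise the window resets empty."""
    def __init__(self, window_s=60, min_updates=100, now=time.monotonic):
        self.window_s = window_s
        self.min_updates = min_updates
        self._now = now
        self._updates = 0
        self._window_start = now()

    def record_update(self):
        self._updates += 1

    def should_snapshot(self):
        if self._now() - self._window_start < self.window_s:
            return False                  # window still open
        due = self._updates >= self.min_updates
        self._updates = 0                 # start a fresh window either way
        self._window_start = self._now()
        return due

# Deterministic demonstration with a fake clock
clock = [0.0]
policy = SnapshotPolicy(now=lambda: clock[0])
for _ in range(100):
    policy.record_update()
clock[0] = 61.0
print(policy.should_snapshot())  # True
```

The injectable clock keeps the policy testable; a real deployment would simply use the default `time.monotonic`.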
Further, in the step 8, during data compensation, the stored data of each cache hierarchy is used as a data source to perform data compensation in an asynchronous message manner.
Further, the data compensation comprises the following specific steps:
8-1, the superior node informs the cloud end of opening a data compensation channel;
8-2, the superior node sends a compensation instruction to the subordinate cache level;
8-3, the lower-level cache level indexes downwards according to the source path of the lost data to determine the node where the data is located;
and 8-4, the node where the lost data resides retrieves the data from the buffer area or the database, then repacks and uploads it to complete the data compensation.
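The compensation flow of steps 8-1 to 8-4 can be sketched as follows; the `Node` and `CacheLevel` classes, the source-path keys, and the packet layout are all illustrative assumptions (steps 8-1 and 8-2, the channel opening and the instruction, are modeled by the call itself):

```python
class Node:
    """A node that still holds the original data in its buffer/database."""
    def __init__(self, name, store):
        self.name, self.store = name, store

    def fetch(self, key):
        return self.store[key]              # 8-4: from buffer area or database

    def repack(self, key, value):
        return {"node": self.name, "key": key, "data": value}

class CacheLevel:
    """A cache level with a source-path index over its subordinate nodes."""
    def __init__(self, index):
        self.index = index                  # {source_path: Node}

    def index_down(self, source_path):
        return self.index.get(source_path)  # 8-3: locate the owning node

def compensate(upload, levels, lost_paths):
    """For each lost source path: index down to the owning node,
    re-fetch the data, repack it, and upload it again."""
    recovered = []
    for path in lost_paths:
        for level in levels:
            node = level.index_down(path)
            if node is not None:
                recovered.append(upload(node.repack(path, node.fetch(path))))
                break
    return recovered

level = CacheLevel({"plc-3/temp": Node("plc-3", {"plc-3/temp": 42.5})})
print(compensate(lambda pkt: pkt, [level], ["plc-3/temp"]))
# [{'node': 'plc-3', 'key': 'plc-3/temp', 'data': 42.5}]
```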
The specific principles of the invention are described in detail below; the method has the following features:
1) Cache placement: the cache function resides mainly in each computing node of the industrial internet. A node receives data from its subordinate nodes, aggregates and classifies it, caches it in the corresponding tier, and forwards it to its superior node.
2) Cache hierarchy division. The data acquisition period is the key criterion for dividing the cache hierarchy: data points with the same acquisition period are assigned to the same tier. The advantage of this division is that data with the same timing requirements can be packed and uploaded together, avoiding the time otherwise spent filtering data when samples with different timings are mixed. Dividing by time characteristic also makes unified computation and processing of the data more convenient.
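The unified packing by time characteristic can be sketched as one packet per cache tier per timestamp; the JSON packet layout here is an assumption, not a format defined by the patent:

```python
import json

def pack_level(period_s, samples, timestamp):
    """Bundle all cache-backed samples from one cache tier that share a
    time characteristic into a single packet for upload to the superior
    node (step 7)."""
    return json.dumps({
        "period_s": period_s,      # the tier's acquisition period
        "timestamp": timestamp,    # shared time characteristic
        "samples": samples,        # {point_name: value}
    }, sort_keys=True)

packet = pack_level(1, {"motor_temp": 61.3, "spindle_speed": 1200}, 1700000000)
print(packet)
```

Because every sample in the packet shares one period and one timestamp, the receiver needs no per-sample screening before computing over the tier.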
3) Caching modes within each tier: real-time storage and persistent storage. Real-time storage caches data in memory in a defined data structure; this is the most direct data source, feeding persistent storage and other services. Persistent storage writes data to a database in a defined structure and saves it to the hard disk in file form, achieving durable storage. Its purpose is to prevent data loss and to enable tracing and compensation when loss occurs. Persistence uses snapshots: the data cached in memory is written to a binary file in snapshot form. Snapshots may be persisted automatically under a configured policy, for example initiating a snapshot save when 100 data updates occur within 60 seconds. To improve IO efficiency during storage and avoid data accumulation, persistent storage is combined with the real-time caching technique to improve storage stability.
4) Storage mode of each tier. Each tier combines persistent storage and real-time storage so that the strengths of one offset the weaknesses of the other. Because neither the duration of the storage process nor the volume of stored data can grow without bound, performance optimization of the storage process is crucial in addition to regular management of the storage volume. Data flow from real-time storage to persistent storage, and back again, is strongly affected by IO efficiency, so the data caching scheme must be designed to reduce this influence.
A. Real-time storage to persistent storage: storage efficiency can be improved and the reliability of the caching process guaranteed by designing the data structures sensibly, exploiting database features, optimizing database statements, and handling data exceptions.
B. Persistent storage to real-time storage: the caching mechanism can differ by data class. Key data is loaded on a timer — disk data is cached into the real-time buffer at regular, short intervals, so its real-time performance is strong. Common data is refreshed into the buffer in a supplementary mode on a longer period, so its real-time performance is weaker.
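The two refresh cadences in B can be sketched as follows; the 1 s and 60 s periods and the `tick`-based scheduling are illustrative assumptions:

```python
class TieredRefresher:
    """Key data is loaded from disk into the real-time buffer on a short
    timer; common data is refreshed on a much longer period."""
    def __init__(self, key_period_s=1, common_period_s=60):
        self.periods = {"key": key_period_s, "common": common_period_s}
        self.buffers = {"key": {}, "common": {}}
        self._last = {"key": None, "common": None}

    def tick(self, now_s, disk):
        """Call periodically; `disk` maps each point name to a
        ("key"|"common", value) pair read from persistent storage."""
        refreshed = []
        for cls, period in self.periods.items():
            if self._last[cls] is None or now_s - self._last[cls] >= period:
                self._last[cls] = now_s
                for name, (data_cls, value) in disk.items():
                    if data_cls == cls:
                        self.buffers[cls][name] = value
                refreshed.append(cls)
        return refreshed

disk = {"spindle": ("key", 1200), "batch": ("common", 7)}
r = TieredRefresher()
print(r.tick(0, disk))   # ['key', 'common']  (first tick loads both)
print(r.tick(1, disk))   # ['key']            (common not due until 60 s)
print(r.tick(61, disk))  # ['key', 'common']
```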
5) Data compensation is required between the tiers of the multi-level cache when data loss occurs; the data stored in each tier therefore serves as the data source for retransmission during compensation. On loss, the cloud's data compensation channel is opened and a compensation instruction is sent to the subordinate node, which indexes downward along the data's source path to find the node holding the data and re-uploads the lost data. The compensation process exploits the cache mechanism and a tree-shaped index, greatly improving indexing efficiency and compensation capability.
6) To ensure that 100% of the data can be stored in the database and that the timestamp of each record is accurate, the invention separates database operations from the recording of tasks through a task-queue design. The flow is data -> memory -> hard disk: a small amount of memory is consumed, but the cached data is more accurate and stable.
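The data -> memory -> hard disk separation can be sketched with an in-memory queue and a separate drain step; the JSON-lines file format is an assumption, not something the patent specifies:

```python
import json
from collections import deque

class RecordingPipeline:
    """Samples enter an in-memory queue immediately, so recording is
    never delayed by disk or database work; a separate drain step
    appends the queued records to a file on disk in arrival order."""
    def __init__(self, path):
        self.path = path
        self.queue = deque()

    def record(self, ts, name, value):
        # data -> memory: the timestamp is captured at recording time,
        # so it stays accurate even if the disk write happens later
        self.queue.append({"ts": ts, "name": name, "value": value})

    def drain(self):
        # memory -> hard disk: flush everything queued so far
        n = 0
        with open(self.path, "a", encoding="utf-8") as f:
            while self.queue:
                f.write(json.dumps(self.queue.popleft()) + "\n")
                n += 1
        return n
```

Decoupling `record` from `drain` is what lets the timestamp reflect the moment of acquisition rather than the moment of storage.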
7) To keep the multi-level cache real-time, managing the cached data is also an important part of the invention: the database compares the current time against the maximum cache period given in the configuration information and deletes data that falls outside that time range.
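The retention rule can be sketched as a simple cutoff against the maximum cache period; the record layout (`ts` field, seconds as the unit) is assumed:

```python
def prune_expired(records, now_s, max_cache_period_s):
    """Keep only records whose timestamp lies within the maximum cache
    period of now_s; everything older is dropped (step 6 / item 7)."""
    cutoff = now_s - max_cache_period_s
    kept = [r for r in records if r["ts"] >= cutoff]
    return kept, len(records) - len(kept)

kept, dropped = prune_expired(
    [{"ts": 0}, {"ts": 50}, {"ts": 100}], now_s=100, max_cache_period_s=60)
print(kept, dropped)  # [{'ts': 50}, {'ts': 100}] 1
```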
The invention adopts the above technical scheme and has the following features: (1) the method divides the multi-level cache into tiers, taking the period set for data acquisition as the basis for assigning device-collected data to cache tiers; (2) each cache tier combines persistent storage with real-time storage, improving both real-time performance and stability; (3) data is cached through a task queue, a maximum caching period can be set, and data exceeding that period is deleted, realizing both data caching and data management.
It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. The embodiments and features of the embodiments in the present application may be combined with each other without conflict. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the detailed description of the embodiments of the present application is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Claims (5)
1. A real-time data processing method, characterized by: which comprises the following steps:
step 1, acquiring a system configuration file to check whether a data backup period is reached; if yes, adding the data backup task to the tail of the memory task queue; otherwise, waiting for the backup period to arrive;
step 2, executing data backup tasks from the head of the task queue in sequence, and acquiring data acquisition cycles of all data nodes corresponding to the current data backup task;
step 3, respectively constructing corresponding cache levels according to different data acquisition periods, and classifying data points of the same data acquisition period into the same cache level;
step 4, each cache level acquires data to be stored in real time, and caches the data to the memory as a direct data source according to a corresponding data structure;
step 5, writing the data cached in the memory into the hard disk file in a snapshot mode for persistent storage;
step 6, acquiring the maximum cache period of the database allowed by the data backup task, and judging whether the interval between the database time and the current time exceeds the maximum cache period; if so, deleting the data file beyond the time range and storing the data file into a new data file; otherwise, storing the new data file into a database;
step 7, uniformly packaging all the data which are subjected to cache backup by each cache level according to the time characteristic and uploading the data to an upper node;
step 8, the superior node checks whether the acquired data is lost; if so, notifying each cache level to perform data compensation; otherwise, finishing the data backup and acquisition of the current node; the data compensation method comprises the following specific steps:
8-1, the superior node informs the cloud end of opening a data compensation channel;
8-2, the superior node sends a compensation instruction to the subordinate cache level;
8-3, the lower-level cache level indexes downwards according to the source path of the lost data to determine the node where the data is located;
and 8-4, acquiring the lost data from the buffer area or the database by the node where the lost data is located, and packaging and uploading the lost data to finish data compensation.
2. A real-time data processing method according to claim 1, characterized in that: step 4, periodically caching the disk data to a real-time cache region by adopting a timing loading mode aiming at the key data; and refreshing the data to the common cache region according to a long period by adopting a supplementary mode aiming at the common data.
3. A real-time data processing method according to claim 1, characterized in that: and 5, automatically performing snapshot persistence of the data according to a set strategy.
4. A real-time data processing method according to claim 3, characterized in that: the policy is set to perform snapshot persistence of the data when 100 data updates have accumulated within 60 seconds.
5. A real-time data processing method according to claim 1, characterized in that: and 8, during data compensation, using the stored data of each cache level as a data source to perform data compensation in an asynchronous message mode.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011337050.0A CN112486948B (en) | 2020-11-25 | 2020-11-25 | Real-time data processing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011337050.0A CN112486948B (en) | 2020-11-25 | 2020-11-25 | Real-time data processing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112486948A CN112486948A (en) | 2021-03-12 |
CN112486948B true CN112486948B (en) | 2022-05-13 |
Family
ID=74934533
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011337050.0A Active CN112486948B (en) | 2020-11-25 | 2020-11-25 | Real-time data processing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112486948B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113742381B (en) * | 2021-08-30 | 2023-07-25 | 欧电云信息科技(江苏)有限公司 | Cache acquisition method, device and computer readable medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN206559396U (en) * | 2016-11-23 | 2017-10-13 | 成都阜特科技股份有限公司 | A kind of industrial real-time data system |
CN109800260A (en) * | 2018-12-14 | 2019-05-24 | 深圳壹账通智能科技有限公司 | High concurrent date storage method, device, computer equipment and storage medium |
CN111291083A (en) * | 2020-01-22 | 2020-06-16 | 奇安信科技集团股份有限公司 | Webpage source code data processing method and device and computer equipment |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130205088A1 (en) * | 2012-02-06 | 2013-08-08 | International Business Machines Corporation | Multi-stage cache directory and variable cache-line size for tiered storage architectures |
- 2020-11-25: application CN202011337050.0A filed (CN); granted as patent CN112486948B — status Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN206559396U (en) * | 2016-11-23 | 2017-10-13 | 成都阜特科技股份有限公司 | A kind of industrial real-time data system |
CN109800260A (en) * | 2018-12-14 | 2019-05-24 | 深圳壹账通智能科技有限公司 | High concurrent date storage method, device, computer equipment and storage medium |
CN111291083A (en) * | 2020-01-22 | 2020-06-16 | 奇安信科技集团股份有限公司 | Webpage source code data processing method and device and computer equipment |
Non-Patent Citations (1)
Title |
---|
Research on Retrieval Optimization of Massive Sensing Data Based on Multi-level Caching; Zhang Jianjing; China Master's Theses Full-text Database; 2014-09-15; full text *
Also Published As
Publication number | Publication date |
---|---|
CN112486948A (en) | 2021-03-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109951463A (en) | A kind of Internet of Things big data analysis method stored based on stream calculation and novel column | |
CN108833503A (en) | A kind of Redis cluster method based on ZooKeeper | |
CN103440244A (en) | Large-data storage and optimization method | |
CN106357463B (en) | The access link tracking implementation method and its system of non-invasive | |
CN104679772A (en) | Method, device, equipment and system for deleting files in distributed data warehouse | |
CN103246616A (en) | Global shared cache replacement method for realizing long-short cycle access frequency | |
CN107800808A (en) | A kind of data-storage system based on Hadoop framework | |
CN109299056B (en) | A kind of method of data synchronization and device based on distributed file system | |
CN107368608A (en) | The HDFS small documents buffer memory management methods of algorithm are replaced based on ARC | |
CN104734915A (en) | Composite multiprocess and multithread multi-network concurrence dynamic simulation method | |
CN112486948B (en) | Real-time data processing method | |
CN112118283B (en) | Data processing method and system based on multi-level cache | |
CN111159176A (en) | Method and system for storing and reading mass stream data | |
CN109710668A (en) | A kind of multi-source heterogeneous data access middleware construction method | |
CN107180082A (en) | A kind of data update system and method based on multi-level buffer mechanism | |
CN102497450A (en) | Two-stage-system-based distributed data compression processing method | |
US20240045869A1 (en) | A method and device of data transmission | |
CN102098170B (en) | Data acquisition optimization method and system | |
CN105407044A (en) | Method for implementing cloud storage gateway system based on network file system (NFS) | |
CN106790705A (en) | A kind of Distributed Application local cache realizes system and implementation method | |
CN105760398A (en) | Log recording system and log record operating method | |
US20220391411A1 (en) | Dynamic adaptive partition splitting | |
CN101102176A (en) | A data backup method | |
CN104156327A (en) | Method for recognizing object power failure in write back mode in distributed file system | |
CN116805940A (en) | Data acquisition system and method based on extensible edge calculation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CB03 | Change of inventor or designer information | ||
CB03 | Change of inventor or designer information |
| Inventor after: Chen Xiang; Chen Junken; Chen Hui. Inventor before: Chen Xiang; Chen Junken; Chen Hui |