CN117992498A

CN117992498A - Method and system for efficiently managing time sequence data of Internet of things based on memory mapping

Info

Publication number: CN117992498A
Application number: CN202410123738.0A
Authority: CN
Inventors: 张炜刚; 贾德星
Original assignee: Shanghai Yunxi Technology Co ltd
Current assignee: Shanghai Yunxi Technology Co ltd
Priority date: 2024-01-30
Filing date: 2024-01-30
Publication date: 2024-05-07

Abstract

The invention discloses a method and a system for efficiently managing Internet of Things time sequence data based on memory mapping, relating to the technical field of time sequence data. The method comprises the following steps: storing a time sequence data table in a column storage format, wherein the fields of the table comprise fixed-length fields and variable-length fields, the total number of bytes occupied by the data in a fixed-length field column is obtained as the product of the number of bytes occupied by the fixed-length field and the total number of rows of the time sequence data table, and the total number of bytes occupied by the data in a variable-length field column is obtained through an s file and a data column file; writing column data of the time sequence data table into a memory mapping file through a mmap function or a remap function; and managing a set number of devices through one time sequence data table and designing a metadata fixed-length management strategy, so as to quickly locate data addresses in the time sequence files and realize fast reading and writing of time sequence data of different devices. The invention can directly read and write the page cache in user space, and solves the problem that variable-length files are inconvenient to manage with memory mapping.

Description

Method and system for efficiently managing time sequence data of Internet of things based on memory mapping

Technical Field

The invention relates to the technical field of time sequence data, in particular to an Internet of things time sequence data efficient management method and system based on memory mapping.

Background

In recent years, the Internet of Things has developed rapidly. In typical Internet of Things scenarios such as equipment monitoring, network monitoring, and express tracking, massive amounts of monitoring data, trajectory data, and sensor data are generated. These data are typically time sequence data: they are generated frequently, are large in volume, and depend heavily on acquisition time. Conventional databases cannot handle such write-intensive, massive real-time data, so a time sequence database that supports a time sequence model is needed to store and analyze it.

Currently, mainstream time sequence databases generally read and write files in three steps: reading the file content into memory, modifying the content in memory, and writing the data in memory back to the file. The page cache serves as an intermediate layer that speeds up file modification and reading. When the database reads a file, the file is read into the page cache to reduce disk I/O, so that subsequent reads can be served directly from memory, improving read speed. Because the memory of the deployment machine is limited, the page cache is limited in size. When this size is exceeded, the database application performs page eviction, that is, it discards cached pages with a low access frequency and loads the newly accessed file content. If the file content held in the page cache has been modified, the page cache must be flushed to disk promptly to ensure consistency between the cache and the file.

The synchronization between memory and files described above makes the application complex, and conventional file operations require two data copies: from disk to the page cache, and from the page cache to the user-space buffer. On this basis, a memory zero-copy method is proposed, which uses the memory mapping (MMap) technology to read and write the page cache directly in user space and omits the step of copying page cache data to a user-space buffer, thereby achieving higher read and write efficiency, simplifying the programming model, and improving development efficiency and system stability.

Disclosure of Invention

In view of the needs and shortcomings of the prior art, the invention provides a method and a system for efficiently managing Internet of Things time sequence data based on memory mapping, in which files are mapped directly into virtual memory so that time sequence data is read and written without page buffering, thereby improving read and write performance, avoiding the copying of data from kernel space to user space, and approaching the highest theoretical read and write speed.

In a first aspect, the present invention provides a method for efficiently managing time sequence data of an internet of things based on memory mapping, which solves the above technical problems by adopting the following technical scheme:

A method for efficiently managing Internet of Things time sequence data based on memory mapping, the implementation of which comprises the following steps:

Storing a time sequence data table in a column storage format, wherein the fields of the time sequence data table comprise fixed-length fields and variable-length fields, the total number of bytes occupied by the data in a fixed-length field column of the time sequence data table is obtained as the product of the number of bytes occupied by the fixed-length field and the total number of rows of the time sequence data table, and the total number of bytes occupied by the data in a variable-length field column of the time sequence data table is obtained through an s file and a data column file;

Writing column data of the time sequence data table into a memory mapping file through a mmap function or a remap function;

Managing a set number of devices through one time sequence data table, and designing a metadata fixed-length management strategy, so as to quickly locate data addresses in the time sequence files and realize fast reading and writing of time sequence data of different devices.

Optionally, for the fixed-length fields, each field is stored in an independent file and is read and written by means of memory mapping (MMap);

To read a fixed-length field, the number of bytes occupied by the fixed-length field is multiplied directly by the row number in the time sequence data table to obtain the address in the MMap file, and the data at that address is then converted into the actual data type for access.

Optionally, the column data of the time sequence data table is written into the memory mapping file through a mmap function or a remap function, and the process specifically comprises the following steps:

First, calculating the total number of bytes of the column data to be written, and judging whether the free space of the memory mapping file is sufficient for writing the column data,

If yes, obtaining the head address of the memory mapping file through the mmap function, and persistently recording the used size of the memory mapping file in the file header,

If not, expanding the memory mapping file through the remap function, then locating the write start address according to the read flow and writing the data into the corresponding address space, thereby completing the write operation.

Further optionally, when column data of the time sequence data table is written into the memory mapping file through the remap function, a reservation strategy is designed under which a set amount of space is reserved in one step, either as a multiple or as a percentage; once the reserved space meets the write space requirement, the write start address is located according to the read flow and the data is written into the corresponding address space, thereby completing the write operation.

Optionally, after the column data of the time sequence data table is written into the memory mapping file, the memory mapping file is synchronized to the disk file in real time, or is forcibly written to the disk file through the msync function at regular intervals.

In a second aspect, the present invention provides an internet of things time sequence data efficient management system based on memory mapping, which solves the technical problems as follows:

An internet of things time sequence data efficient management system based on memory mapping, which comprises:

A format setting module, used for storing a time sequence data table in a column storage format, wherein the fields of the time sequence data table comprise fixed-length fields and variable-length fields; the total number of bytes occupied by the data in a fixed-length field column is obtained as the product of the number of bytes occupied by the fixed-length field and the total number of rows of the time sequence data table; the s file records the length and the content of each variable-length character string of the actual data, and the data column file records the offsets into the s file, wherein the total number of bytes occupied by the data in a variable-length field column is the value recorded in the last row of the data column file;

The data writing module is used for writing column data of the time sequence data table into the memory mapping file through a mmap function or a remap function;

A fixed-length management module, used for managing a set number of devices through one time sequence data table and designing a metadata fixed-length management strategy, so as to quickly locate data addresses in the time sequence files and realize fast reading and writing of time sequence data of different devices.

Further optionally, for the fixed-length fields, each field is stored in a separate file and is read and written by means of memory mapping (MMap);

Optionally, the data writing module specifically includes:

An acquisition and calculation sub-module, used for acquiring the column data to be written and calculating the total number of bytes of the column data;

A comparison and judgment sub-module, used for judging whether the free space of the memory mapping file is sufficient for writing the column data,

A fixed-length writing sub-module, used for obtaining the head address of the memory mapping file through the mmap function when the free space of the memory mapping file is sufficient for writing the column data, and persistently recording the used size of the memory mapping file in the file header,

A variable-length writing sub-module, used for expanding the memory mapping file through the remap function when the free space of the memory mapping file is insufficient for writing the column data, then locating the write start address according to the read flow and writing the data into the corresponding address space, thereby completing the write operation.

Further optionally, when the variable-length writing sub-module writes column data of the time sequence data table into the memory mapping file through the remap function, a reservation strategy is designed in advance under which a set amount of space is reserved in one step, either as a multiple or as a percentage; once the reserved space meets the write space requirement, the write start address is located according to the read flow and the data is written into the corresponding address space, thereby completing the write operation.

Optionally, the system further comprises a disk writing module;

After column data of the time sequence data table is written into the memory mapping file, the disk writing module synchronizes the memory mapping file to the disk file in real time, or forcibly writes the memory mapping file to the disk file through the msync function at regular intervals.

Compared with the prior art, the method and the system for efficiently managing Internet of Things time sequence data based on memory mapping have the following beneficial effects:

(1) The invention uses the memory mapping technology to read and write the page cache directly in user space, which avoids copying page cache data to a user-space buffer, thereby achieving higher read and write efficiency, simplifying the programming model, and improving development efficiency and system stability;

(2) The invention solves the problem that variable-length files are inconvenient to manage with memory mapping, and realizes the reading and writing of variable-length data types through fixed-length management, space reservation, and other methods;

(3) Through fixed-length metadata management, the invention supports concurrent reading and writing for a massive number of devices while using limited system resources, and reduces the database's consumption of operating system resources.

Drawings

FIG. 1 is a flow chart of a method according to a first embodiment of the invention;

FIG. 2 is a block diagram illustrating a second embodiment of the present invention;

FIG. 3 is a diagram of a particular timing data table according to an embodiment of the present invention;

FIG. 4 is a memory mapped time series data file read-write flow chart according to an embodiment of the present invention.

Detailed Description

In order to make the technical scheme, the technical problems to be solved and the technical effects of the invention more clear, the technical scheme of the invention is clearly and completely described below by combining specific embodiments.

MMap is a method of memory-mapping files: a file or other object is mapped into the address space of a process, establishing a mapping between the file's disk address and a range of virtual addresses in the process's virtual address space. Once the mapping is established, the process can read and write this range of memory through pointers, and the system automatically writes dirty pages back to the corresponding file on disk, so that operations on the file are completed without calling system call functions such as read and write. Conversely, modifications made to this region in kernel space are directly reflected in user space, so that files can be shared among different processes.
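By way of non-limiting illustration, the following Python sketch shows the behaviour described above using the standard `mmap` module. Two shared mappings of one temporary file stand in for two processes; the function name, file size, and payload are assumptions made for this example only and are not taken from the invention.

```python
import mmap
import os
import tempfile


def demo_shared_mapping() -> bytes:
    """Store bytes through one mapping and load them through another."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, mmap.PAGESIZE)        # a mapping needs a non-empty file
        writer = mmap.mmap(fd, mmap.PAGESIZE)  # shared read-write mapping
        reader = mmap.mmap(fd, mmap.PAGESIZE, access=mmap.ACCESS_READ)
        writer[0:5] = b"hello"                 # plain memory store, no write() call
        seen = bytes(reader[0:5])              # plain memory load, no read() call
        reader.close()
        writer.close()
        return seen
    finally:
        os.close(fd)
        os.remove(path)
```

Both mappings are backed by the same cached file pages, which is why the second mapping observes the store without any explicit file I/O.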

Embodiment one:

Referring to FIG. 1, this embodiment proposes a method for efficiently managing Internet of Things time sequence data based on memory mapping, the implementation of which includes:

(I) Storing a time sequence data table in a column storage format, wherein the fields of the time sequence data table comprise fixed-length fields and variable-length fields, the total number of bytes occupied by the data in a fixed-length field column of the time sequence data table is obtained as the product of the number of bytes occupied by the fixed-length field and the total number of rows of the time sequence data table, and the total number of bytes occupied by the data in a variable-length field column of the time sequence data table is obtained through an s file and a data column file.

Taking the "memory-mapped column-store time sequence data file structure diagram" as an example, the time sequence data table "demo" contains 4 fields, TIMESTAMP, INT, CHAR(10), and VARCHAR(30), wherein:

The first three columns are fixed-length fields: TIMESTAMP occupies 8 bytes, INT occupies 4 bytes, and CHAR(10) occupies 10 bytes. For the fixed-length fields, each field is stored in an independent file and is read and written by means of memory mapping (MMap). To read a fixed-length field, the number of bytes occupied by the fixed-length field is multiplied directly by the row number in the time sequence data table to obtain the address in the MMap file, and the data at that address is then converted into the actual data type for access.

The fourth column, VARCHAR(30), is a variable-length column and requires two files to manage: the s file records the length and the content of each variable-length character string of the actual data, and the data column file records the offsets into the s file. During reading, the actual variable-length string content is obtained through the offset. According to FIG. 3, to read the second-row record of column C4, the data file demo.3 of column C4 is first read through the fixed-length calculation, giving an offset of 3; the s file of column C4 is then read at that offset. The length of the actual variable-length field is 4, so the 4 characters that follow are read, yielding the varchar field content 'bbbb' recorded in the second row.
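As a minimal sketch of the fixed-length calculation, the Python example below reads one value of the 4-byte INT column from a memory-mapped column file. It assumes a zero-based row number, little-endian encoding, and a column file without a header; these details, the helper names, and the sample values are assumptions for illustration and are not specified by the invention.

```python
import mmap
import os
import struct
import tempfile

INT_WIDTH = 4  # bytes per INT value, as in the "demo" table


def column_total_bytes(width: int, total_rows: int) -> int:
    """Total bytes of a fixed-length column: field width times row count."""
    return width * total_rows


def read_int(mm: mmap.mmap, row: int) -> int:
    """Read the INT stored at a zero-based row: address = row * width."""
    (value,) = struct.unpack_from("<i", mm, row * INT_WIDTH)
    return value


def demo_fixed_read() -> int:
    values = [10, 20, 30]  # sample column content (assumed)
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, struct.pack("<3i", *values))
        size = column_total_bytes(INT_WIDTH, len(values))
        mm = mmap.mmap(fd, size, access=mmap.ACCESS_READ)
        try:
            return read_int(mm, 1)  # second row
        finally:
            mm.close()
    finally:
        os.close(fd)
        os.remove(path)
```

Under the same rule, a TIMESTAMP column of 1000 rows would occupy 8 × 1000 = 8000 bytes.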
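The two-file lookup can be sketched as follows. The on-disk encoding is not specified in the text, so this example assumes an 8-byte little-endian offset per row in the data column file and a 1-byte length prefix before each string in the s file. The first-row value "aa" is hypothetical, chosen only so that the second row lands at offset 3 as in the example above. Plain byte buffers stand in for the two memory-mapped files.

```python
import struct

OFFSET_WIDTH = 8  # assumed width of one offset entry in the data column file


def build_varchar_column(values):
    """Return (data_column_file, s_file) byte images for the given strings."""
    s_file = bytearray()
    data_file = bytearray()
    for value in values:
        raw = value.encode("utf-8")
        data_file += struct.pack("<Q", len(s_file))      # offset of this row
        s_file += struct.pack("<B", len(raw)) + raw      # length, then content
    return bytes(data_file), bytes(s_file)


def read_varchar(data_file, s_file, row: int) -> str:
    """Fixed-length lookup of the offset, then a length-prefixed read."""
    (offset,) = struct.unpack_from("<Q", data_file, row * OFFSET_WIDTH)
    length = s_file[offset]
    start = offset + 1
    return s_file[start:start + length].decode("utf-8")


data_file, s_file = build_varchar_column(["aa", "bbbb"])
second_row = read_varchar(data_file, s_file, 1)
```

With these assumptions the offset stored for the second row is 3, the length byte at that offset is 4, and the decoded content is 'bbbb'.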

(II) Writing the column data of the time sequence data table into the memory mapping file through a mmap function or a remap function.

Referring to fig. 4, this process specifically includes:

It should be added that after the memory mapping file is expanded through the remap function, the head address of the memory mapping file changes, so the write operation must be performed under a write lock.

It should also be added that when column data of the time sequence data table is written into the memory mapping file through the remap function, a reservation strategy is designed under which a set amount of space is reserved in one step, either as a multiple or as a percentage; once the reserved space meets the write space requirement, the write start address is located according to the read flow and the data is written into the corresponding address space, thereby completing the write operation.

(III) Managing a set number of devices through one time sequence data table, and designing a metadata fixed-length management strategy, so as to quickly locate data addresses in the time sequence files and realize fast reading and writing of time sequence data of different devices.

After column data of the time sequence data table is written into the memory mapping file, the memory mapping file is synchronized to the disk file in real time, or is forcibly written to the disk file through the msync function at regular intervals.
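A possible shape of the write flow, including the lock, is sketched below in Python. It assumes that the "remap function" corresponds to Linux `mremap`, which CPython's `mmap.resize` typically uses on Linux, and that the file header is a single 8-byte counter of used bytes. The class name `ColumnFile`, the file name `demo.0`, and the doubling policy are illustrative assumptions, not the claimed implementation.

```python
import mmap
import os
import struct
import tempfile
import threading

HEADER = struct.Struct("<Q")  # assumed file header: payload bytes in use


class ColumnFile:
    """Append-only column file accessed through a shared memory mapping."""

    def __init__(self, path: str, capacity: int = mmap.PAGESIZE) -> None:
        self._fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        if os.fstat(self._fd).st_size == 0:
            os.ftruncate(self._fd, capacity)  # zero-filled, so used size is 0
        self._mm = mmap.mmap(self._fd, 0)     # length 0 maps the whole file
        self._lock = threading.Lock()

    def used(self) -> int:
        return HEADER.unpack_from(self._mm, 0)[0]

    def append(self, data: bytes) -> int:
        """Write data after the used region and return its start address."""
        with self._lock:  # expansion may move the mapping, so serialize access
            used = self.used()
            start = HEADER.size + used
            end = start + len(data)
            if end > len(self._mm):           # free space is insufficient
                self._mm.resize(max(end, 2 * len(self._mm)))
            self._mm[start:end] = data
            HEADER.pack_into(self._mm, 0, used + len(data))  # record used size
            return start

    def read(self, start: int, length: int) -> bytes:
        with self._lock:
            return bytes(self._mm[start:start + length])

    def close(self) -> None:
        self._mm.close()
        os.close(self._fd)


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        col = ColumnFile(os.path.join(tmp, "demo.0"), capacity=64)
        first = col.append(b"x" * 40)   # fits in the initial 64 bytes
        second = col.append(b"y" * 40)  # does not fit, triggers expansion
        result = (first, second, col.used(), col.read(second, 40))
        col.close()
        return result
```

In C, `mremap` may return a different base address, so any pointer held by a reader becomes invalid after expansion; the Python object hides the address, but the lock plays the same role of keeping readers and writers away from the mapping while it is being resized.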
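One way to express such a reservation strategy is a small capacity calculation, sketched below. The function name, its parameters, and the rule of falling back to the exact requirement when the reserved amount is still too small are assumptions for illustration.

```python
def reserved_capacity(current: int, needed: int, *, multiple=None, percent=None) -> int:
    """Return the new file capacity after one reservation step.

    Exactly one policy must be chosen:
      multiple: grow to current * multiple (for example 2 doubles the file)
      percent:  grow by that percentage of the current size (for example 50)
    The result is never smaller than the capacity the pending write needs.
    """
    if (multiple is None) == (percent is None):
        raise ValueError("choose exactly one of multiple or percent")
    if multiple is not None:
        if multiple <= 1:
            raise ValueError("multiple must be greater than 1")
        target = current * multiple
    else:
        if percent <= 0:
            raise ValueError("percent must be positive")
        target = current + current * percent // 100
    return max(target, needed)
```

Reserving more than the immediate requirement reduces how often the file has to be expanded, at the cost of some unused space at the end of the file.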
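The text does not detail the metadata layout, so the following is only a hypothetical sketch of what fixed-length metadata could look like: each device owns one fixed-size slot, and the slot address is obtained by multiplication, in the same way as for fixed-length columns. The slot fields, `MAX_DEVICES`, and the helper names are all assumptions; a `bytearray` stands in for a memory-mapped metadata file.

```python
import struct

# Hypothetical slot layout: device id, row count, bytes used (8 bytes each)
SLOT = struct.Struct("<QQQ")
MAX_DEVICES = 1000  # assumed "set number" of devices managed by one table


def slot_offset(device_index: int) -> int:
    """Address of a device's metadata slot: index times fixed slot size."""
    if not 0 <= device_index < MAX_DEVICES:
        raise IndexError("device index out of range")
    return device_index * SLOT.size


def write_slot(buf, device_index: int, device_id: int, rows: int, used: int) -> None:
    SLOT.pack_into(buf, slot_offset(device_index), device_id, rows, used)


def read_slot(buf, device_index: int) -> tuple:
    return SLOT.unpack_from(buf, slot_offset(device_index))


metadata = bytearray(MAX_DEVICES * SLOT.size)
write_slot(metadata, 7, device_id=42, rows=100, used=800)
```

Because every slot has the same size, locating the metadata of any device takes constant time and the metadata file never changes length while the set number of devices stays fixed.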
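The timed variant can be sketched as below. CPython's `mmap.flush` calls `msync` on Unix-like systems; the class name, the interval, and the explicit `now` argument (which a caller would normally take from `time.monotonic()`) are assumptions for this example.

```python
import mmap
import os
import tempfile


class PeriodicSyncer:
    """Force a mapping to disk at most once per interval."""

    def __init__(self, mm: mmap.mmap, interval_s: float, now: float) -> None:
        self.mm = mm
        self.interval_s = interval_s
        self.last_sync = now

    def maybe_sync(self, now: float) -> bool:
        """Flush if the interval has elapsed; return True when a flush ran."""
        if now - self.last_sync < self.interval_s:
            return False
        self.mm.flush()  # msync on Unix-like systems
        self.last_sync = now
        return True


def demo() -> list:
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, mmap.PAGESIZE)
        mm = mmap.mmap(fd, mmap.PAGESIZE)
        syncer = PeriodicSyncer(mm, interval_s=5.0, now=100.0)
        mm[0:2] = b"ok"  # dirty the first page
        results = [syncer.maybe_sync(t) for t in (102.0, 105.0, 106.0)]
        mm.close()
        return results
    finally:
        os.close(fd)
        os.remove(path)
```

A shorter interval narrows the window of data that could be lost on a crash, while a longer interval lets the operating system batch more dirty pages per write-back.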

Embodiment two:

Referring to fig. 2, this embodiment proposes an internet of things time sequence data efficient management system based on memory mapping, which includes:

In this embodiment, for the fixed-length fields, each field is stored in an independent file and is read and written by means of memory mapping (MMap); to read a fixed-length field, the number of bytes occupied by the fixed-length field is multiplied directly by the row number in the time sequence data table to obtain the address in the MMap file, and the data at that address is then converted into the actual data type for access.

In this embodiment, the data writing module specifically includes:

In this embodiment, when the variable-length writing sub-module writes column data of the time sequence data table into the memory mapping file through the remap function, a reservation strategy is designed in advance under which a set amount of space is reserved in one step, either as a multiple or as a percentage; once the reserved space meets the write space requirement, the write start address is located according to the read flow and the data is written into the corresponding address space, thereby completing the write operation.

In this embodiment, the system further includes a disk writing module;

Specifically, in this embodiment, taking the "memory-mapped column-store time sequence data file structure diagram" as an example, the time sequence data table "demo" contains 4 fields, TIMESTAMP, INT, CHAR(10), and VARCHAR(30), wherein:

In summary, the method and the system for efficiently managing Internet of Things time sequence data based on memory mapping use the memory mapping technology to read and write the page cache directly in user space, avoid copying page cache data to a user-space buffer, realize efficient reading and writing, solve the problem that variable-length files are inconvenient to manage with memory mapping, and support reading and writing the time sequence data of a massive number of devices on memory mapping files.

The foregoing has outlined rather broadly the principles and embodiments of the present invention in order that the detailed description of the invention may be better understood. Based on the above-mentioned embodiments of the present invention, any improvements and modifications made by those skilled in the art without departing from the principles of the present invention should fall within the scope of the present invention.

Claims

1. The method for efficiently managing the time sequence data of the Internet of things based on the memory mapping is characterized by comprising the following steps of:

2. The method for efficiently managing time-series data of the internet of things based on memory mapping according to claim 1, wherein for the fixed-length field, each field is stored by a separate file, and is read and written by means of memory mapping MMap;

3. The method for efficiently managing time-series data of the internet of things based on memory mapping according to claim 1, wherein the process of writing the column data of the time-series data table into the memory mapping file through a mmap function or a remap function specifically comprises:

4. The method for efficiently managing time-series data of the internet of things based on memory mapping according to claim 3, wherein when column data of the time-series data table is written into the memory mapping file through the remap function, a reservation strategy is designed under which a set amount of space is reserved in one step, either as a multiple or as a percentage, and once the reserved space meets the write space requirement, the write start address is located according to the read flow and the data is written into the corresponding address space to complete the write operation.

5. The method for efficiently managing time-series data of the internet of things based on memory mapping according to claim 1, wherein after column data of the time-series data table is written into the memory mapping file, the memory mapping file is synchronized to the disk file in real time, or is forcibly written to the disk file through the msync function at regular intervals.

6. An efficient management system of time-series data of the internet of things based on memory mapping, characterized in that it comprises:

7. The efficient management system of time sequence data of the internet of things based on memory mapping according to claim 6, wherein for fixed-length fields, each field is stored by a separate file, and is read and written by means of memory mapping MMap;

8. The efficient management system of time-series data of internet of things based on memory mapping of claim 6, wherein the data writing module specifically comprises:

9. The efficient management system of time-series data of the internet of things based on memory mapping according to claim 8, wherein when the variable-length writing sub-module writes column data of the time-series data table into the memory mapping file through the remap function, a reservation strategy is designed in advance under which a set amount of space is reserved in one step, either as a multiple or as a percentage, and once the reserved space meets the write space requirement, the write start address is located according to the read flow and the data is written into the corresponding address space to complete the write operation.

10. The system for efficiently managing time-series data of the internet of things based on memory mapping according to claim 8, wherein the system further comprises a disk writing module;