CN104731779A

CN104731779A - Real-time file system data organization and management method facing real-time databases

Info

Publication number: CN104731779A
Application number: CN201310692742.0A
Authority: CN
Inventors: 徐新国; 康卫; 李林; 朱廷劭
Original assignee: No6 Research Institute Of China Electronics Corp
Current assignee: No6 Research Institute Of China Electronics Corp; 6th Research Institute of China Electronics Corp
Priority date: 2013-12-18
Filing date: 2013-12-18
Publication date: 2015-06-24

Abstract

The invention discloses a real-time file system data organization and management method for real-time databases. The method fuses the database with the file system: the indexing scheme for archived data in the real-time database is merged with the indexing scheme for data in the file system, so that historical data is organized and managed within the file system itself. Data is managed in a time period-point-time period pattern, as shown in the abstract drawing. Archived data is stored according to the time information of industrial sampling batches. This time information is divided into time periods at four levels and time serves as the index, which guarantees high-speed writing of large-scale industrially collected data.

Description

A real-time file system data organization and management method for real-time databases

Technical field

The present invention relates to real-time database system technology and real-time file system technology, and in particular to a method for organizing and managing data in a real-time file system for a real-time database.

Background technology

A real-time database is the product of combining database technology with real-time systems. Real-time databases are widely used in the process industries (petrochemicals, electric power, iron and steel). They can automatically collect, store, and monitor production process data, and can keep many years of data for every process sampling point online; they are the core of an enterprise MES. The defining feature of a real-time database is that its data and tasks carry explicit timing constraints. Because real-time databases are mainly used in the process industries, where the number of sampling points is usually large, they must also store and retrieve archived data at large scale while meeting real-time requirements, so that the data is available for later analysis and incident review. How efficiently a real-time database organizes and manages the data to be stored has therefore become a key factor in its performance.

Current real-time database products have a number of problems in how they organize and manage historical data. The main one is that they sit on top of a general-purpose file system such as ext4 or FAT32. Most general-purpose file systems use a hierarchical structure and a directory index mechanism. This mechanism performs well for classifying and managing files, for dynamically adding and removing files, and for dynamically growing file data. Under specific application demands, however, such as storing large-scale industrial data organized per point and per unit of time, it limits system performance.

When archived data is built on a general-purpose file system, the file system layer must build and maintain its own index relationships among directories and files, in addition to the index for the archived data itself. As the number of collection points grows and the data volume per unit time increases, this introduces considerable latency into the database system. Archived data has a clear time-series character and is independent from point to point. This property can be exploited by combining the archive management of data in the real-time database with the management of data in the file system. Doing so reduces the number of index-building operations and the time they take, and meets the real-time database's requirement that archived data be written to disk promptly. It also removes some limits of the original file system, such as limits on the number of directories, the number of files, and file size.

Summary of the invention

In view of this, the main object of the present invention is to provide a method for organizing and managing data in a real-time file system for a real-time database. The method uses database and file system fusion: the indexing scheme for archived data in the real-time database is merged with the indexing scheme for data in the file system, so that historical data is organized and managed within the file system. The method stores and retrieves archived data according to the time information of industrial sampling batches, which guarantees high-speed writing of large-scale industrially collected data.

The method manages data in a time period-point-time period pattern, as shown in Figure 1. It comprises six aspects: organizing data by time period, combining a point index, structured data blocks, indexing that is independent of the cumulative data volume on disk, sequential data storage, and the data directory scheme.

Organizing data by time period: industrial data carries strong time information and is written to disk continuously in time order. The time information of the data is therefore divided into time periods, data is organized and managed by time period, and time is used as the index. A time period is similar to a directory in a general-purpose file system. To look up data, the target time period is selected first and the data is then queried within that period, which reduces query time.
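Selecting the target period before searching inside it relies on turning a timestamp into period indexes. The following is a minimal sketch in C of one way to do that, assuming fixed-length periods measured in seconds from a common origin. The names `time_key`, `split_time`, `span1`, and `span2` are illustrative and do not come from the original text.

```c
#include <assert.h>
#include <stdint.h>

/* Position of a timestamp in the period hierarchy (illustrative names). */
struct time_key {
    uint64_t level1;  /* index of the first-level period since the origin */
    uint64_t level2;  /* index of the second-level period inside level1 */
    uint64_t offset;  /* seconds since the start of that second-level period */
};

/* span1 and span2 are the first- and second-level period lengths in seconds. */
static struct time_key split_time(uint64_t t, uint64_t origin,
                                  uint64_t span1, uint64_t span2)
{
    /* Assumed: periods are evenly spaced and span1 is a whole multiple of span2. */
    assert(t >= origin && span1 > 0 && span2 > 0 && span1 % span2 == 0);

    uint64_t rel = t - origin;
    struct time_key k;
    k.level1 = rel / span1;            /* which first-level period */
    k.level2 = (rel % span1) / span2;  /* which second-level period inside it */
    k.offset = rel % span2;            /* position inside the second-level period */
    return k;
}
```

With a 365-day first level and a one-hour second level, `split_time(t, origin, 31536000, 3600)` yields the two indexes needed to pick the period, and the remaining offset is what a search inside the period would use.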

Combining a point index: point index information is added between one time period and the next. The point index links together the data of the same point that lies in two different time periods, which makes backward traversal queries across data blocks easy.
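The backward traversal that the point index enables can be pictured as a chain of blocks per point, each block pointing to the previous one. This is a sketch under that assumption only; the structure `point_block`, its fields, and `find_block` are hypothetical, and an in-memory pointer stands in for what would be a disk position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One data block of a single point; field names are hypothetical. */
struct point_block {
    uint64_t start;                  /* time of the first sample in the block */
    uint64_t end;                    /* time of the last sample in the block */
    const struct point_block *prev;  /* previous block of the same point,
                                        possibly in an earlier time period */
};

/* Walk backward from the newest block until one covers time t.
 * Returns NULL when no block of this point contains t. */
static const struct point_block *find_block(const struct point_block *last,
                                            uint64_t t)
{
    for (const struct point_block *b = last; b != NULL; b = b->prev) {
        assert(b->start <= b->end);
        if (t > b->end)
            return NULL;   /* t lies after this block: in a gap, or newer than all data */
        if (t >= b->start)
            return b;      /* hit */
    }
    return NULL;           /* t is older than the oldest block */
}
```

Because each link belongs to one point, the walk never touches blocks of other points, even when it crosses a period boundary.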

Structured data blocks: in a conventional file system, data is managed by the index one level above it, and its address is recorded in that index. In the present invention, each point produces only a small amount of data in a given period, and the data of each point is specific to that period, so the data resembles many small files. Managing that many small files would take a great deal of disk space, and the reserved space would produce many disk fragments. An index record is therefore kept inside the data block itself, so that no disk fragmentation arises over the whole life of the disk.

Indexing independent of the cumulative data volume on disk: the cost of indexing a point's data does not depend on the cumulative data volume of that point, only on the point's data volume within a given time period. Within that period the point's data is very small, so indexing puts little pressure on memory. Indexing the data of many points for the same moment therefore does not crash the system.

Sequential data storage: archived data in a real-time database is written to the database continuously in time order. With a conventional file system, the disk head must move constantly to maintain the index information of the data files. The present invention writes each batch of archived data to disk sequentially, which reduces head movement and seek time, and also reduces the time spent in system calls.

Data directory scheme: from top to bottom the file system contains four levels of time information, as shown in Figure 2. The first-level time period has the largest span. The second-level time period has a smaller span, typically 24 hours, 10 hours, or 1 hour; it is determined by the configured number of points, the sampling frequency of the points, and their compression ratio. If the span is too large, the index information becomes less effective. The third-level time period is contained in the data. Time information at this level relates only to one specific point, so its span depends on that point's sampling rate; if the point is sampled on the order of seconds, the interval at this level is a few seconds. Each data item contains time information, stored as a forward difference. Between the second-level and third-level time period information, a point index is added, which splits the data originally ordered by time so that it is managed per point. Time period information below the point level relates only to that point. From this level, data can be indexed by backtracking; during backward traversal, the time periods crossed depend on the specific sampling point. The last level of time information accompanies each data item, which matches the definition of industrial data as <point, time, value> and guarantees the integrity of the data.
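The text says only that the time in each data item is stored as a forward difference; it does not say whether the difference is taken from the previous sample or from the start of the block. The sketch below assumes the former. The function names and the 32-bit width of a difference are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Replace each timestamp by its difference from the previous one.
 * The first timestamp is taken relative to base. */
static void encode_deltas(const uint64_t *times, size_t n, uint64_t base,
                          uint32_t *deltas)
{
    uint64_t prev = base;
    for (size_t i = 0; i < n; i++) {
        /* Samples are assumed to arrive in time order, with gaps that fit in 32 bits. */
        assert(times[i] >= prev && times[i] - prev <= UINT32_MAX);
        deltas[i] = (uint32_t)(times[i] - prev);
        prev = times[i];
    }
}

/* Rebuild absolute timestamps by accumulating the differences from base. */
static void decode_deltas(const uint32_t *deltas, size_t n, uint64_t base,
                          uint64_t *times)
{
    uint64_t t = base;
    for (size_t i = 0; i < n; i++) {
        t += deltas[i];
        times[i] = t;
    }
}
```

Storing small differences instead of full timestamps is what lets each item carry its own time while staying compact. The base would come from the enclosing time period.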

Following the data directory scheme shown in Figure 2, an accumulated batch of archived data is archived into a data block in the data block format <point, time offset, value>. The third-level, second-level, and first-level time period information is extracted, the new index information is written into the corresponding time periods, and the point information in the second-level time period is updated at the same time. If the second-level time period overflows, a new second-level time period must be created on disk. Likewise, if the first-level time period overflows, a new first-level time period must be created. Because time passes uniformly, first-level and second-level time periods are evenly spaced along the time axis.
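A fixed-size block holding <point, time offset, value> items might look like the sketch below. It assumes that the point number is stored once in a block header rather than in every item, that values are 32-bit floats, and that the block is 1024 bytes; none of these choices is stated in the text, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 1024u  /* assumed block size for this sketch */

struct block_header {
    uint32_t point_id;    /* the point this block belongs to */
    uint16_t count;       /* number of entries stored so far */
    uint16_t used;        /* bytes of body[] in use */
    uint64_t base_time;   /* time that entry offsets are relative to */
    uint64_t prev_block;  /* disk position of the point's previous block */
};

struct entry {
    uint32_t time_offset; /* seconds after base_time */
    float value;
};

struct data_block {
    struct block_header h;
    uint8_t body[BLOCK_SIZE - sizeof(struct block_header)];
};

static_assert(sizeof(struct data_block) == BLOCK_SIZE,
              "block must match the assumed on-disk size");

/* Append one sample; returns 1 on success, 0 if the block is full. */
static int block_append(struct data_block *b, uint64_t t, float value)
{
    assert(b != NULL && t >= b->h.base_time);
    if (b->h.used + sizeof(struct entry) > sizeof(b->body))
        return 0;  /* caller would start a new block linked through prev_block */
    struct entry e = { (uint32_t)(t - b->h.base_time), value };
    memcpy(b->body + b->h.used, &e, sizeof e);
    b->h.used = (uint16_t)(b->h.used + sizeof e);
    b->h.count++;
    return 1;
}
```

Keeping the usage record in the header is one reading of the index record inside the data block: the block describes itself, so no separate per-block index entry has to be maintained elsewhere.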

To query the data of a point at a given moment, the target time is decomposed according to the intervals of the three time period levels, yielding three time periods. The first-level time period is looked up from the root directory; if this fails, the target data does not exist. The second-level time period is then looked up under the first-level period; if this fails, the target time period does not exist. The target point is then looked up within the second-level time period; if the target point is not present, the period contains no data for it. Otherwise the point's last-level time period is located and checked against the target time. On a miss, the search steps backward to find the target time period.
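The lookup order and its three failure cases can be sketched as follows. This is an in-memory model only: arrays stand in for the on-disk period records, a linear search stands in for whatever lookup the file system uses, and every name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct point_entry { uint32_t point_id; uint64_t last_block; };

struct level2_period {
    uint64_t index;
    const struct point_entry *points;
    size_t npoints;
};

struct level1_period {
    uint64_t index;
    const struct level2_period *children;
    size_t nchildren;
};

enum lookup_status { LOOKUP_OK, NO_LEVEL1_PERIOD, NO_LEVEL2_PERIOD, NO_POINT };

/* Resolve (l1, l2, point_id) to the position of the point's last block. */
static enum lookup_status lookup_last_block(const struct level1_period *root,
                                            size_t nroot, uint64_t l1,
                                            uint64_t l2, uint32_t point_id,
                                            uint64_t *last_block)
{
    assert(last_block != NULL);

    const struct level1_period *p1 = NULL;
    for (size_t i = 0; i < nroot; i++)
        if (root[i].index == l1)
            p1 = &root[i];
    if (p1 == NULL)
        return NO_LEVEL1_PERIOD;   /* target data does not exist */

    const struct level2_period *p2 = NULL;
    for (size_t i = 0; i < p1->nchildren; i++)
        if (p1->children[i].index == l2)
            p2 = &p1->children[i];
    if (p2 == NULL)
        return NO_LEVEL2_PERIOD;   /* target time period does not exist */

    for (size_t i = 0; i < p2->npoints; i++) {
        if (p2->points[i].point_id == point_id) {
            *last_block = p2->points[i].last_block;
            return LOOKUP_OK;      /* caller checks the block, then steps backward */
        }
    }
    return NO_POINT;               /* period holds no data for this point */
}
```

On `LOOKUP_OK` the caller would read the block at `last_block`, compare its time range with the target time, and step backward through earlier blocks on a miss.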

To improve the hit rate for target data and reduce query time and head travel, the file system keeps the following information for each point: point creation time (St), sampling period (T), and data compression rate (R). The compression rate is obtained from long-term statistical analysis: if a sampling point's data fluctuates periodically, its compression rate settles stably around a certain value. If one block can hold N data entries for the point and the interval between two second-level time periods is TR, then the second-level time period cycle tb over which one block of the target data fills up can be calculated as follows:

tb = \frac{N \cdot T}{TR \cdot R}

If the target time of a query is Ot, the second-level time period in which the target data is likely to lie is OSt, calculated as follows:

OSt = \left[\frac{Ot - St}{tb} + \frac{1}{2}\right] \cdot \frac{tb}{TR}

This approach is mainly for points with a low sampling frequency: the second-level time index of such points may span several second-level time periods, which increases the time spent on head movement.

Brief description of the drawings

Fig. 1 shows data managed in the time period-point-time period pattern.

Fig. 2 is the directory structure of the real-time file system.

Fig. 3 is the disk layout of the real-time file system in the embodiment.

Fig. 4 is the superblock of the real-time file system in the embodiment.

Fig. 5 shows the relationships among the parts of the real-time file system disk in the embodiment.

Fig. 6 shows the in-memory information maintenance of the real-time file system in the embodiment.

Fig. 7 is the system access interface of real-time file system in embodiment.

Embodiment

To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to a specific embodiment.

The real-time file system data organization and management method of this embodiment is developed on Linux and runs on Linux platforms on PCs, including Ubuntu and Red Hat. It uses the Linux 2.6.38.8 kernel but is not limited to that kernel.

The real-time file system divides the disk into the layout shown in Fig. 3. Each group structure records all the information of its group. Each group consists of a superblock, a group information block, first-level time period information, point information, second-level time period information, and data blocks. Groups may occupy different amounts of disk space. Each group contains at most one first-level time period, and the total size cannot exceed 4 TB of disk space.

Superblock: the superblock is 1024 bytes in size. As in a conventional file system, it records basic information about the disk, such as the partition size, file system type, block size, magic number, the maximum number of points entered by the user, and the largest point ID currently in use. It also records some necessary maintenance information, such as the last access time and the position of the next free block. The details of the superblock are shown in Fig. 4.

Group information block: stores the grouping information of the whole disk. In each group it is a copy of the information in the first group; storing this information redundantly helps recover the organization of the data after the disk data is damaged. Each group's entry holds basic information about that group, such as the start position of its data blocks, the position of its next free block, and the number of remaining free blocks. When the free blocks of a group are used up, or when a new first-level time period begins, a new group must be created, and some of its basic information is copied from the previous group. Because the exact position of each group cannot be known when the disk is initialized, this information cannot be fixed at initialization time.

Year information, i.e. the first-level time period: this level is coarse. In this embodiment the first-level time period is one year. It can also be configured as six months or one month, which makes retrieval more efficient but takes some extra disk space, a cost that must be weighed. Only the start time of this period, the start position of the next-level time period information, and the active second-level time period are recorded here.

Point information block: contains the basic information of each point, such as data type, sampling frequency, and compression ratio. This information helps to quickly determine the time range in which a sampling point's target data may lie, reducing query time.

Hour information, the i.e. time period of the second level: the time period of this one-level can be designed as one hour, 10 minutes or one minute, these needs are determined according to the size of sample frequency and data field.In the present embodiment, employing is the 1 hour record time period once.Equally, the information such as in this time period information, the full detail of this one-level time period in store, comprises the reference position of data block, the number of data block, point data information position.

Point data position information: records the position of the last data block of the point.

The data block size can be 1024 bytes, 2048 bytes, 4096 bytes, or a combination of these.

The above is the layout of the file system; the relationships among its parts are shown in Fig. 5.

In-memory information maintenance is shown in Fig. 6.

In-memory information maintenance: as in other file systems, a copy of some on-disk information is kept in memory to avoid repeated reads from disk and improve efficiency. The copied information comprises the superblock, the currently active group information, the first-level time period information, and the second-level time period information. This information takes very little memory, and in a system with strict real-time requirements the cost is worthwhile. The superblock is kept in the superblock list of the Linux Virtual File System.

Besides the disk information above, superblock information is also kept in memory. Superblock information and the superblock are different concepts: superblock information is kept in the superblock information list of the Linux Virtual File System. It records the in-memory positions of the superblock, the active group, the active first-level time period, and the active second-level time period, and also contains pointers to the operations on this information.

Information about the third-level time period is likewise kept in memory: the information between the currently active third-level time period and the next time period is held there. It consists of management information, such as the positions where the data of this period is stored, but not the actual data. One might object that keeping important information this way is unsafe: if memory is exhausted, the management information could be lost, and with it all the data under this time period. As with the issue above, however, the memory involved is very small. On an operating system with memory measured in gigabytes, this information takes only about one thousandth of memory.

When a new second-level time period begins, the information above and the currently active second-level time period information are saved to disk, and new second-level time period information is created in memory. Likewise, when a new first-level time period begins, the currently active first-level time period information is saved to disk and new active first-level time period information is created.

System access interface.

The system access interface can write multiple data blocks of multiple points in one call, reducing system call overhead. The externally visible write and read interfaces are designed identically for ease of use. The interfaces are as follows:

The write access interface is: rtfs2_write(struct datas_package_info *input_dpk).

The read access interface is: rtfs2_read(struct datas_package_info *outpt_dpk).

struct datas_package_info is the data package information. The fields the user must fill in are: the start time and end time of all data blocks, the number of points, the number of data blocks, the point data information pointer, and the data block pointer. Point data information is organized as an array; the point data information pointer points to the start of the array, whose size is the actual number of collected points. Data blocks are likewise organized as an array; the data block pointer points to the start of the array, and the block count records the number of data blocks, that is, the size of the array.

Point data information contains: the point number, corresponding to the actual physical collection point number; the start time of the data, that is, the start time of all data blocks of this point; the end time, that is, the end time of all data blocks of this point; the number of data blocks; and the data block pointer, which gives the offset of this point's data blocks within the data block array described above.

Data block information contains: the previous data block pointer, reserved by the system and filled in by the file system; the start time of the data in the block; the end time of the data in the block; the data type, such as one-byte time plus one-byte data, or one-byte time plus two-byte data; the in-block offset, recording how much of the block's space is used; and finally the metadata.
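The three descriptions above suggest a package structure along the following lines. Only the name `struct datas_package_info` appears in the text; every field name, type, and the helper `package_consistent` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-block information; all field names are hypothetical. */
struct data_block_info {
    uint64_t prev_block;     /* reserved: filled in by the file system */
    uint64_t start_time;     /* start time of the data in the block */
    uint64_t end_time;       /* end time of the data in the block */
    uint8_t  data_type;      /* e.g. one-byte time plus one-byte data */
    uint16_t used;           /* in-block offset: how much space is in use */
    const uint8_t *payload;  /* the stored items themselves */
};

/* Per-point information. */
struct point_data_info {
    uint32_t point_id;       /* physical collection point number */
    uint64_t start_time;     /* start time of all blocks of this point */
    uint64_t end_time;       /* end time of all blocks of this point */
    uint32_t block_count;    /* number of blocks of this point */
    uint32_t first_block;    /* offset into the package's block array */
};

/* The package handed to the read and write interfaces. */
struct datas_package_info {
    uint64_t start_time;
    uint64_t end_time;
    uint32_t point_count;            /* size of points[] */
    uint32_t block_count;            /* size of blocks[] */
    struct point_data_info *points;
    struct data_block_info *blocks;
};

/* Check that every point's block range lies inside blocks[] and that
 * the per-point counts add up to the package total. */
static int package_consistent(const struct datas_package_info *p)
{
    assert(p != NULL);
    uint64_t total = 0;
    for (uint32_t i = 0; i < p->point_count; i++) {
        const struct point_data_info *pt = &p->points[i];
        if ((uint64_t)pt->first_block + pt->block_count > p->block_count)
            return 0;
        total += pt->block_count;
    }
    return total == p->block_count;
}
```

A caller would fill one such package for many points and pass it in a single call, which is how one system call can carry multiple blocks of multiple points.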

The organization of the whole data package and the system access interface are shown in Fig. 7. When writing data, the information above must be packed into the structure. Likewise, when reading data, the user fills in the information above and passes it to the file system layer. The user then follows the chain from data package information to point data information to data blocks to parse out the actual data.

Write process.

The write process divides the time information of the data to be saved into three levels of time periods, Ti1, Ti2, and Ti3. The currently active time periods in memory are denoted Ta1, Ta2, and Ta3 respectively. The pseudocode of the write process is as follows:

Read process.

The read process follows from the design of the real-time file system and the way data is stored. To obtain the data of a point at a given time, the time is decomposed into three levels of time information. The first-level time period is located in the first-level time period information on disk. The position of the second-level time period information is found in the first-level time period information. The position of the final data block, which is a physical disk position, is found in the second-level time period information. The third-level time period information and the data can be read directly from this position. The data is then searched in memory to find the data corresponding to the third-level time period.

The embodiment described above is only a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A real-time file system data organization and management method, characterized in that data is managed in a time period-point-time period pattern, the method comprising: organizing data by time period, combining a point index, structured data blocks, indexing that is independent of the cumulative data volume on disk, sequential data storage, and a data directory scheme.

2. The real-time file system data organization and management method according to claim 1, characterized in that organizing data by time period means dividing the time information of the data into time periods, with four levels of time periods in total, and using time as the index; to look up data, the target time period is selected first and the data is then queried within it, which reduces query time.

3. The real-time file system data organization and management method according to claim 1, characterized in that combining a point index means adding point index information between one time period and the next; the point index links together the data of the same point that lies in two different time periods, which makes backward traversal queries across data blocks easy.

4. The real-time file system data organization and management method according to claim 1, characterized in that sequential data storage means that, as archived data is written to the database continuously in time order, each batch of archived data is written to disk sequentially, which reduces head movement and seek time and also reduces the time spent in system calls.

5. The real-time file system data organization and management method according to claim 1, characterized in that under the data directory scheme the file system contains four levels of time information from top to bottom. The first-level time period has the largest span, generally one year. The second-level time period has a smaller span, typically one day, determined by the configured number of points, the sampling frequency of the points, and their compression ratio. The third-level time period is contained in the data; time information at the third level relates only to one specific point, so its span depends on that point's sampling rate, and if the point is sampled on the order of seconds the interval at this level is a few seconds. Each data item contains time information, stored as a forward difference. A point index is added between the second-level and third-level time period information, splitting the data originally ordered by time so that it is managed per point. Time period information below the point level relates only to that point; from this level, data can be indexed by backtracking, and during backward traversal the time periods crossed depend on the specific sampling point. The last level of time information accompanies each data item, which matches the definition of industrial data as <point, time, value> and guarantees the integrity of the data.