CN113806803B - Data storage method, system, terminal equipment and storage medium

Info

Publication number: CN113806803B
Application number: CN202111091089.3A
Authority: CN
Inventors: 倪子程; 陈奋; 陈荣有; 孙晓波; 龚利军
Original assignee: Xiamen Fuyun Information Technology Co ltd
Current assignee: Xiamen Fuyun Information Technology Co ltd
Priority date: 2021-09-17
Filing date: 2021-09-17
Publication date: 2023-06-02
Anticipated expiration: 2041-09-17
Also published as: CN113806803A

Abstract

The invention relates to a data storage method, a system, a terminal device and a storage medium. The system comprises a file list file, directory structure files, and data storage files. The file list file stores the file information of the directory structure files and data storage files together with the directory root node address. Each directory structure file and data storage file comprises a file header, a data area and a summary area: the file header stores file information and structure information; the summary area stores the use state of each cluster in the corresponding data area, the number of valid clusters used in the data area of each block, and the check code corresponding to the data area of each block; the data area of the directory structure file stores the address information of each file node; and the data area of the data storage file stores the data information of each file node. By storing files in a directory tree structure, the invention greatly speeds up query and traversal, reduces the size of the stored files, and supports flexible full synchronization.

Description

Data storage method, system, terminal equipment and storage medium

Technical Field

The present invention relates to the field of file technologies, and in particular, to a data storage method, a system, a terminal device, and a storage medium.

Background

Web application systems are widely used in important business lines such as social networking, shopping, banking and mail, and play a very important role among network assets. They present a broad attack surface, are targeted by many attack techniques, and are easily compromised.

Network attackers usually exploit vulnerabilities in the attacked website to tamper with web page content, for example by embedding illegal hidden links in web pages, for illegal profit or malicious commercial attacks. Malicious tampering of a web page may affect users' normal access to its content and may also result in serious economic, brand and even political risks.

Common web page tamper-resistance approaches include plug-in polling, core embedding and event triggering. All of them share the same core: the hash of each web page file is calculated and stored in advance, and when the tamper-resistance software runs, the actual hash of the current web page file is calculated as needed and compared with the recorded hash to judge whether the file has been tampered with. A storage means that can store information such as hashes quickly and reliably is therefore needed. The traditional approach is to store the data in a database, but because of various constraints of the working environment, web page tamper resistance is not suited to network databases such as mysql and mssql, and the sqlite file database is mostly used instead. Although the performance of sqlite meets the requirements, the following two problems remain in practical use.

1. sqlite stores data in table form. Although an index can be built on the path, the data is still a table structure in actual use, so traversing all the data cannot follow the actual file directory structure, which implicitly adds many queries and IO operations.

2. In web page tamper resistance, hash data is calculated and generated in a secure environment and then synchronized to the tamper-resistance program working online. There are two ways of synchronization: a. incremental synchronization: only the changed information is synchronized each time, which requires a complex and fine-grained log management mechanism and is otherwise too error-prone; b. full synchronization: all hash data is synchronized in full, which for a large site can be a large amount of data, so that updating a single piece of data often requires synchronizing hundreds of megabits of data.

Disclosure of Invention

In order to solve the problems, the invention provides a data storage method, a system, a terminal device and a storage medium.

The specific scheme is as follows:

a data storage system, comprising: file list files, directory structure files, and data storage files;

the file list file is used for storing file information and directory root node addresses of directory structure files and data storage files, and the file information comprises file codes, file types and file check codes;

the directory structure file and the data storage file each comprise a file header, a data area and a summary area, wherein:

the file header of the directory structure file is used for storing file information and structure information of the directory structure file; the file header of the data storage file is used for storing file information and structure information of the data storage file; wherein the structure information includes the total number of valid clusters in the data area;

the data area of the directory structure file is used for storing address information of each file node, and the address information consists of directory address information and file address information, wherein: the directory address information comprises the length of directory names, the number of sub-nodes contained in the directory, the address of a higher-level directory node of the directory, the address of each sub-node contained in the directory and the directory names; the file address information comprises the length of a file name, the address of a father node corresponding to the file node, the file name and the address of the corresponding file node of the file node in the data storage file;

the data area of the data storage file is used for storing data information of each file node, and the data information comprises a check code of the file and file storage path information;

the summary areas of the directory structure file and the data storage file are used for storing the use state of each cluster in the corresponding data area, the number of used effective clusters in the data area of each block and the check code corresponding to the data area of each block.

Further, the use state of each cluster stored in the summary area of the directory structure file is one of four types: unused, directory information, file information, and file name or directory name.

Further, the use state of each cluster stored in the summary area of the data storage file is one of three types: unused, check code of the file, and file storage path information.

A data storage method, based on the data storage system according to the embodiment of the present invention, comprising: when a file node needs to be added, acquiring the data information of the new file node from the file corresponding to it, and storing that data information in the data area of the data storage file; then, according to the address at which the data information of the new file node is stored in the data storage file and according to the file storage path information, acquiring the directory address information and file address information corresponding to the new file node, adding the file address information to the data area of the directory structure file, and updating or adding the directory address information.

Further, when information is added to the data area of the data storage file or the directory structure file, the corresponding summary area is searched for n consecutive clusters whose use state is unused, where n is the number of clusters required by the new information; if such clusters exist, the new information is stored in those n consecutive clusters of the data area; otherwise, the space of one block is added to the data storage file or the directory structure file, and the new information is stored in n consecutive clusters in the data area of the newly added space; after the new information has been stored in the data area, the use states of the corresponding n clusters are updated in the summary area.

A data storage method, based on the data storage system according to the embodiment of the present invention, comprising: when a file node needs to be deleted, the use state of the clusters corresponding to the file node, as stored in the summary areas of the data storage file and the directory structure file, is set to unused, and the actual information stored in the data area is not deleted.

Further, when a file node needs to be deleted, the method further comprises: checking whether the ratio of the total number of valid clusters of the data area, as stored in the file headers of the data storage file and the directory structure file, to the total number of all clusters of the data area is smaller than a ratio threshold; if so, defragmenting the data area of the data storage file or directory structure file whose ratio is smaller than the threshold, and deleting redundant clusters block by block.

A data storage method, based on the data storage system according to the embodiment of the present invention, comprising: when it needs to be judged whether a file has been tampered with, the following steps are performed:

S101: calculating the file check codes from the directory structure file and the data storage file, and comparing the calculated file check codes with the file check codes stored in the file headers; when they are identical, judging that the file has not been tampered with; otherwise, proceeding to S102;

S102: calculating the check code corresponding to the data area of each block from the data areas of the directory structure file and the data storage file, comparing each calculated check code with the check code of the corresponding block stored in the summary area, and comparing the data areas of the blocks whose check codes differ byte by byte to obtain the changed clusters, thereby obtaining the tampered file.

A data storage terminal device comprising a processor, a memory and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the data storage method according to the second embodiment of the invention when the computer program is executed by the processor.

A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the data storage method of the second embodiment of the present invention.

The invention adopts the technical scheme and has the beneficial effects that:

1. The method stores files in a directory tree structure, which greatly speeds up query and traversal, reduces the size of the stored files, and supports flexible full synchronization.

2. The invention is a file structure specially designed for storing file directory data, combining the characteristics of the sqlite file database and the FAT32 file system.

3. Compared with the sqlite-based storage commonly used on the market, the method supports storage in separate files, fast difference comparison, and traversal of the data according to the directory structure, keeps the data compact, and effectively improves the working efficiency of web page tamper resistance.

Drawings

Fig. 1 is a schematic diagram of a file list file according to an embodiment of the invention.

Fig. 2 is a schematic diagram of a directory structure file according to a first embodiment of the present invention.

Fig. 3 is a schematic diagram of a storage structure of a node address according to a first embodiment of the present invention.

Fig. 4 is a schematic diagram of a data storage file according to a first embodiment of the present invention.

Detailed Description

For further illustration of the various embodiments, the invention is provided with the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments and together with the description, serve to explain the principles of the embodiments. With reference to these matters, one of ordinary skill in the art will understand other possible embodiments and advantages of the present invention.

The invention will now be further described with reference to the drawings and detailed description.

Embodiment one:

the embodiment of the invention provides a data storage system, which comprises three types of files, namely: file list files, directory structure files, and data storage files.

(1) The file list file is used for storing file information of the directory structure file and the data storage file and directory root node addresses.

In this embodiment, the file information includes a file code (file ID), a file type, and a file check code, as shown in fig. 1. Each file has its own file check code; for example, file ID1 and file ID2 each correspond to their own file check code. In this embodiment the file check code is a CRC32 check code; in other embodiments, other forms of check code may be used as needed, which is not limited herein.

The file types include both directory structure files and data storage files, denoted 1 and 2, respectively.
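As an illustrative sketch only, one entry of the file list file could be modelled as follows. The class name `FileListEntry`, the helper `make_entry`, and the choice to compute the CRC32 over a byte string standing in for the file's data area are assumptions made for this example; the embodiment specifies only the three fields, the type codes 1 and 2, and the use of CRC32.

```python
import zlib
from dataclasses import dataclass

# File type codes as given in this embodiment.
TYPE_DIRECTORY_STRUCTURE = 1
TYPE_DATA_STORAGE = 2


@dataclass
class FileListEntry:
    """Hypothetical in-memory form of one file list entry."""
    file_id: int     # file code (file ID)
    file_type: int   # 1 = directory structure file, 2 = data storage file
    check_code: int  # CRC32 file check code


def make_entry(file_id: int, file_type: int, data_area: bytes) -> FileListEntry:
    # zlib.crc32 returns an unsigned 32-bit integer in Python 3.
    return FileListEntry(file_id, file_type, zlib.crc32(data_area))


# One cluster (16 bytes) of zeroes stands in for a real data area here.
entry = make_entry(1, TYPE_DIRECTORY_STRUCTURE, bytes(16))
```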

(2) The directory structure file is used for storing the file structure of the whole directory under the root node by adopting a corresponding tree structure, and carrying out corresponding sorting.

Referring to fig. 2, the directory structure file includes a file header, a data area, and a summary area.

The file header of the directory structure file stores the file information and structure information of the directory structure file. The file information is the same as that in the file list file and includes the file code, file type, and file check code. The file check code here is an overall check code of the data areas of all the blocks contained in the entire directory structure file. The structure information in this embodiment includes the summary area start cluster number and the total number of valid clusters in the data area, i.e., the total number of clusters contained in the data area minus the number of unused blank clusters.

The data area of the directory structure file stores the address information of each file node. The address information consists of directory address information and file address information, wherein: the directory address information comprises the length of the directory name, the number of child nodes contained in the directory, the address of the parent node of the directory, the address of each child node contained in the directory, and the directory name; the file address information comprises the length of the file name, the address of the parent node of the file node, the file name, and the address of the node corresponding to the file node in the data storage file.

A file node is the unit in which file information is stored: one file corresponds to one file node.

In this embodiment, the directory name is recorded starting from byte 0 of a cluster, and the length of the directory name is used to determine whether the name occupies more than one cluster; if so, the remainder is stored in the following cluster. The file name length and file name are used and stored in the same way as the directory name length and directory name.
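The way a name spills over into following clusters can be sketched as below, using the 16-byte cluster size of this embodiment. The UTF-8 encoding, the zero padding of the last cluster, and the function names are assumptions for illustration; the text does not specify them.

```python
from typing import List

CLUSTER_SIZE = 16  # bytes per cluster in this embodiment


def clusters_needed(name_length: int) -> int:
    """Number of clusters a name of the given byte length occupies (at least 1)."""
    return max(1, -(-name_length // CLUSTER_SIZE))  # ceiling division


def split_name_into_clusters(name: str, encoding: str = "utf-8") -> List[bytes]:
    """Cut an encoded name into cluster-sized chunks, starting at byte 0.

    The final chunk is zero-padded to a full cluster (an assumed convention).
    """
    raw = name.encode(encoding)
    chunks = [raw[i:i + CLUSTER_SIZE].ljust(CLUSTER_SIZE, b"\x00")
              for i in range(0, len(raw), CLUSTER_SIZE)]
    return chunks or [bytes(CLUSTER_SIZE)]
```

A reader of the directory structure file can call `clusters_needed` on the stored name length to decide how many consecutive clusters to read back.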

The address of the node corresponding to the file node in the data storage file is the number of the cluster that the file node occupies in the data storage file. For example, when the file node is file 1, the clusters it occupies in the data storage file are numbered 1 to 4, as shown in fig. 4. In this embodiment, the node address is 4 bytes of data, as shown in fig. 3: the first byte records the file ID, and the last three bytes record the cluster sequence number.
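A minimal sketch of the 4-byte node address follows. The split into one byte of file ID and three bytes of cluster number comes from this embodiment; the big-endian byte order and the function names are assumptions, since the text does not state a byte order.

```python
from typing import Tuple


def pack_node_address(file_id: int, cluster_no: int) -> bytes:
    """Encode a node address: 1 byte file ID + 3 bytes cluster number."""
    if not 0 <= file_id <= 0xFF:
        raise ValueError("file ID must fit in one byte")
    if not 0 <= cluster_no <= 0xFFFFFF:
        raise ValueError("cluster number must fit in three bytes")
    # Big-endian order is an assumption made for this example.
    return bytes([file_id]) + cluster_no.to_bytes(3, "big")


def unpack_node_address(addr: bytes) -> Tuple[int, int]:
    """Decode a 4-byte node address into (file ID, cluster number)."""
    if len(addr) != 4:
        raise ValueError("a node address is exactly 4 bytes")
    return addr[0], int.from_bytes(addr[1:], "big")
```

With three bytes for the cluster number, a single file can address up to 16,777,216 clusters, and one byte of file ID allows up to 256 files.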

The summary area of the directory structure file stores the use state of each cluster in the data area of the directory structure file, the number of valid clusters used in the data area of each block, and the check code corresponding to the data area of each block.

The use state of each cluster in this embodiment is one of four types: 0 indicates unused, 1 indicates directory information, 2 indicates file information, and 3 indicates file name or directory name.

In this embodiment, 1 cluster is set to 16 bytes and 1 block to 128 clusters; each block comprises a data area of 120 clusters and a summary area of 8 clusters.
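The resulting sizes can be worked out directly from these figures. The sketch below only does that arithmetic; the constant and function names are illustrative, and the file header is left out because its size is not given in the text.

```python
CLUSTER_BYTES = 16
CLUSTERS_PER_BLOCK = 128
DATA_CLUSTERS_PER_BLOCK = 120
SUMMARY_CLUSTERS_PER_BLOCK = 8

DATA_BYTES_PER_BLOCK = DATA_CLUSTERS_PER_BLOCK * CLUSTER_BYTES        # 1920
SUMMARY_BYTES_PER_BLOCK = SUMMARY_CLUSTERS_PER_BLOCK * CLUSTER_BYTES  # 128
BLOCK_BYTES = CLUSTERS_PER_BLOCK * CLUSTER_BYTES                      # 2048


def sizes_for(block_count: int) -> dict:
    """Byte sizes of the data and summary areas for a given number of blocks."""
    return {
        "data_bytes": block_count * DATA_BYTES_PER_BLOCK,
        "summary_bytes": block_count * SUMMARY_BYTES_PER_BLOCK,
        "total_bytes": block_count * BLOCK_BYTES,
    }


print(sizes_for(1))  # {'data_bytes': 1920, 'summary_bytes': 128, 'total_bytes': 2048}
```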

(3) The data storage file records the information of the file (such as a web page file) corresponding to each file node. The structure of the data storage file is similar to that of the directory structure file; as shown in fig. 4, the data storage file also includes a file header, a data area, and a summary area.

The file header of the data storage file stores the file information and structure information of the data storage file. The file information is the same as that in the file list file and includes the file code, file type, and file check code. It should be noted that the file check code here is an overall check code of the data areas of all the blocks contained in the entire data storage file. The structure information in this embodiment includes the summary area start cluster number and the total number of valid clusters of the data area.

The data area of the data storage file stores the data information of each file node; in this embodiment, the data information includes the check code of the file and the file storage path information. The check code of the file is the check code of the file corresponding to each file node, for example the MD5 value of the web page file, and is distinct from the file check code in the file header. The file storage path information is the full path; if one cluster is not enough to hold it, it continues into the following clusters.

The summary area of the data storage file stores the use state of each cluster in the corresponding data area, the number of valid clusters used in the data area of each block, and the check code corresponding to the data area of each block. The use state of each cluster stored in the summary area of the data storage file is one of three types: 0 indicates unused, 1 indicates the check code of the file, and 2 indicates file storage path information.
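One possible byte layout for the summary area of a single block is sketched below. The embodiment states what the summary area contains and that it occupies 8 clusters (128 bytes) per 120-cluster data area, but not how the fields are arranged. The layout used here (one state byte per cluster, a 2-byte valid-cluster count, a 4-byte CRC32, then zero padding) is purely an assumption that happens to fit in 128 bytes, as are the names `DataState` and `build_summary`.

```python
import struct
import zlib
from enum import IntEnum
from typing import List


class DataState(IntEnum):
    """Cluster use states of the data storage file in this embodiment."""
    UNUSED = 0
    CHECK_CODE = 1
    PATH_INFO = 2


# Hypothetical layout: 120 state bytes + uint16 count + uint32 CRC32 = 126 bytes.
SUMMARY_FORMAT = "<120sHI"
SUMMARY_BYTES = 128
DATA_BYTES = 120 * 16


def build_summary(states: List[int], data_area: bytes) -> bytes:
    """Pack the summary area for one block under the assumed layout."""
    if len(states) != 120 or len(data_area) != DATA_BYTES:
        raise ValueError("expected 120 cluster states and a 1920-byte data area")
    used = sum(1 for s in states if s != DataState.UNUSED)
    packed = struct.pack(SUMMARY_FORMAT, bytes(states), used, zlib.crc32(data_area))
    return packed.ljust(SUMMARY_BYTES, b"\x00")


# Example: one file node occupying four clusters (check code + three path clusters).
states = [DataState.CHECK_CODE] + [DataState.PATH_INFO] * 3 + [DataState.UNUSED] * 116
summary = build_summary(states, bytes(DATA_BYTES))
```

A 16-byte MD5 value fills exactly one cluster, which is why the example marks a single cluster as check code.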

Embodiment two:

the invention also provides a data storage method based on the data storage system of the first embodiment, comprising the following steps:

(1) In the initial stage, a file list file, a directory structure file and a data storage file are created, and a space of one block is not allocated in advance to the directory structure file and the data storage file respectively.

(2) When a file node needs to be added, the data information of the file corresponding to the new file node and the number of clusters n it occupies are acquired, and the summary area of the data storage file is searched for n consecutive clusters whose use state is unused. If such clusters exist, the data information of the file is stored in those n consecutive clusters of the data area of the data storage file, and the use states of the n clusters are updated in the summary area of the data storage file. If not, the space of one block, comprising a data area and a summary area, is added to the data storage file. In this embodiment, it is preferable that the data areas of all blocks are contiguous with one another and the summary areas of all blocks are contiguous with one another; that is, when a block is added, a 120-cluster data area is inserted between the original data area and the summary area, and an 8-cluster summary area is appended below the original summary area. It should be noted that the contents of different summary areas are independent of each other and each block has its own summary area; that is, each summary area contains the number of valid clusters used in the data area of its block and the check code corresponding to the data area of its block.

When too many blocks are involved, a single file becomes too large; in that case a new data storage file may be created for segmented storage.

The directory address information and file address information of the new file node are then acquired according to its file storage path information; the corresponding directory address information is updated or added in the data area of the directory structure file, and the corresponding file address information is added. When directory address information or file address information is added to the data area, the summary area of the directory structure file must be searched, in the same manner as for the data storage file, for consecutive unused clusters in the number required by the new information.
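The search for n consecutive unused clusters, with a new block added when none is found, can be sketched as follows. This is a simplified model under stated assumptions: the use states of all blocks are held in one flat Python list, a free run is allowed to span a block boundary (the data areas are described as contiguous), and the names `find_free_run` and `allocate` are hypothetical.

```python
from typing import List, Optional

DATA_CLUSTERS_PER_BLOCK = 120
UNUSED = 0


def find_free_run(states: List[int], n: int) -> Optional[int]:
    """Return the index of the first run of n consecutive unused clusters, or None."""
    run = 0
    for i, state in enumerate(states):
        run = run + 1 if state == UNUSED else 0
        if run == n:
            return i - n + 1
    return None


def allocate(states: List[int], n: int, new_state: int) -> int:
    """Reserve n consecutive clusters and return the index of the first one.

    If no free run exists, one block's worth of unused clusters is appended
    and the new information is placed at the start of that new data area.
    """
    if not 1 <= n <= DATA_CLUSTERS_PER_BLOCK:
        raise ValueError("n must be between 1 and the data clusters of one block")
    start = find_free_run(states, n)
    if start is None:
        start = len(states)
        states.extend([UNUSED] * DATA_CLUSTERS_PER_BLOCK)
    # Update the use states, as the summary area would be updated on disk.
    states[start:start + n] = [new_state] * n
    return start
```

The same routine would serve both the data storage file and the directory structure file, differing only in the state codes written.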

(3) When a file node needs to be deleted, only the use state of the clusters corresponding to the file node, as stored in the summary areas of the data storage file and the directory structure file, is set to unused; the actual information stored in the data area does not need to be deleted.
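Deletion therefore touches only the use states. A minimal sketch, assuming the states are held in a flat list and using the hypothetical name `delete_node`:

```python
from typing import List

UNUSED = 0


def delete_node(states: List[int], start: int, n: int) -> None:
    """Mark the n clusters starting at `start` as unused.

    The bytes in the data area are deliberately left alone; they are simply
    overwritten later when the clusters are reused.
    """
    if start < 0 or n < 1 or start + n > len(states):
        raise ValueError("cluster range out of bounds")
    states[start:start + n] = [UNUSED] * n


# A node occupying clusters 0..3 is deleted; the data area is not modified.
states = [1, 2, 2, 2, 0]
data_area = bytearray(b"x" * 80)
snapshot = bytes(data_area)
delete_node(states, 0, 4)
```

Because no data is rewritten, a deletion costs only a few bytes of I/O in the summary area.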

Further, this embodiment also checks whether the ratio of the total number of valid clusters of the data area, as stored in the file headers of the data storage file and the directory structure file, to the total number of all clusters of the data area is smaller than a ratio threshold (set to 1/3 in this embodiment); if so, the data area of the data storage file or directory structure file whose ratio is smaller than the threshold is defragmented, and redundant clusters are deleted block by block. In this embodiment, defragmentation is performed by making the valid clusters in the data area contiguous in physical space and updating the addresses of the corresponding file nodes stored in the directory structure file.

The total number of all clusters in the data area is the product of the number of blocks and the number of clusters in the data area contained in each block.
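The threshold check and the compaction step can be sketched as follows. The 1/3 threshold and the block-wise trimming come from this embodiment; the flat in-memory representation, the decision to keep at least one block, and the names `needs_defrag` and `compact` are assumptions. The returned relocation map is what would be used to update the node addresses stored in the directory structure file.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

CLUSTER_SIZE = 16
DATA_CLUSTERS_PER_BLOCK = 120
UNUSED = 0
RATIO_THRESHOLD = Fraction(1, 3)


def needs_defrag(valid_clusters: int, block_count: int) -> bool:
    """True when valid / (blocks * 120) falls below the ratio threshold."""
    if block_count < 1:
        raise ValueError("block_count must be at least 1")
    total = block_count * DATA_CLUSTERS_PER_BLOCK
    return Fraction(valid_clusters, total) < RATIO_THRESHOLD


def compact(states: List[int], data: bytes) -> Tuple[List[int], bytes, Dict[int, int]]:
    """Move valid clusters to the front and trim the data area to whole blocks."""
    new_states: List[int] = []
    new_data = bytearray()
    relocation: Dict[int, int] = {}  # old cluster index -> new cluster index
    for old, state in enumerate(states):
        if state != UNUSED:
            relocation[old] = len(new_states)
            new_states.append(state)
            new_data += data[old * CLUSTER_SIZE:(old + 1) * CLUSTER_SIZE]
    # Keep whole blocks only; at least one block is retained (an assumption).
    blocks = max(1, -(-len(new_states) // DATA_CLUSTERS_PER_BLOCK))
    pad = blocks * DATA_CLUSTERS_PER_BLOCK - len(new_states)
    new_states += [UNUSED] * pad
    new_data += bytes(pad * CLUSTER_SIZE)
    return new_states, bytes(new_data), relocation
```

Exact fractions are used so that a ratio of precisely 1/3 is not misjudged through floating-point rounding.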

(4) When a file node needs to be modified, there are two cases: 1. when the number of clusters occupied by the modified information is unchanged or reduced, the information is overwritten directly at the original address; 2. when the number of clusters occupied by the modified information increases, the old file node is deleted first and the modified file node is then added as a new node.
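The choice between the two cases depends only on the cluster counts before and after the change. A small sketch, with hypothetical function names and string labels, assuming sizes are given in bytes and a 16-byte cluster:

```python
CLUSTER_SIZE = 16


def clusters_needed(size_bytes: int) -> int:
    """Clusters required to hold size_bytes of information (at least 1)."""
    return max(1, -(-size_bytes // CLUSTER_SIZE))


def choose_update_strategy(old_size_bytes: int, new_size_bytes: int) -> str:
    """Pick case 1 (overwrite) or case 2 (delete, then add) for a modification."""
    if clusters_needed(new_size_bytes) <= clusters_needed(old_size_bytes):
        return "overwrite_in_place"
    return "delete_then_add"
```

For example, growing a record from 40 to 48 bytes stays within three clusters and can be overwritten in place, while growing it to 49 bytes needs a fourth cluster and is handled as a delete followed by an add.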

(5) When querying a file node, the file storage path information corresponding to the file node is split according to the directory hierarchy, and the search then proceeds level by level, starting from the root directory node in the first directory structure file.
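The level-by-level lookup can be illustrated with an in-memory stand-in for the directory tree. On disk, child nodes are referenced by 4-byte addresses inside the directory structure file; here they are ordinary Python objects. The class names, the `/` separator, and the `lookup` function are assumptions for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass
class FileNode:
    name: str
    data_address: int  # address of the node in the data storage file


@dataclass
class DirNode:
    name: str
    children: Dict[str, Union["DirNode", FileNode]] = field(default_factory=dict)


def lookup(root: DirNode, path: str) -> Optional[FileNode]:
    """Split the path by directory level and walk down from the root node."""
    node: Union[DirNode, FileNode, None] = root
    for part in (p for p in path.split("/") if p):
        if not isinstance(node, DirNode):
            return None  # tried to descend into a file
        node = node.children.get(part)
        if node is None:
            return None  # no such child at this level
    return node if isinstance(node, FileNode) else None


root = DirNode("", {"www": DirNode("www", {"index.html": FileNode("index.html", 1)})})
```

Each path component costs one step down the tree, so a lookup visits only the directories along the path rather than scanning every stored record.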

(6) When it needs to be judged whether the data in the directory structure file and the data storage file has changed, the following steps are performed:

S101: calculating the file check codes from the directory structure file and the data storage file, and comparing the calculated file check codes with the file check codes stored in the file headers; when they are identical, judging that the data is unchanged; otherwise, proceeding to S102;

S102: calculating the check code corresponding to the data area of each block from the data areas of the directory structure file and the data storage file, comparing each calculated check code with the check code of the corresponding block stored in the summary area, and comparing the data areas of the blocks whose check codes differ byte by byte to obtain the changed clusters.
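Steps S101 and S102 can be sketched as a two-stage comparison. The use of CRC32 follows the first embodiment; the assumption that a trusted reference copy of the data area is available for the byte-by-byte stage, and the name `find_changed_clusters`, are introduced here for illustration only.

```python
import zlib
from typing import List

CLUSTER_SIZE = 16
DATA_BYTES_PER_BLOCK = 120 * CLUSTER_SIZE


def find_changed_clusters(current: bytes, reference: bytes,
                          stored_file_crc: int,
                          stored_block_crcs: List[int]) -> List[int]:
    """Return the cluster numbers whose contents differ from the reference."""
    # S101: one check over the whole data area; equal means unchanged.
    if zlib.crc32(current) == stored_file_crc:
        return []
    changed: List[int] = []
    # S102: narrow down to blocks whose check code differs, then to clusters.
    for block_no, stored_crc in enumerate(stored_block_crcs):
        lo = block_no * DATA_BYTES_PER_BLOCK
        block = current[lo:lo + DATA_BYTES_PER_BLOCK]
        if zlib.crc32(block) == stored_crc:
            continue  # this block is intact, skip the expensive comparison
        ref_block = reference[lo:lo + DATA_BYTES_PER_BLOCK]
        for off in range(0, len(block), CLUSTER_SIZE):
            if block[off:off + CLUSTER_SIZE] != ref_block[off:off + CLUSTER_SIZE]:
                changed.append((lo + off) // CLUSTER_SIZE)
    return changed
```

Only blocks with a mismatching check code are compared byte by byte, which is what keeps a full comparison cheap when few records have changed.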

Embodiment three:

the invention also provides a data storage terminal device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the steps of the method of the second embodiment of the invention are implemented when the processor executes the computer program.

Further, as a possible implementation, the data storage terminal device may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The data storage terminal device may include, but is not limited to, a processor and a memory. It will be appreciated by those skilled in the art that the above-described composition is merely an example of the data storage terminal device and does not limit it; the device may include more or fewer components than those described above, combine some components, or use different components. For example, the data storage terminal device may further include an input/output device, a network access device, a bus, etc., which is not limited in this embodiment of the present invention.

Further, as an implementation, the processor may be a central processing unit (Central Processing Unit, CPU), other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, etc. The general purpose processor may be a microprocessor or the processor may be any conventional processor or the like, which is a control center of the data storage terminal device, and which connects the various parts of the entire data storage terminal device using various interfaces and lines.

The memory may be used to store the computer program and/or the module, and the processor implements various functions of the data storage terminal device by running or executing the computer program and/or the module stored in the memory and invoking data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and at least one application program required for a function; the data storage area may store data created according to the use of the terminal device, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, internal memory, plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, flash card (Flash Card), at least one disk storage device, flash memory device, or other volatile solid-state storage device.

The present invention also provides a computer readable storage medium storing a computer program which when executed by a processor implements the steps of the above-described method of an embodiment of the present invention.

The modules/units integrated in the data storage terminal device may be stored in a computer-readable storage medium if implemented in the form of software functional units and sold or used as a stand-alone product. Based on such understanding, all or part of the flow of the method of the above embodiment may also be implemented by a computer program instructing related hardware, where the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), a software distribution medium, and so forth.

While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A data storage system, comprising: file list files, directory structure files, and data storage files;

the file header of the directory structure file is used for storing file information and structure information of the directory structure file; the file header of the data storage file is used for storing file information and structure information of the data storage file; the structure information of the directory structure file and the data storage file comprises the total number of effective clusters of the data area;

2. The data storage system of claim 1, wherein: the use state of each cluster stored in the summary area of the directory structure file is one of four types: unused, directory information, file information, and file name or directory name.

3. The data storage system of claim 1, wherein: the usage status of each cluster stored in the summary area of the data storage file includes three kinds of usage status, respectively: unused, check code of file, file storage path information.

4. A data storage method, characterized in that: it is based on the data storage system according to any one of claims 1 to 3 and comprises: when a file node needs to be added, acquiring the data information of the new file node from the file corresponding to it, and storing that data information in the data area of the data storage file; then, according to the address at which the data information of the new file node is stored in the data storage file and according to the file storage path information, acquiring the directory address information and file address information corresponding to the new file node, adding the file address information to the data area of the directory structure file, and updating or adding the directory address information.

5. The data storage method of claim 4, wherein: when information is added to the data area of the data storage file or the directory structure file, the corresponding summary area is searched for n consecutive clusters whose use state is unused, where n is the number of clusters required by the new information; if such clusters exist, the new information is stored in those n consecutive clusters of the data area; otherwise, the space of one block is added to the data storage file or the directory structure file, and the new information is stored in n consecutive clusters in the data area of the newly added space; after the new information has been stored in the data area, the use states of the corresponding n clusters are updated in the summary area.

6. A data storage method, characterized in that: it is based on the data storage system according to any one of claims 1 to 3 and comprises: when a file node needs to be deleted, the use state of the clusters corresponding to the file node, as stored in the summary areas of the data storage file and the directory structure file, is set to unused, and the actual information stored in the data area is not deleted.

7. The data storage method of claim 6, wherein: when a file node needs to be deleted, the method further comprises the following steps: checking whether the ratio of the total number of valid clusters of the data area, as stored in the file headers of the data storage file and the directory structure file, to the total number of all clusters of the data area is smaller than a ratio threshold; if so, defragmenting the data area of the data storage file or directory structure file whose ratio is smaller than the threshold, and deleting redundant clusters block by block.

8. A data storage method, characterized in that: it is based on the data storage system according to any one of claims 1 to 3 and comprises: when it needs to be judged whether the directory structure file and the data storage file have been tampered with, the following steps are performed:

S101: calculating the file check codes of the directory structure file and the data storage file from those files, comparing the calculated file check code of the directory structure file with the file check code stored in the file header of the directory structure file, and comparing the calculated file check code of the data storage file with the file check code stored in the file header of the data storage file; when the comparison result for the directory structure file is identical, judging that the directory structure file has not been tampered with; when the comparison result for the data storage file is identical, judging that the data storage file has not been tampered with; when the directory structure file or the data storage file is judged to have been tampered with, proceeding to S102;

S102: calculating the check code corresponding to the data area of each block from the data area of the directory structure file or the data storage file, comparing each calculated check code with the check code of the corresponding block stored in the summary area, and comparing the data areas of the blocks whose check codes differ byte by byte to obtain the changed clusters, thereby obtaining the tampered file.

9. A data storage terminal device characterized by: comprising a processor, a memory and a computer program stored in the memory and running on the processor, which processor, when executing the computer program, carries out the steps of the method according to any one of claims 4 to 8.

10. A computer-readable storage medium storing a computer program, characterized in that: the computer program implementing the steps of the method according to any of claims 4 to 8 when executed by a processor.