CN107291915A

CN107291915A - Small file storage method, small file reading method, and system

Info

Publication number: CN107291915A
Application number: CN201710501667.3A
Authority: CN
Inventors: 李杰辉; 牛立国
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Current assignee: Beijing QIYI Century Science and Technology Co Ltd
Priority date: 2017-06-27
Filing date: 2017-06-27
Publication date: 2017-10-24

Abstract

The invention discloses a small file storage method, a small file reading method, and corresponding systems. The small file storage method includes: presetting a metadata database in the logical storage unit of a large file; storing the data content of a small file to be stored in the large file, where the data content is appended to the end of the large file; and storing the metadata of the small file to be stored in the metadata database, where the metadata includes the name, size, type, checksum, and timestamp of the small file, the offset of the small file in the large file, and custom metadata. By adding a metadata database to the logical storage unit of the large file, the invention stores the data content of a small file separately from its metadata, so that metadata can be modified, added, and deleted independently. In addition, when metadata is loaded into memory for caching, the whole large file no longer needs to be scanned.

Description

Small file storage method, small file reading method, and system

Technical field

The present invention relates to the technical field of data storage, and more particularly to a small file storage method, a small file reading method, and corresponding systems.

Background art

The rapid development of the Internet has produced massive numbers of files such as pictures and documents. These files are characteristically small (typically below 100 KB) and enormous in number (hundreds of millions), and traditional POSIX-interface file systems can hardly meet the demand. This is the well-known massive small file problem. The industry's common practice is merged storage, in which many small files are merged and stored in one traditional POSIX file; examples include Facebook's Haystack, LinkedIn's Ambry, and Taobao's TFS. The merged storage approaches of these systems are all similar: the server side keeps only part of the metadata, the remaining metadata is encoded into the file ID and handed to the client to keep, and an index is built over the small files in the large file to improve read performance. The metadata kept on the server side, such as size and type, is stored in a fixed layout together with the raw data in the large file.

The storage approach described above has the following defects: (1) Custom metadata is not supported; to support a new type of metadata, the organization of the small files or the file ID must be changed. (2) When existing metadata is modified, the size of the new value is strictly limited, otherwise it would overwrite other valid data. (3) Some operations only need to read metadata, yet the offset of the small file must still be located in the large file before reading can start. It would be desirable to cache this metadata in memory, but under the existing architecture caching it requires scanning the whole large file from the beginning, which is inefficient. The main reason for these defects is that small files are stored on disk in merged form and strict boundaries are needed to tell them apart; once written, a boundary cannot be changed. The format of a small file within the large file is therefore strictly constrained, and metadata cannot be freely modified, added, or deleted.
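Defect (2) can be made concrete with a small sketch. The record layout here (an 8-byte type field and a 4-byte length in front of each payload) is a hypothetical illustration, not the actual format of Haystack, Ambry, or TFS; it only shows why an inline metadata field cannot grow once the next record has been written behind it.

```python
import struct

# Hypothetical fixed layout for a merged file: each record is
# [8-byte type field][4-byte payload length][payload], packed back to back.
HEADER = struct.Struct("<8sI")


def pack_record(file_type: bytes, payload: bytes) -> bytes:
    """Serialize one small file with its metadata stored inline."""
    return HEADER.pack(file_type, len(payload)) + payload


def overwrite_type(blob: bytearray, record_start: int, new_type: bytes) -> None:
    """Rewrite the type field in place; the value must fit in 8 bytes."""
    if len(new_type) > 8:
        raise ValueError("new metadata value would overwrite the data that follows")
    blob[record_start:record_start + 8] = new_type.ljust(8, b"\0")


merged = bytearray(pack_record(b"jpeg", b"AAAA") + pack_record(b"png", b"BB"))

overwrite_type(merged, 0, b"webp")  # fits in the fixed field, so it works

try:
    overwrite_type(merged, 0, b"image/jpeg+xmp")  # longer than the field
    grew = True
except ValueError:
    grew = False  # the boundary of the record cannot move
```

In this layout the only way to accept the longer value would be to rewrite every record behind it, which is the rigidity the paragraph above describes.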

Summary of the invention

The object of the present invention is to propose a small file storage method, a small file reading method, and corresponding systems, to solve the following problem in the prior art: because small files are stored on disk in merged form, strict boundaries are needed to tell them apart, and a boundary cannot be modified once written; the format of a small file within the large file is therefore strictly constrained, and metadata cannot be freely modified, added, or deleted.

To achieve the above object, the invention provides the following technical solutions:

A small file storage method, including:

presetting a metadata database in the logical storage unit of a large file;

storing the data content of a small file to be stored in the large file, where the data content of each small file to be stored is appended in turn to the end of the large file;

storing the metadata of the small file to be stored in the metadata database.

The metadata includes: the file name of the small file, the size of the small file, the type of the small file, the checksum of the small file, the timestamp of the small file, the offset of the small file in the large file, and custom metadata.

The metadata database is the key-value database RocksDB.

Preferably, the method further includes:

redundantly storing part of the metadata of the small file to be stored in the large file, so that this partial metadata and the data content of the small file to be stored are held together in one large file. The partial metadata of the small file to be stored includes: the offset of the small file in the large file, the file name of the small file and its length, the size of the space occupied by the metadata of the small file, the checksum of the small file, and the start position of the data content of the small file.

A small file storage system, including:

a presetting unit, configured to preset a metadata database in the logical storage unit of a large file;

a first storage unit, configured to store the data content of a small file to be stored in the large file, where the data content of each small file to be stored is appended in turn to the end of the large file;

a second storage unit, configured to store the metadata of the small file to be stored in the metadata database.

The metadata includes: the file name of the small file, the size of the small file, the type of the small file, the checksum of the small file, the timestamp of the small file, the path of the large file, the offset of the small file in the large file, and custom metadata.

The metadata database is the key-value database RocksDB.

Preferably, the first storage unit is further configured to redundantly store part of the metadata of the small file to be stored in the large file, so that this partial metadata and the data content of the small file to be stored are held together in one large file. The partial metadata of the small file to be stored includes: the offset of the small file in the large file, the file name of the small file and its length, the size of the space occupied by the metadata of the small file, the checksum of the small file, and the start position of the data content of the small file.

A small file reading method, including:

obtaining the file name of a small file;

looking up, according to the file name of the small file, the metadata information of the small file in the metadata database that stores the metadata of the small file, where the metadata information includes: the size of the small file, the path of the large file in which the small file resides, and the offset of the small file in the large file;

reading, according to the metadata information of the small file, the data content of the small file from the large file that stores it;

returning the metadata information of the small file and the data content of the small file to the user.

A small file reading system, including:

an acquiring unit, configured to obtain the file name of a small file;

a lookup unit, configured to look up, according to the file name of the small file, the metadata information of the small file in the metadata database that stores the metadata of the small file, where the metadata information includes: the size of the small file, the path of the large file in which the small file resides, and the offset of the small file in the large file;

a reading unit, configured to read, according to the metadata information of the small file, the data content of the small file from the large file that stores it;

a feedback unit, configured to return the metadata information of the small file and the data content of the small file to the user.

As can be seen from the above technical solutions, compared with the prior art, the invention discloses a small file storage method, a small file reading method, and corresponding systems. The small file storage method includes: presetting a metadata database in the logical storage unit of a large file; storing the data content of a small file to be stored in the large file, where the data content is appended to the end of the large file; and storing the metadata of the small file to be stored in the metadata database, where the metadata includes the name, size, type, checksum, and timestamp of the small file, the offset of the small file in the large file, and custom metadata. By adding a metadata database to the logical storage unit of the large file, the invention stores the data content of a small file separately from its metadata, so that metadata can be modified, added, and deleted independently. In addition, when metadata is loaded into memory for caching, the whole large file no longer needs to be scanned.

Brief description of the drawings

To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.

Fig. 1 is a schematic flowchart of a small file storage method provided by an embodiment of the present invention;

Fig. 2 is a schematic diagram of a file system composed of multiple large files provided by an embodiment of the present invention;

Fig. 3 is a list of the metadata of a small file stored in the metadata database, provided by an embodiment of the present invention;

Fig. 4 is a list of the partial metadata of a small file stored in the large file, provided by an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of a small file storage system provided by an embodiment of the present invention;

Fig. 6 is a schematic flowchart of a small file reading method provided by an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of a small file reading system provided by an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

Merged storage of massive small files involves both writing small files and reading small files. The storage approach for massive small files is explained below from these two angles in turn.

Refer to Fig. 1, which is a schematic flowchart of a merged storage method for massive small files provided by an embodiment of the present invention. As shown in Fig. 1, the invention discloses a merged storage method for massive small files, which specifically includes the following steps:

S101: preset a metadata database in the logical storage unit of a large file;

Refer to Fig. 2, which is a schematic diagram of a file system composed of multiple large files provided by an embodiment of the present invention. As shown in Fig. 2, a metadata database needs to be preset in each logical storage unit that contains a POSIX large file; this metadata database is the key-value database RocksDB.

The implementation of the present invention is not restricted to any programming language or platform; in practice, the Go language on the Linux platform can be used as the implementation language.

S102: store the data content of the small file to be stored in the large file, where the data content of each small file to be stored is appended in turn to the end of the large file;

S103: store the metadata of the small file to be stored in the metadata database.
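Steps S101 to S103 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a plain Python dict stands in for the RocksDB metadata database, and the function name `store_small_file` and the file name `unit0.big` are hypothetical.

```python
import os
import tempfile


def store_small_file(big_file_path: str, meta_db: dict, name: str, data: bytes) -> int:
    """Append data to the large file and record where it landed."""
    with open(big_file_path, "ab") as big_file:  # S102: always append at the end
        big_file.seek(0, os.SEEK_END)
        offset = big_file.tell()
        big_file.write(data)
    # S103: the metadata goes to the metadata database, not into the large file.
    meta_db[name] = {"size": len(data), "offset": offset}
    return offset


workdir = tempfile.mkdtemp()
big_file_path = os.path.join(workdir, "unit0.big")
meta_db = {}  # S101: stand-in (assumption) for the preset metadata database

first = store_small_file(big_file_path, meta_db, "a.jpg", b"\xff\xd8demo")
second = store_small_file(big_file_path, meta_db, "b.txt", b"hello")
```

Because every write is an append, the offset of a new small file is simply the size of the large file before the write.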

Specifically, as shown in Fig. 3, the metadata includes: the file name of the small file, the size of the small file, the type of the small file, the checksum of the small file, the timestamp of the small file, the offset of the small file in the large file, and custom metadata.

The metadata database is the key-value database RocksDB.
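One way to picture a metadata record in a key-value database is a key derived from the file name and a serialized value holding the remaining fields. The sketch below is an assumption throughout: the source does not specify the key, the value encoding, or the checksum algorithm, so the JSON encoding, CRC32, and the names `SmallFileMeta` and `describe` are illustrative only.

```python
import json
import time
import zlib
from dataclasses import asdict, dataclass, field


@dataclass
class SmallFileMeta:
    """Fields listed for Fig. 3; the layout and encoding are assumptions."""
    name: str
    size: int
    file_type: str
    checksum: int
    timestamp: float
    offset: int
    custom: dict = field(default_factory=dict)

    def key(self) -> bytes:
        # Assumed key: the UTF-8 file name.
        return self.name.encode("utf-8")

    def value(self) -> bytes:
        # Assumed value: the whole record as JSON.
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def from_value(cls, raw: bytes) -> "SmallFileMeta":
        return cls(**json.loads(raw))


def describe(name: str, data: bytes, file_type: str, offset: int, **custom) -> SmallFileMeta:
    """Build the record for a small file that was written at the given offset."""
    return SmallFileMeta(name, len(data), file_type, zlib.crc32(data),
                         time.time(), offset, dict(custom))


meta = describe("cat.jpg", b"\xff\xd8demo", "image/jpeg", 0, camera="hypothetical-cam")
restored = SmallFileMeta.from_value(meta.value())
```

Keeping custom metadata in an open-ended mapping means a new kind of metadata needs no change to the large file or to the file name.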

In addition, part of the metadata of the small file to be stored can be redundantly stored in the large file. Specifically, as shown in Fig. 4, the partial metadata of the small file to be stored and the data content of the small file to be stored are held together in one large file. The partial metadata of the small file to be stored includes: the offset of the small file in the large file, the file name of the small file and its length, the size of the space occupied by the metadata of the small file, the checksum of the small file, and the start position of the data content of the small file.
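The redundant partial metadata could be laid out as a small header written in front of each small file's data. The field widths, byte order, field ordering, and the CRC32 checksum in this sketch are all assumptions, since the source lists only which fields are present; `pack_entry` and `unpack_header` are hypothetical names.

```python
import struct
import zlib

# Assumed fixed part: offset, name length, metadata size, checksum, data start.
FIXED = struct.Struct("<QHIIQ")


def pack_entry(offset: int, name: str, data: bytes) -> bytes:
    """Serialize header + file name + data for one small file at the given offset."""
    name_bytes = name.encode("utf-8")
    meta_size = FIXED.size + len(name_bytes)  # space occupied by the metadata
    data_start = offset + meta_size           # where the data content begins
    header = FIXED.pack(offset, len(name_bytes), meta_size,
                        zlib.crc32(data), data_start)
    return header + name_bytes + data


def unpack_header(blob: bytes, offset: int) -> dict:
    """Decode the partial metadata of the entry that starts at offset."""
    rec_offset, name_len, meta_size, checksum, data_start = FIXED.unpack_from(blob, offset)
    name_start = offset + FIXED.size
    name = blob[name_start:name_start + name_len].decode("utf-8")
    return {"offset": rec_offset, "name": name, "meta_size": meta_size,
            "checksum": checksum, "data_start": data_start}


big = pack_entry(0, "a.txt", b"hello")
second_offset = len(big)
big += pack_entry(second_offset, "b.txt", b"world!")
header2 = unpack_header(big, second_offset)
```

Note that the listed fields do not include the data size, so in this sketch the length of the data content still has to come from the metadata database.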

The invention thus discloses a small file storage method, which includes: presetting a metadata database in the logical storage unit of a large file; storing the data content of a small file to be stored in the large file, where the data content is appended to the end of the large file; and storing the metadata of the small file to be stored in the metadata database, where the metadata includes the name, size, type, checksum, and timestamp of the small file, the offset of the small file in the large file, and custom metadata. By adding a metadata database to the logical storage unit of the large file, the invention stores the data content of a small file separately from its metadata, so that metadata can be modified, added, and deleted independently. In addition, when metadata is loaded into memory for caching, the whole large file no longer needs to be scanned.
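The effect of separating metadata from content can be illustrated with a short sketch: metadata entries are changed, added, and deleted while the bytes of the large file stay untouched. A dict again stands in for the metadata database, and the helper names `set_custom` and `delete_custom` as well as the sample values are hypothetical.

```python
import os
import tempfile


def set_custom(meta_db: dict, name: str, key: str, value: str) -> None:
    """Add or modify one custom metadata entry of a small file."""
    meta_db[name].setdefault("custom", {})[key] = value


def delete_custom(meta_db: dict, name: str, key: str) -> None:
    """Delete one custom metadata entry if it exists."""
    meta_db[name].get("custom", {}).pop(key, None)


workdir = tempfile.mkdtemp()
big_path = os.path.join(workdir, "unit0.big")
with open(big_path, "wb") as f:
    f.write(b"photo-bytes")

meta_db = {"photo.jpg": {"size": 11, "offset": 0, "type": "image/jpeg"}}

with open(big_path, "rb") as f:
    before = f.read()

set_custom(meta_db, "photo.jpg", "place", "a value of any length")  # add
meta_db["photo.jpg"]["type"] = "image/jpeg; progressive"            # modify, longer value
set_custom(meta_db, "photo.jpg", "shot_at", "2017-06-27")           # add
delete_custom(meta_db, "photo.jpg", "shot_at")                      # delete

with open(big_path, "rb") as f:
    after = f.read()
```

None of these operations has to respect a record boundary, because no metadata value is stored between the data of neighboring small files.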

On the basis of the method disclosed above, the present invention also discloses a corresponding system.

The small file storage system provided by an embodiment of the present invention is introduced below. Note that the description of this system can be read with reference to the small file storage method provided above, and details are not repeated.

Refer to Fig. 5, which is a schematic structural diagram of a small file storage system provided by an embodiment of the present invention. As shown in Fig. 5, the invention discloses a small file storage system, which specifically includes:

a presetting unit 501, configured to preset a metadata database in the logical storage unit of a large file;

a first storage unit 502, configured to store the data content of a small file to be stored in the large file, where the data content of each small file to be stored is appended in turn to the end of the large file;

a second storage unit 503, configured to store the metadata of the small file to be stored in the metadata database.

The metadata includes: the file name of the small file, the size of the small file, the type of the small file, the checksum of the small file, the timestamp of the small file, the path of the large file, the offset of the small file in the large file, and custom metadata.

The first storage unit 502 can also redundantly store part of the metadata of the small file to be stored in the large file, so that this partial metadata and the data content of the small file to be stored are held together in one large file. The partial metadata of the small file to be stored includes: the offset of the small file in the large file, the file name of the small file and its length, the size of the space occupied by the metadata of the small file, the checksum of the small file, and the start position of the data content of the small file.

The invention thus discloses a small file storage system. By adding a metadata database to the logical storage unit of the large file, the system stores the data content of a small file separately from its metadata, so that metadata can be modified, added, and deleted independently. In addition, when metadata is loaded into memory for caching, the whole large file no longer needs to be scanned.
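The caching benefit can be sketched by building an in-memory cache from the key-value store alone. The standard-library `dbm.dumb` module is used here purely as a stand-in for RocksDB (an assumption), and the JSON value encoding and the names `load_metadata_cache` and `unit0.meta` are hypothetical.

```python
import dbm.dumb
import json
import os
import tempfile


def load_metadata_cache(db_path: str) -> dict:
    """Read every metadata record into memory without opening any large file."""
    cache = {}
    with dbm.dumb.open(db_path, "r") as db:
        for key in db.keys():
            cache[key.decode("utf-8")] = json.loads(db[key])
    return cache


workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "unit0.meta")

# Populate the stand-in metadata database with two records.
with dbm.dumb.open(db_path, "c") as db:
    db[b"a.jpg"] = json.dumps({"size": 6, "offset": 0}).encode("utf-8")
    db[b"b.txt"] = json.dumps({"size": 5, "offset": 6}).encode("utf-8")

cache = load_metadata_cache(db_path)
```

The cost of warming the cache depends on the number of metadata records rather than on the size of the large file, which in this sketch does not even need to exist.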

This embodiment provides a small file storage method and system. First, the small file being written is appended to the end of the large file, while its size and its start position in the large file, that is, its offset, are recorded. Second, the metadata of the small file, such as the write timestamp and file type, is written to the metadata database together with the offset obtained above.

To sum up, the invention discloses a small file storage method and system: a metadata database is preset in the logical storage unit of a large file; the data content of a small file to be stored is stored in the large file, where the data content is appended to the end of the large file; and the metadata of the small file to be stored is stored in the metadata database, where the metadata includes the name, size, type, checksum, and timestamp of the small file, the offset of the small file in the large file, and custom metadata. By adding a metadata database to the logical storage unit of the large file, the invention stores the data content of a small file separately from its metadata, so that metadata can be modified, added, and deleted independently. In addition, when metadata is loaded into memory for caching, the whole large file no longer needs to be scanned.

On the basis of the small file storage method and system disclosed above, the present invention also discloses a small file reading method and system.

Refer to Fig. 6, which is a schematic flowchart of a small file reading method provided by an embodiment of the present invention. As shown in Fig. 6, the invention discloses a small file reading method, which specifically includes the following steps:

S601: obtain the file name of the small file;

S602: look up, according to the file name of the small file, the metadata information of the small file in the metadata database that stores the metadata of the small file, where the metadata information includes: the size of the small file, the path of the large file in which the small file resides, and the offset of the small file in the large file;

S603: read, according to the metadata information of the small file, the data content of the small file from the large file that stores it;

S604: return the metadata information of the small file and the data content of the small file to the user.

This embodiment provides a small file reading method. The metadata information of the small file, including its size, the path of the large file in which it resides, and its offset in that large file, is first looked up in the metadata database by the name of the small file. The content of the small file can then be read using this information, and finally the data content is returned to the user together with the metadata information of the small file.
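Steps S601 to S604 amount to one key lookup followed by one positioned read. The sketch below assumes a dict in place of the metadata database and uses the hypothetical names `read_small_file` and `unit0.big`; the metadata fields used are the three named in S602.

```python
import os
import tempfile


def read_small_file(meta_db: dict, name: str):
    """Return (metadata, data) for the small file with the given name."""
    meta = meta_db[name]                        # S601 + S602: look up by file name
    with open(meta["path"], "rb") as big_file:  # S603: open the large file
        big_file.seek(meta["offset"])           # jump straight to the small file
        data = big_file.read(meta["size"])
    return meta, data                           # S604: return both to the caller


workdir = tempfile.mkdtemp()
big_path = os.path.join(workdir, "unit0.big")
with open(big_path, "wb") as f:
    f.write(b"AAAAhelloZZ")  # three small files stored back to back

meta_db = {"b.txt": {"size": 5, "path": big_path, "offset": 4}}
meta, data = read_small_file(meta_db, "b.txt")
```

An operation that needs only metadata can stop after the lookup and never open the large file at all.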

Refer to Fig. 7, which is a schematic structural diagram of a small file reading system provided by an embodiment of the present invention. As shown in Fig. 7, the invention discloses a small file reading system, which specifically includes:

an acquiring unit 701, configured to obtain the file name of the small file;

a lookup unit 702, configured to look up, according to the file name of the small file, the metadata information of the small file in the metadata database that stores the metadata of the small file, where the metadata information includes: the size of the small file, the path of the large file in which the small file resides, and the offset of the small file in the large file;

a reading unit 703, configured to read, according to the metadata information of the small file, the data content of the small file from the large file that stores it;

a feedback unit 704, configured to return the metadata information of the small file and the data content of the small file to the user.

This embodiment provides a small file reading system. The metadata information of the small file, including its size, the path of the large file in which it resides, and its offset in that large file, is first looked up in the metadata database by the name of the small file. The content of the small file can then be read using this information, and finally the data content is returned to the user together with the metadata information of the small file.

The functions realized by the present invention require the large file and the metadata database to work together. In this solution, every large file is allocated one metadata database. This metadata database is a key-value database; put simply, the content stored in the database can be retrieved by name. In a file system, a file actually consists of two parts. One part is the content data, for example the content of a photo. The other is the metadata, which describes other information about the file; for a photo file, this could be the time and place the photo was taken. In this solution, the content data of small files is stored in the large file and the metadata is stored in the database; that is, the database holds only the metadata of the small files. In short, the large file holds the content data, and the database holds the metadata and index information of the small files.

Merged storage is mainly used to meet the storage demand of massive small files. Common small files include pictures and text; specifically, it can be used to store pictures, some UGC short videos, subtitle files, and the like.

In summary, the invention discloses a small file storage method, a small file reading method, and corresponding systems. The small file storage method includes: presetting a metadata database in the logical storage unit of a large file; storing the data content of a small file to be stored in the large file, where the data content is appended to the end of the large file; and storing the metadata of the small file to be stored in the metadata database, where the metadata includes the name, size, type, checksum, and timestamp of the small file, the offset of the small file in the large file, and custom metadata. By adding a metadata database to the logical storage unit of the large file, the invention stores the data content of a small file separately from its metadata, so that metadata can be modified, added, and deleted independently. In addition, when metadata is loaded into memory for caching, the whole large file no longer needs to be scanned.

Note that the embodiments in this specification are described progressively; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar the embodiments may be referred to one another.

Note also that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that an article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the article or device that includes that element.

The above description of the disclosed embodiments enables a person skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A small file storage method, characterized by including:

presetting a metadata database in the logical storage unit of a large file;

storing the data content of a small file to be stored in the large file, where the data content of each small file to be stored is appended in turn to the end of the large file;

storing the metadata of the small file to be stored in the metadata database.

2. The small file storage method according to claim 1, characterized in that the metadata includes: the file name of the small file, the size of the small file, the type of the small file, the checksum of the small file, the timestamp of the small file, the offset of the small file in the large file, and custom metadata.

3. The small file storage method according to claim 1, characterized in that the metadata database is the key-value database RocksDB.

4. The small file storage method according to claim 1, characterized by further including:

redundantly storing part of the metadata of the small file to be stored in the large file, so that this partial metadata and the data content of the small file to be stored are held together in one large file, where the partial metadata of the small file to be stored includes: the offset of the small file in the large file, the file name of the small file and its length, the size of the space occupied by the metadata of the small file, the checksum of the small file, and the start position of the data content of the small file.

5. A small file storage system, characterized by including:

a first storage unit, configured to store the data content of the small file to be stored in the large file, where the data content of each small file to be stored is appended in turn to the end of the large file;

6. The small file storage system according to claim 5, characterized in that the metadata includes: the file name of the small file, the size of the small file, the type of the small file, the checksum of the small file, the timestamp of the small file, the path of the large file, the offset of the small file in the large file, and custom metadata.

7. The small file storage system according to claim 5, characterized in that the metadata database is the key-value database RocksDB.

8. The small file storage system according to claim 5, characterized in that the first storage unit is further configured to:

9. A small file reading method, characterized by including:

obtaining the file name of the small file;

looking up, according to the file name of the small file, the metadata information of the small file in the metadata database that stores the metadata of the small file, where the metadata information includes: the size of the small file, the path of the large file in which the small file resides, and the offset of the small file in the large file;

reading, according to the metadata information of the small file, the data content of the small file from the large file that stores it;

10. A small file reading system, characterized by including:

an acquiring unit, configured to obtain the file name of the small file;

a lookup unit, configured to look up, according to the file name of the small file, the metadata information of the small file in the metadata database that stores the metadata of the small file, where the metadata information includes: the size of the small file, the path of the large file in which the small file resides, and the offset of the small file in the large file;

a reading unit, configured to read, according to the metadata information of the small file, the data content of the small file from the large file that stores it;

a feedback unit, configured to return the metadata information of the small file and the data content of the small file to the user.