CN106250494A - A data management and analysis system based on a file system - Google Patents
- Publication number: CN106250494A
- Authority: CN (China)
- Prior art keywords: data features, data, file system, store (database), management
- Legal status: Granted (status as listed by Google Patents, not a legal conclusion)
Classifications
All entries fall under G (Physics) > G06 (Computing; calculating or counting) > G06F (Electric digital data processing) > G06F16/00 (Information retrieval; database structures therefor; file system structures therefor). The first four are under G06F16/10 (File systems; file servers); the last three are under G06F16/20 (Information retrieval of structured data, e.g. relational data).
- G06F16/148: File search processing (under G06F16/14, details of searching files based on file metadata)
- G06F16/178: Techniques for file synchronisation in file systems
- G06F16/1815: Journaling file systems (under G06F16/1805, append-only file systems, e.g. using logs or journals to store data)
- G06F16/182: Distributed file systems
- G06F16/245: Query processing
- G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; distributed database system architectures therefor
- G06F16/284: Relational databases
Abstract
The present invention discloses a data management and analysis system based on a file system, comprising: a journal subsystem of the file system, provided with a client interface; an out-of-band data feature catcher, which reads journal entries from the journal subsystem through the client interface and extracts data features and their changes from the entries read; a data feature store adapter, which, according to the specific data feature analysis requirements, converts the data features and their changes into retrieval entries, sets the store type and schema of the out-of-band data feature store, and then replays the retrieval entries into the data feature store; and a data feature management and analysis subsystem, which sets search conditions according to the specific data feature analysis requirements and organizes, manages and analyzes the data features in the data feature store. The invention can flexibly adapt the store type and schema of the data feature store to the needs of data feature management and analysis applications, without having to modify the file system implementation each time those needs change.
Description
Technical field
The present invention relates to the field of computer technology, and more particularly to a data management and analysis system based on a file system.
Background
A computer's file system provides users with a namespace and an address space, so that users can store large amounts of data while organizing and locating it by file name, path and directory. As data keeps growing, so does the demand for complex data management, and organizing data solely by file name, path and directory can no longer meet users' data management needs. In recent years, many market applications and scientific computing workloads have required complex data organization and data discovery mechanisms, which has given rise to data management systems. Current data management systems first collect the file system's data features and their changes into a relational database, and then define rules on the data features in that database to perform data management, data discovery and data organization. Here, the data features of a file system are its metadata.
The prior art generally uses one of two approaches to obtain a file system's data features and their changes:
First approach: obtain the data features by scanning the file system, discover changes by periodically rescanning and comparing the differences, aggregate the features and changes into a database, and then perform data management based on the features. This approach has drawbacks: first, periodic scanning sacrifices real-time updating of data features; second, scanning and comparing a large file system is very time-consuming and inefficient.
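The cost of the first approach can be made concrete with a short Python sketch. It is illustrative only and not taken from the patent; the `(size, mtime)` snapshot format and the function names are assumptions. `scan_tree` must stat every file on every pass, and `diff_snapshots` only sees changes at the granularity of the scan interval.

```python
import os
from typing import Dict, List, Tuple

# A snapshot maps each file path to (size, mtime) as seen by one full scan.
Snapshot = Dict[str, Tuple[int, float]]


def scan_tree(root: str) -> Snapshot:
    """Walk the entire tree and stat every file: cost grows with file count."""
    snapshot: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            snapshot[path] = (st.st_size, st.st_mtime)
    return snapshot


def diff_snapshots(old: Snapshot, new: Snapshot) -> Dict[str, List[str]]:
    """Classify paths as created, deleted or modified between two scans."""
    return {
        "created": sorted(p for p in new if p not in old),
        "deleted": sorted(p for p in old if p not in new),
        "modified": sorted(p for p in new if p in old and new[p] != old[p]),
    }


if __name__ == "__main__":
    before = {"/a.txt": (10, 1.0), "/b.txt": (20, 1.0)}
    after = {"/a.txt": (15, 2.0), "/c.txt": (5, 2.0)}
    print(diff_snapshots(before, after))
```

Even if only one file changes, both functions still touch every path in the tree, which is the inefficiency the text attributes to this approach.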
Second approach: separate the file system's data features from its data and design the file system's data feature (metadata) subsystem as a database, so that every data feature operation on the file system is itself an operation on that database and all data features are stored there, which makes searching and querying convenient. Implementing the file system's metadata server as a data feature store for data management in this way is an in-band (In Band) data management system. Its drawbacks are that metadata changes caused by the file system's normal IO must also update that database, and the file system cannot self-adapt or dynamically adjust its layout. Once the data feature subsystem has defined the data feature layout, store type and schema, they cannot be changed: the design is tightly coupled, with the data feature store forming part of the file system, so it is very inflexible and cannot adapt its store type and schema at any time to the goals of data feature management and analysis. In addition, the performance of data feature operations depends entirely on, and is limited by, the performance of the data feature subsystem's database.
There is therefore a need for a data management and analysis system based on a file system.
Summary of the invention
An object of the present invention is to provide a data management and analysis system based on a file system that can flexibly adapt the store type and schema of the data feature store to the needs of data feature management and analysis applications, without changing the implementation of the file system.
To achieve the above object, the present invention adopts the following technical solution:
A data management and analysis system based on a file system, comprising: a journal subsystem of the file system, a data feature catcher, a data feature store adapter, a data feature store, and a data feature management and analysis subsystem;
the journal subsystem of the file system is provided with a client interface;
the data feature catcher reads journal entries from the journal subsystem of the file system through the client interface and extracts data features and their changes from the journal entries read;
the data feature store adapter converts the data features and their changes into retrieval entries according to the specific data feature analysis requirements, sets the store type and schema of the data feature store according to those requirements, and then replays the retrieval entries into the data feature store;
the data feature management and analysis subsystem sets search conditions according to the specific data feature management or analysis requirements, and organizes, manages and analyzes the data features in the data feature store;
the data feature catcher and the data feature store both work out of band.
Preferably, the journal reclamation policy of the journal subsystem of the file system is: a journal entry can be reclaimed, in sequence, only after the file system has applied its data feature operation and the data feature catcher has explicitly allowed the entry to be reclaimed.
Preferably, the data feature catcher updates the current journal cursor at the same time as it reads journal entries from the journal subsystem of the file system through the client interface.
Preferably, the type of the data feature store includes an RDBMS relational database, a distributed NoSQL database, a search engine, or a related retrieval or search system.
To obtain data features and track their changes in real time, while avoiding scans of large file systems (deep directory hierarchies, massive numbers of files), the present invention uses the file system's journal subsystem to capture data features and their changes in real time and aggregates them into the data feature store.
To keep the invention sufficiently flexible, the store type and schema of the data feature store must be decoupled from the file system's data feature layout and implementation, so that they can easily be adjusted to the needs of data management and analysis applications without affecting the performance of the file system itself. The invention thus allows the store type and schema of the data feature store to be adapted flexibly to the needs of data feature management and analysis applications, without changing the file system implementation.
The beneficial effects of the present invention are as follows:
(1) The invention does not affect the IO performance of the file system: the data feature catcher and the data feature store both work out of band (Out Of Band) and touch neither the file system's normal IO code path nor its IO performance.
(2) Any file system that has a journal subsystem can be adapted according to the invention into a system suitable for data management and analysis, so the invention is widely applicable.
(3) Because the invention captures data features and their changes from journal entries, it reflects data feature updates in real time and readily obtains the increments of data feature changes, keeping the data features in the data feature store consistent with those in the file system.
(4) According to the actual needs of management and analysis, the invention flexibly adapts to changes in the store type and schema of the data feature store without changing the file system implementation. The data feature store can be adapted to the queries, retrieval and searches required by a variety of applications.
Brief description of the drawings
The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawing.
Fig. 1 is a schematic diagram of the data management and analysis system based on a file system.
Detailed description of the embodiments
To explain the present invention more clearly, it is further described below with reference to preferred embodiments and the accompanying drawing. Similar parts in the drawing are denoted by the same reference numerals. Those skilled in the art will appreciate that the content described below is illustrative rather than restrictive and should not be taken to limit the scope of protection of the invention.
Regarding the file system's journal subsystem: many existing file systems implement a journal subsystem to guarantee the consistency of data and data features. The journal subsystem is also known as a write-ahead log (WAL) or intent log (Intent Log). The data feature changes involved in each file system update operation are first persisted by appending them to the file system journal, and are then applied to the file system. When the update operation has completed, that is, when the file system has applied the data feature operation, the journal entries related to that change can be reclaimed.
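The conventional lifecycle described above (journal first, apply second, reclaim once applied) can be modeled in a few lines. This is a hypothetical in-memory sketch: the `IntentLog` and `LogEntry` names are invented, and a real journal persists entries durably before the change is applied.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LogEntry:
    seq: int
    path: str
    attr: str
    value: object
    applied: bool = False


@dataclass
class IntentLog:
    """In-memory model of a write-ahead (intent) log; durability is omitted."""
    entries: List[LogEntry] = field(default_factory=list)
    next_seq: int = 0

    def append(self, path: str, attr: str, value: object) -> LogEntry:
        # Step 1: record the intended metadata change before touching the FS.
        entry = LogEntry(self.next_seq, path, attr, value)
        self.next_seq += 1
        self.entries.append(entry)
        return entry

    def reclaim(self) -> int:
        # Step 3 (conventional policy): drop leading entries once applied.
        count = 0
        while self.entries and self.entries[0].applied:
            self.entries.pop(0)
            count += 1
        return count


def apply_entry(fs_metadata: Dict[str, Dict[str, object]], entry: LogEntry) -> None:
    # Step 2: apply the journaled change to the file system's metadata.
    fs_metadata.setdefault(entry.path, {})[entry.attr] = entry.value
    entry.applied = True


if __name__ == "__main__":
    log, fs = IntentLog(), {}
    first = log.append("/a", "size", 10)
    log.append("/a", "mtime", 5.0)
    apply_entry(fs, first)
    print(log.reclaim(), fs)  # the second entry stays until it is applied
```

Under this conventional policy nothing outside the file system gets a chance to read an entry before it disappears, which is why the embodiment changes the reclaim rule.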
Any local or distributed file system that has a write-ahead log or intent log subsystem can be adapted according to the data management and analysis system provided by this embodiment and incorporated into it.
The data features on which the system of this embodiment performs data management and analysis include the standard attributes of files (POSIX attributes, ATTR) and extended attributes (XATTR).
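To show what ATTR and XATTR look like on a local POSIX system, the sketch below reads both for one path with Python's standard library. It is an illustration under stated assumptions, not part of the patented system, which obtains features from journal entries rather than by calling `stat`. `os.listxattr` and `os.getxattr` exist only on Linux, so XATTR is left empty elsewhere, and POSIX `st_ctime` is the inode change time, not a creation time.

```python
import os
import stat
from typing import Dict


def collect_features(path: str) -> Dict[str, object]:
    """Collect POSIX standard attributes (ATTR) and extended attributes (XATTR)."""
    st = os.lstat(path)
    features: Dict[str, object] = {
        "size": st.st_size,
        "mode": stat.filemode(st.st_mode),   # e.g. "-rw-r--r--" or "drwxr-xr-x"
        "uid": st.st_uid,                    # owner
        "gid": st.st_gid,
        "mtime": st.st_mtime,                # last data modification
        "ctime": st.st_ctime,                # inode change time on Unix
        "is_dir": stat.S_ISDIR(st.st_mode),
    }
    xattrs: Dict[str, bytes] = {}
    # Linux-only calls; on other platforms, or when the underlying file
    # system does not support xattrs, XATTR is simply left empty.
    if hasattr(os, "listxattr"):
        try:
            for name in os.listxattr(path, follow_symlinks=False):
                xattrs[name] = os.getxattr(path, name, follow_symlinks=False)
        except OSError:
            pass
    features["xattr"] = xattrs
    return features


if __name__ == "__main__":
    print(collect_features("."))
```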
The system provided by this embodiment obtains data features and their changes through the file system's journal subsystem (Filesystem Journaling subsystem), aggregates the data features into a store, and performs management and analysis based on those data features.
As shown in Fig. 1, the system provided by this embodiment includes: the journal subsystem of the file system, a data feature catcher, a data feature store adapter, a data feature store, and a data feature management and analysis subsystem.
Journal subsystem of the file system: the file system journal subsystem provides a client interface whose functions are to let the data feature catcher read journal entries sequentially, update the current journal cursor, and explicitly allow journal entries to be reclaimed. The journal entries reflect the file system's data features and their changes. Because an existing file system journal subsystem may reclaim journal entries as soon as the data feature update has been applied in the file system, in this embodiment the journal reclamation policy is adjusted so that the data feature catcher does not miss any data feature update: entries that the data feature catcher (the journal subsystem's client) has not explicitly allowed to be reclaimed are never reclaimed, and an entry can be reclaimed, in sequence, only after the file system has applied its data feature operation and the client has explicitly allowed it.
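A minimal sketch of the modified journal, assuming an in-memory list of entries and hypothetical method names (`read`, `allow_reclaim`, `reclaim`); the patent does not specify an API. The difference from a conventional journal is in `reclaim`: an entry is dropped only if the file system has applied it and the client has released it, and only from the head of the log, so reclamation stays sequential.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class JournalEntry:
    seq: int
    path: str
    attr: str
    value: object
    applied: bool = False  # set once the file system has applied the change


@dataclass
class Journal:
    entries: List[JournalEntry] = field(default_factory=list)
    read_cursor: int = 0        # next sequence number the client will read
    reclaim_allowed: int = 0    # client has released every seq below this
    _next_seq: int = 0

    def append(self, path: str, attr: str, value: object) -> JournalEntry:
        entry = JournalEntry(self._next_seq, path, attr, value)
        self._next_seq += 1
        self.entries.append(entry)
        return entry

    # ---- client-side interface (hypothetical names) ----
    def read(self, max_entries: int) -> List[JournalEntry]:
        """Sequentially read entries from the read cursor and advance it."""
        batch = [e for e in self.entries if e.seq >= self.read_cursor][:max_entries]
        if batch:
            self.read_cursor = batch[-1].seq + 1
        return batch

    def allow_reclaim(self, upto_seq: int) -> None:
        """Client explicitly allows reclaiming every entry with seq < upto_seq."""
        self.reclaim_allowed = max(self.reclaim_allowed, upto_seq)

    # ---- modified reclaim policy ----
    def reclaim(self) -> int:
        """Reclaim leading entries only if applied AND released by the client."""
        count = 0
        while (self.entries and self.entries[0].applied
               and self.entries[0].seq < self.reclaim_allowed):
            self.entries.pop(0)
            count += 1
        return count
```

For example, after two applied entries are appended, `reclaim()` returns 0 until the client calls `allow_reclaim`, and `allow_reclaim(1)` then frees only the first entry. A catcher that falls behind therefore makes the journal grow instead of losing updates; the source does not discuss bounding that growth.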
Data feature catcher: the data feature catcher works out of band. As a client of the journal subsystem, it actively reads journal entries from the file system's journal subsystem through the client interface, extracts data features and their changes from the entries read, updates the current journal cursor, and sends the extracted data features and changes to the data feature store adapter.
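The catcher's loop can be sketched as follows. The `JournalClient` protocol, the entry fields `path`/`attr`/`value`, and the `ListJournal` stand-in are all assumptions for illustration; a real catcher would decode the file system's own journal record format and run continuously rather than once.

```python
from collections import namedtuple
from dataclasses import dataclass
from typing import Callable, List, Protocol


@dataclass
class FeatureChange:
    """One extracted change: attribute `attr` of `path` is now `value`."""
    path: str
    attr: str
    value: object


class JournalClient(Protocol):
    """Assumed client-side interface of the journal subsystem (hypothetical)."""
    read_cursor: int

    def read(self, max_entries: int) -> list: ...
    def allow_reclaim(self, upto_seq: int) -> None: ...


def extract(entry) -> FeatureChange:
    # Assumes entries expose .path, .attr and .value; a real catcher would
    # decode the file system's on-disk journal record format here.
    return FeatureChange(entry.path, entry.attr, entry.value)


def capture_once(journal: JournalClient,
                 deliver: Callable[[List[FeatureChange]], None],
                 batch_size: int = 128) -> int:
    """One out-of-band capture pass; returns how many entries were handled."""
    entries = journal.read(batch_size)        # reading also advances the cursor
    if not entries:
        return 0
    deliver([extract(e) for e in entries])    # hand the batch to the adapter
    # Release entries only after delivery succeeded, so no update is lost.
    journal.allow_reclaim(journal.read_cursor)
    return len(entries)


# --- demonstration-only stand-in for the journal ---
Entry = namedtuple("Entry", ["path", "attr", "value"])


class ListJournal:
    def __init__(self, entries: List[Entry]) -> None:
        self.entries = list(entries)
        self.read_cursor = 0          # index of the next unread entry
        self.reclaim_allowed = 0      # entries below this index may be reclaimed

    def read(self, max_entries: int) -> List[Entry]:
        batch = self.entries[self.read_cursor:self.read_cursor + max_entries]
        self.read_cursor += len(batch)
        return batch

    def allow_reclaim(self, upto_seq: int) -> None:
        self.reclaim_allowed = max(self.reclaim_allowed, upto_seq)


if __name__ == "__main__":
    journal = ListJournal([Entry("/a", "size", 1), Entry("/b", "size", 2)])
    received: List[FeatureChange] = []
    while capture_once(journal, received.extend, batch_size=1):
        pass
    print(received, journal.reclaim_allowed)
```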
Data feature store: the data feature store works out of band, and data features and their changes are captured by the data feature catcher outside the file system. Because the adapter can adapt to various store types and schemas, the data feature store can differ from one target file system to another according to the specific data feature analysis requirements. Types of data feature store include an RDBMS relational database, a distributed NoSQL database, a search engine, or a related retrieval or search system.
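Decoupling the captured change stream from the store type can be sketched with one small interface and two toy backends: a row-per-file layout (relational style) and an inverted index (search-engine style). All class and method names are hypothetical; a real deployment would target an actual RDBMS, NoSQL database or search engine.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Protocol, Set, Tuple

Change = Tuple[str, str, object]  # (path, attribute, new value)


class FeatureStore(Protocol):
    """Hypothetical interface each store type implements for the adapter."""
    def apply(self, path: str, attr: str, value: object) -> None: ...


class RowStore:
    """Relational-style layout: one row (a dict) per file, keyed by path."""
    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, object]] = {}

    def apply(self, path: str, attr: str, value: object) -> None:
        self.rows.setdefault(path, {})[attr] = value


class InvertedIndex:
    """Search-engine-style layout: each (attr, value) term lists its files."""
    def __init__(self) -> None:
        self.postings: Dict[Tuple[str, object], Set[str]] = defaultdict(set)
        self.current: Dict[Tuple[str, str], object] = {}

    def apply(self, path: str, attr: str, value: object) -> None:
        key = (path, attr)
        if key in self.current:                  # remove the stale term first
            self.postings[(attr, self.current[key])].discard(path)
        self.current[key] = value
        self.postings[(attr, value)].add(path)


def replay_all(stores: List[FeatureStore], changes: Iterable[Change]) -> None:
    """Feed one captured change stream into any number of store types."""
    for path, attr, value in changes:
        for store in stores:
            store.apply(path, attr, value)
```

Adding a new store type in this sketch means writing another `apply`, with no change to the capture side, which mirrors the decoupling the text describes.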
Data feature store adapter: because the data feature store can be given different store types and schemas depending on the file system application, the adapter converts the data features and changes extracted by the data feature catcher into corresponding retrieval entries according to the specific data feature analysis requirements, sets the store type and schema of the data feature store according to those requirements, and then replays (replay) the retrieval entries corresponding to these journal entries into the data feature store.
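A sketch of the adapter for a relational store, using Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs without a server. The two-table schema, the `xattr:` prefix and the `unlink` marker are assumptions; the patent leaves the schema to the application. Each feature change becomes one or more SQL statements (the retrieval entries), and a batch is replayed in a single transaction.

```python
import sqlite3
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical layout: one row per file for ATTR, one row per (file, name) for XATTR.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    path  TEXT PRIMARY KEY,
    size  INTEGER,
    mtime REAL,
    owner INTEGER
);
CREATE TABLE IF NOT EXISTS xattrs (
    path  TEXT NOT NULL,
    name  TEXT NOT NULL,
    value TEXT,
    PRIMARY KEY (path, name)
);
"""

ATTR_COLUMNS = {"size", "mtime", "owner"}  # whitelist guards the f-string SQL below


@dataclass
class FeatureChange:
    path: str
    attr: str            # an ATTR column name, "xattr:<name>", or "unlink"
    value: object = None


def to_statements(change: FeatureChange) -> List[Tuple[str, tuple]]:
    """Translate one feature change into retrieval entries (SQL statements)."""
    if change.attr == "unlink":
        return [("DELETE FROM xattrs WHERE path = ?", (change.path,)),
                ("DELETE FROM files WHERE path = ?", (change.path,))]
    if change.attr.startswith("xattr:"):
        name = change.attr[len("xattr:"):]
        return [("INSERT OR REPLACE INTO xattrs (path, name, value) VALUES (?, ?, ?)",
                 (change.path, name, change.value))]
    if change.attr in ATTR_COLUMNS:
        return [("INSERT OR IGNORE INTO files (path) VALUES (?)", (change.path,)),
                (f"UPDATE files SET {change.attr} = ? WHERE path = ?",
                 (change.value, change.path))]
    return []  # attributes this schema does not track are skipped


def replay(conn: sqlite3.Connection, changes: Iterable[FeatureChange]) -> None:
    """Replay a batch of changes into the feature store in one transaction."""
    with conn:  # commits on success, rolls back on error
        for change in changes:
            for sql, params in to_statements(change):
                conn.execute(sql, params)
```

Because each statement in this sketch sets absolute values rather than deltas, replaying the same batch again in order leaves the store unchanged, so a batch can be resent after a failure. In PostgreSQL, `INSERT OR IGNORE` and `INSERT OR REPLACE` would be written as `INSERT ... ON CONFLICT`.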
Data feature management and analysis subsystem: according to the specific data feature analysis requirements, it sets search conditions and organizes, manages and analyzes the data features in the data feature store to achieve the goals of data feature management and analysis. Organizing and managing the data features in the store includes searching, retrieving and classifying by data features, setting policies and trigger conditions, and performing actions when a trigger condition fires.
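Policies with trigger conditions and actions might be modeled as predicate/action pairs evaluated over feature records. The `Policy` type, the record fields and the example rules are hypothetical; a real subsystem would more likely express conditions as queries against the data feature store, and the actions here only describe what they would do.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]  # one file's data features, as read from the store


@dataclass
class Policy:
    """A hypothetical management rule: when `condition` holds, run `action`."""
    name: str
    condition: Callable[[Record], bool]
    action: Callable[[Record], str]


def evaluate(policies: List[Policy], records: List[Record]) -> List[str]:
    """Check each policy against each record and collect triggered actions."""
    fired: List[str] = []
    for policy in policies:
        for record in records:
            if policy.condition(record):
                fired.append(f"{policy.name}: {policy.action(record)}")
    return fired


if __name__ == "__main__":
    records = [
        {"path": "/logs/a.log", "size": 5_000_000_000, "xattr": {}},
        {"path": "/data/b.csv", "size": 100, "xattr": {"user.project": "ABC"}},
    ]
    policies = [
        Policy("archive-large", lambda r: r["size"] > 1_000_000_000,
               lambda r: f"archive {r['path']}"),
        Policy("replicate-ABC", lambda r: r["xattr"].get("user.project") == "ABC",
               lambda r: f"replicate {r['path']}"),
    ]
    for line in evaluate(policies, records):
        print(line)
```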
The system of this embodiment is further described below using two concrete data feature stores.
CEPHFS is used as the example file system, but the invention is not limited to CEPHFS. The file system journal subsystem of CEPHFS is improved; compared with the existing journal subsystem, the improvements are: 1. a client interface is provided that lets the client read journal entries sequentially and update the current journal read cursor, and, at the client's request, reclaim all entries before a given journal entry and update the reclaim cursor; 2. the journal entry reclamation policy is adjusted so that the file system can actually reclaim a journal entry only after it has applied the data feature operation and the journal subsystem's client has explicitly reclaimed the related journal entries.
The data feature store is the RDBMS PostgreSQL, and data is organized and managed by the file system's standard file attributes ATTR (file size, creation and update times, directory size, owner, etc.) and extended attributes XATTR.
The data feature catcher, as a client of the journal subsystem, sequentially reads the corresponding journal entries and extracts data features and their changes from the entries read.
The data feature store adapter converts the data features and their changes into corresponding retrieval entries and, according to the store type (a PostgreSQL data feature store) and a predefined table structure, replays (Replay) the retrieval entries into the PostgreSQL data feature store.
Based on the contents of the PostgreSQL data feature store, the data feature management and analysis subsystem sets query conditions to organize and manage data: for example, picking out the largest file, finding all files updated within a certain time period, and finding all files that have the same value for a given extended attribute.
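The three example queries can be written in SQL. This sketch uses in-memory `sqlite3` in place of PostgreSQL, with a hypothetical `files`/`xattrs` table layout and made-up sample rows; apart from the `?` placeholders, the query text also works in PostgreSQL.

```python
import sqlite3

# Stand-in for the PostgreSQL feature store (hypothetical layout and data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files  (path TEXT PRIMARY KEY, size INTEGER, mtime INTEGER);
CREATE TABLE xattrs (path TEXT, name TEXT, value TEXT, PRIMARY KEY (path, name));
""")
conn.executemany("INSERT INTO files VALUES (?, ?, ?)", [
    ("/a", 100, 1000), ("/b", 900, 2000), ("/c", 50, 3000),
])
conn.executemany("INSERT INTO xattrs VALUES (?, ?, ?)", [
    ("/a", "user.project", "ABC"), ("/b", "user.project", "XYZ"),
    ("/c", "user.project", "ABC"),
])

# 1. The largest file.
largest = conn.execute(
    "SELECT path, size FROM files ORDER BY size DESC LIMIT 1").fetchone()

# 2. All files updated within the time window [start, end).
updated = [p for (p,) in conn.execute(
    "SELECT path FROM files WHERE mtime >= ? AND mtime < ? ORDER BY path",
    (1500, 3500))]

# 3. All files sharing the same value of a given extended attribute.
same_value = [p for (p,) in conn.execute(
    "SELECT path FROM xattrs WHERE name = ? AND value = ? ORDER BY path",
    ("user.project", "ABC"))]

print(largest, updated, same_value)
```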
The data feature store can also be the search engine ElasticSearch, used for example to query files whose extended attribute content is ABC, or, across all files, to find the probability that the extended attributes ABC and DEF occur together and the files in which they do.
Obviously, the above embodiments are merely examples given to illustrate the invention clearly and do not limit its embodiments. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description; not all embodiments can be listed exhaustively here. Any obvious change or variation derived from the technical solution of the invention remains within its scope of protection.
Claims (4)
1. A data management and analysis system based on a file system, characterized in that the system comprises: a journal subsystem of the file system, a data feature catcher, a data feature store adapter, a data feature store, and a data feature management and analysis subsystem;
the journal subsystem of the file system is provided with a client interface;
the data feature catcher reads journal entries from the journal subsystem of the file system through the client interface and extracts data features and their changes from the journal entries read;
the data feature store adapter converts the data features and their changes into retrieval entries according to the specific data feature analysis requirements, sets the store type and schema of the data feature store according to those requirements, and then replays the retrieval entries into the data feature store;
the data feature management and analysis subsystem sets search conditions according to the specific data feature management or analysis requirements, and organizes, manages and analyzes the data features in the data feature store;
the data feature catcher and the data feature store both work out of band.
2. The data management and analysis system based on a file system according to claim 1, characterized in that the journal reclamation policy of the journal subsystem of the file system is: a journal entry can be reclaimed, in sequence, only after the file system has applied its data feature operation and the data feature catcher has explicitly allowed the entry to be reclaimed.
3. The data management and analysis system based on a file system according to claim 1, characterized in that the data feature catcher updates the current journal cursor at the same time as it reads journal entries from the journal subsystem of the file system through the client interface.
4. The data management and analysis system based on a file system according to claim 1, characterized in that the type of the data feature store includes an RDBMS relational database, a distributed NoSQL database, a search engine, or a related retrieval or search system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610623825.8A CN106250494B (en) | 2016-08-02 | 2016-08-02 | A data management and analysis system based on a file system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106250494A | 2016-12-21 |
CN106250494B | 2019-04-09 |
Family ID: 57606374
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106250494B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6725392B1 (en) * | 1999-03-03 | 2004-04-20 | Adaptec, Inc. | Controller fault recovery system for a distributed file system |
CN1893370A (en) * | 2005-06-29 | 2007-01-10 | 国际商业机器公司 | Server cluster recovery and maintenance method and system |
CN101304360A (en) * | 2007-05-08 | 2008-11-12 | 艾岩 | System and method for virtualization of user digital terminal |
CN101578599A (en) * | 2006-08-07 | 2009-11-11 | 米谋萨系统有限公司 | Synthesis of fatty acids |
CN103533023A (en) * | 2013-07-25 | 2014-01-22 | 上海和辰信息技术有限公司 | Cloud service application cluster synchronization system and synchronization method based on cloud service characteristics |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110297846A (en) * | 2019-05-28 | 2019-10-01 | 北京奇艺世纪科技有限公司 | A log feature processing system, method, electronic equipment and storage medium |
CN110297846B (en) * | 2019-05-28 | 2021-08-20 | 北京奇艺世纪科技有限公司 | Log feature processing system, method, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||