CN106250494A - A data management and analysis system based on a file system

Info

Publication number
CN106250494A
Authority
CN
China
Prior art keywords
data characteristics
data
file system
storehouse
management
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610623825.8A
Other languages
Chinese (zh)
Other versions
CN106250494B (en)
Inventor
吴江
谢鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Polar Technology (beijing) Co Ltd
Original Assignee
Polar Technology (beijing) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Polar Technology (beijing) Co Ltd
Priority to CN201610623825.8A
Publication of CN106250494A
Application granted
Publication of CN106250494B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G06F16/17 Details of further file system functions
    • G06F16/178 Techniques for file synchronisation in file systems
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • G06F16/182 Distributed file systems
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases


Abstract

The present invention discloses a data management and analysis system based on a file system, comprising: a journal subsystem of the file system that is provided with a client interface; a data feature capturer that works out of band, reads journal entries from the journal subsystem through the client interface, and extracts data features and their changes from the entries it reads; a data feature store adapter that, according to the specific data feature analysis requirements, converts the data features and their changes into retrieval entries, sets the store type and schema of a data feature store that also works out of band, and then replays the retrieval entries into the data feature store; and a data feature management and analysis subsystem that, according to the specific data feature analysis requirements, sets search conditions and organizes, manages, and analyzes the data features in the data feature store. The invention can flexibly adapt the store type and schema of the data feature store to the needs of data feature management and analysis applications, without repeatedly modifying the file system implementation as those needs change.

Description

A data management and analysis system based on a file system
Technical field
The present invention relates to the field of computer technology, and more particularly to a data management and analysis system based on a file system.
Background art
A computer's file system provides users with a name space and an address space, so that users can store large amounts of data and still organize and find that data by file name, path, and directory. As data volumes keep growing, so does the need for more complex data management, and organizing data by file name, path, and directory alone can no longer meet users' data management needs. In recent years, a large number of data applications and scientific computing workloads have come to need sophisticated data organization and data discovery mechanisms, which has given rise to data management systems. A current data management system must first collect the data features of the file system, and the changes to them, into a relational database, and then define rules over the data features in that database to carry out data management, data discovery, and data organization. Here, the data features of a file system are its metadata.
The prior art generally uses one of two approaches to obtain the data features of a file system and their changes.
First approach: obtain the data features by scanning the file system, scan periodically and compare the results to find the differences that reveal data feature changes, aggregate the data features and their changes into a database, and then manage data according to those features. This approach has drawbacks. First, periodic scanning loses the real-time nature of data feature updates. Second, scanning and comparing a large file system is very time-consuming and inefficient.
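The scan-and-compare approach described above can be sketched as follows. This is only an illustration of the general technique, not code from the patent; the function names `scan` and `diff` are hypothetical. It shows why the cost grows with the size of the tree: every file is visited on every scan, even when nothing has changed.

```python
import os
import tempfile

def scan(root):
    """Walk the whole tree and record (size, mtime_ns) for every file."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            snapshot[path] = (st.st_size, st.st_mtime_ns)
    return snapshot

def diff(old, new):
    """Compare two full snapshots to find created, deleted and modified files."""
    created = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return created, deleted, modified

with tempfile.TemporaryDirectory() as root:
    a = os.path.join(root, "a.txt")
    b = os.path.join(root, "b.txt")
    with open(a, "w") as f:
        f.write("one")
    before = scan(root)              # first periodic scan
    with open(a, "w") as f:
        f.write("one two")           # the size changes, so the record differs
    with open(b, "w") as f:
        f.write("new")
    after = scan(root)               # second periodic scan
    created, deleted, modified = diff(before, after)
    created_names = [os.path.basename(p) for p in created]
    modified_names = [os.path.basename(p) for p in modified]
```

Changes made between two scans stay invisible until the next scan runs, which is the loss of real-time behavior noted above.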
Second approach: separate the file system's data features from its data and design the data feature subsystem of the file system as a database. Every data feature operation of the file system is then, in essence, an operation on that database, and all data features are stored in it, which makes searching and querying convenient. Implementing the file system's metadata server as a data feature store for the sake of data management is an in-band data management system. The drawback of this approach is that the metadata changes caused by the file system's normal I/O must also update the metadata in that database, and the file system cannot adapt or adjust dynamically. Once the data feature subsystem of the file system has defined its data feature layout, store type, and schema, they can no longer be changed. Because the design is tightly coupled and the data feature store is part of the file system, the approach is very inflexible: the store type and schema cannot be adapted at any time to the goals of data feature management and the needs of analysis. In addition, the system performance of frequent data feature operations depends entirely on, and is limited by, the performance of the data feature subsystem's database.
Accordingly, there is a need for a data management and analysis system based on a file system.
Summary of the invention
An object of the present invention is to provide a data management and analysis system based on a file system that can flexibly adapt the store type and schema of the data feature store to the needs of data feature management and analysis applications without changing the implementation of the file system.
To achieve the above object, the present invention adopts the following technical solution:
A data management and analysis system based on a file system, comprising: a journal subsystem of the file system, a data feature capturer, a data feature store adapter, a data feature store, and a data feature management and analysis subsystem;
The journal subsystem of the file system is provided with a client interface;
The data feature capturer reads journal entries from the journal subsystem of the file system through the client interface and extracts data features and their changes from the journal entries it reads;
The data feature store adapter, according to the specific data feature analysis requirements, converts the data features and their changes into retrieval entries and sets the store type and schema of the data feature store, and then replays the retrieval entries into the data feature store;
The data feature management and analysis subsystem, according to the specific data feature management or analysis requirements, sets search conditions and organizes, manages, and analyzes the data features in the data feature store;
The data feature capturer and the data feature store both work out of band.
Preferably, the journal reclamation policy of the journal subsystem of the file system is: a journal entry can be reclaimed, in order, only after the file system has applied the data feature operation and the data feature capturer has explicitly allowed the entry to be reclaimed.
Preferably, the data feature capturer updates the current journal cursor at the same time as it reads journal entries from the journal subsystem of the file system through the client interface.
Preferably, the types of the data feature store include RDBMS relational databases, distributed NoSQL databases, search engines, and related retrieval and search systems.
To obtain data features and track their changes in real time while avoiding scans of large file systems (deep directory hierarchies, massive numbers of files), the present invention uses the journal subsystem of the file system to capture data features and their changes in real time and aggregates them into the data feature store.
To keep the invention flexible enough, the store type and schema of the data feature store are decoupled from the file system's data feature layout and can be adjusted easily and flexibly according to the needs of data management and analysis applications, without affecting the performance of the file system itself. The invention thus allows the store type and schema of the data feature store to be adapted flexibly to the needs of data feature management and analysis applications without changing the implementation of the file system.
The beneficial effects of the present invention are as follows:
(1) The invention does not affect the I/O performance of the file system. The data feature capturer and the data feature store both work out of band, so they do not affect the file system's normal input/output code path or its input/output performance.
(2) Any file system that has a journal subsystem can be converted according to the invention into a data management and analysis system, so the invention is widely applicable.
(3) The invention captures data features and their changes from journal entries, so it can reflect data feature updates in real time and easily obtain the increments of data feature changes, keeping the data features in the data feature store consistent with those in the file system.
(4) The invention flexibly adapts the store type and schema of the data feature store to the actual needs of management and analysis without changing the file system implementation. The data feature store can be adapted to the queries, retrieval, and searches required by a variety of applications.
Brief description of the drawings
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawing.
Fig. 1 is a schematic diagram of the data management and analysis system based on a file system.
Detailed description
To explain the present invention more clearly, it is further described below with reference to preferred embodiments and the accompanying drawing. Similar parts in the drawing are denoted by the same reference numerals. Those skilled in the art should understand that what is specifically described below is illustrative rather than restrictive and should not be used to limit the scope of protection of the invention.
Regarding the journal subsystem of a file system: many existing file systems implement a journal subsystem to guarantee the consistency of data and data features. The journal subsystem of a file system is also known as a write-ahead log (WAL) or an intent log. The changes to file system data features involved in each file system update operation are first persisted by appending them to the file system journal and are only then applied to the file system. When the update operation is complete, that is, when the file system has applied the data feature operation, the journal entries related to that change can be reclaimed.
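The journal-first ordering described above can be sketched as follows. This is a minimal toy model under stated assumptions, not the journal of any real file system: the class `JournaledMetadata` and its fields are hypothetical, and a Python list stands in for persistent storage.

```python
class JournaledMetadata:
    """Toy metadata store that journals every update before applying it."""

    def __init__(self):
        self.journal = []    # appended log of intended changes (stands in for disk)
        self.applied = 0     # number of journal entries already applied
        self.metadata = {}   # path -> attribute dict

    def update(self, path, attrs):
        # Step 1: record the intended change in the journal first.
        self.journal.append({"path": path, "attrs": dict(attrs)})
        # Step 2: only then apply the change to the file system metadata.
        self.metadata.setdefault(path, {}).update(attrs)
        # Step 3: the entry is now applied; a stock journal may reclaim it.
        self.applied = len(self.journal)

    def reclaimable(self):
        """Entries that have been applied and could therefore be reclaimed."""
        return self.journal[:self.applied]

fs = JournaledMetadata()
fs.update("/data/a.bin", {"size": 10})
fs.update("/data/a.bin", {"size": 25, "owner": "alice"})
```

Because every metadata change passes through the journal in order, the journal is a complete, ordered record of data feature changes, which is what makes it usable as a capture source.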
Any local file system or distributed file system that has a write-ahead log or intent log subsystem can be adapted according to the data management and analysis system provided by this embodiment and integrated into it.
The data features on which the data management and analysis system based on a file system provided by this embodiment performs data management and analysis include the standard attributes of files (POSIX attributes, ATTR) and their extended attributes (XATTR).
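For orientation, the two kinds of data features named above can be read from an ordinary file with Python's standard library. This is only a sketch of what ATTR and XATTR look like from user space; the helper `collect_features` is hypothetical, and the patent's system obtains these features from journal entries rather than by calling `stat`.

```python
import os
import tempfile

def collect_features(path):
    """Return standard attributes and, where supported, extended attributes."""
    st = os.stat(path)
    features = {
        "size": st.st_size,    # file size in bytes
        "mtime": st.st_mtime,  # last modification time
        "uid": st.st_uid,      # owner
        "mode": st.st_mode,    # type and permission bits
    }
    xattrs = {}
    # os.listxattr and os.getxattr are only available on some platforms (Linux).
    if hasattr(os, "listxattr"):
        try:
            for name in os.listxattr(path):
                xattrs[name] = os.getxattr(path, name)
        except OSError:
            pass  # the underlying file system may not support xattrs
    features["xattrs"] = xattrs
    return features

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    tmp_path = tmp.name
features = collect_features(tmp_path)
os.remove(tmp_path)
```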
The data management and analysis system based on a file system provided by this embodiment obtains data features and their changes from the journal subsystem of the file system (the file system journaling subsystem), aggregates the data features into a store, and performs management and analysis based on those data features.
As shown in Fig. 1, the data management and analysis system based on a file system provided by this embodiment includes: a journal subsystem of the file system, a data feature capturer, a data feature store adapter, a data feature store, and a data feature management and analysis subsystem.
Journal subsystem of the file system: the file system journal subsystem is provided with a client interface, whose functions are to let the data feature capturer read journal entries in order, update the current journal cursor, and explicitly allow journal entries to be reclaimed. Journal entries reflect the file system's data features and the changes to them. An existing file system journal subsystem reclaims journal entries as soon as the data features have been updated in the file system. To ensure that the data feature capturer does not miss any data feature update, the journal reclamation policy of the journal subsystem in this embodiment is adjusted as follows: journal entries that the data feature capturer (the client of the journal subsystem) has not explicitly allowed to be reclaimed are not reclaimed; a journal entry can be reclaimed, in order, only after the file system has applied the data feature operation and the client of the journal subsystem has explicitly allowed the entry to be reclaimed.
Data feature capturer: the data feature capturer works out of band. As a client of the journal subsystem, it actively reads journal entries from the journal subsystem of the file system through the client interface, extracts data features and their changes from the entries it reads, updates the current journal cursor, and sends the extracted data features and their changes to the data feature store adapter.
Data feature store: the data feature store works out of band; the data features and their changes are captured by the data feature capturer outside the file system. The data feature store takes on whatever store type and schema the adapter sets, so it can differ from one target file system to another depending on the specific data feature analysis requirements. The types of the data feature store include RDBMS relational databases, distributed NoSQL databases, search engines, and related retrieval and search systems.
Data feature store adapter: because the data feature store can be given a different store type and schema depending on how the file system is used, the data feature store adapter must, according to the specific data feature analysis requirements, convert the data features and their changes extracted by the data feature capturer into corresponding retrieval entries and set the store type and schema of the data feature store, and then replay the retrieval entries corresponding to these journal entries into the data feature store.
Data feature management and analysis subsystem: according to the specific data feature analysis requirements, it sets search conditions and organizes, manages, and analyzes the data features in the data feature store, for the purposes of data feature management and data feature analysis. Organizing and managing the data features in the data feature store includes searching, retrieving, and classifying by data feature, setting policies and trigger conditions, and defining the actions to perform once a trigger condition fires.
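How the components above hand data to one another can be sketched end to end as follows. Everything here is an assumption made for illustration: the journal entry fields (`seq`, `op`, `path`, `attrs`), the function names, and the use of a plain dictionary as the data feature store are not taken from the patent.

```python
# Hypothetical journal entries; the field names are assumptions.
journal = [
    {"seq": 1, "op": "create", "path": "/a", "attrs": {"size": 0}},
    {"seq": 2, "op": "setattr", "path": "/a", "attrs": {"size": 4096}},
    {"seq": 3, "op": "create", "path": "/b", "attrs": {"size": 10}},
    {"seq": 4, "op": "unlink", "path": "/b", "attrs": {}},
]

def capture(entries, cursor):
    """Capturer: read entries past the cursor and extract feature changes."""
    changes = [(e["op"], e["path"], e["attrs"]) for e in entries if e["seq"] > cursor]
    new_cursor = max([cursor] + [e["seq"] for e in entries])
    return changes, new_cursor

def adapt(changes):
    """Adapter: turn feature changes into retrieval entries for the store."""
    for op, path, attrs in changes:
        if op == "unlink":
            yield ("delete", path, None)
        else:
            yield ("upsert", path, attrs)

def replay(store, retrieval_entries):
    """Replay retrieval entries, in order, into the data feature store."""
    for action, key, attrs in retrieval_entries:
        if action == "delete":
            store.pop(key, None)
        else:
            store.setdefault(key, {}).update(attrs)

store = {}                                   # stand-in data feature store
changes, cursor = capture(journal, cursor=0)
replay(store, adapt(changes))

# Management and analysis subsystem: a search condition over the store.
large_files = [p for p, a in store.items() if a.get("size", 0) >= 1024]
```

Only the `adapt` and `replay` steps know about the store, so swapping the dictionary for a relational database or a search engine would not touch the capturer or the file system.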
The data management and analysis system provided by this embodiment is further described below using two concrete data feature stores.
The file system is CephFS in this example, but the system is not limited to CephFS. The file system journal subsystem of CephFS is improved. Compared with the existing journal subsystem, the improvements are: 1. A client interface is provided, which lets a client read journal entries in order and update the current journal read cursor, and which, at the client's request, reclaims all entries before a given journal entry and updates the reclaim cursor. 2. The journal entry reclamation policy is adjusted: the file system can actually reclaim a journal entry only after the file system has applied the data feature operation and the client of the journal subsystem has explicitly released the corresponding journal entry.
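The two improvements can be modeled with a small in-memory journal. This is a sketch under stated assumptions and does not reflect the actual CephFS journal code or API; the class `Journal` and all of its method and field names are hypothetical.

```python
class Journal:
    """Toy journal with a client interface; sequence numbers start at 1."""

    def __init__(self):
        self.entries = {}        # seq -> payload, entries still held
        self.next_seq = 1
        self.applied_upto = 0    # highest seq applied by the file system
        self.read_cursor = 0     # highest seq read by the client
        self.release_upto = 0    # highest seq the client allows to be reclaimed
        self.reclaim_cursor = 0  # highest seq actually reclaimed

    def append(self, payload):
        seq = self.next_seq
        self.entries[seq] = payload
        self.next_seq += 1
        return seq

    def mark_applied(self, seq):
        self.applied_upto = max(self.applied_upto, seq)
        self._reclaim()

    # ---- client interface ----
    def read_next(self):
        """Return the next entry in order and advance the read cursor."""
        seq = self.read_cursor + 1
        if seq not in self.entries:
            return None
        self.read_cursor = seq
        return seq, self.entries[seq]

    def release(self, seq):
        """Allow reclamation of every entry up to seq that was already read."""
        self.release_upto = max(self.release_upto, min(seq, self.read_cursor))
        self._reclaim()

    def _reclaim(self):
        # Both conditions must hold: applied by the file system AND released.
        limit = min(self.applied_upto, self.release_upto)
        while self.reclaim_cursor < limit:
            self.reclaim_cursor += 1
            del self.entries[self.reclaim_cursor]

j = Journal()
for name in ("a", "b", "c"):
    j.append({"path": "/" + name})
j.mark_applied(3)              # applied, but the client has released nothing
held_after_apply = len(j.entries)
j.read_next()
j.read_next()                  # the client has now read entries 1 and 2
j.release(2)                   # entries 1 and 2 may now be reclaimed
```

Taking the minimum of the two positions keeps reclamation sequential and guarantees that an entry the capturer has not yet consumed is never dropped.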
The data feature store is an RDBMS, a PostgreSQL database, and data is organized and managed according to the file system's standard file attributes ATTR (file size, creation and update times, directory size, owner, and so on) and extended attributes XATTR.
As a client of the journal subsystem, the data feature capturer reads the corresponding journal entries in order and extracts data features and their changes from the entries it reads.
The data feature store adapter converts the data features and their changes into corresponding retrieval entries and, according to the store type (a PostgreSQL data feature store) and the predefined table structure, replays the retrieval entries into the PostgreSQL data feature store.
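Replaying changes into a relational store with a predefined table structure might look like the sketch below. To keep it runnable without a database server, it swaps in SQLite from the Python standard library in place of PostgreSQL. The table `file_features`, its columns, and the shape of the change records are assumptions for illustration, not the schema from the patent.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")   # SQLite stands in for PostgreSQL here
conn.execute(
    "CREATE TABLE file_features ("
    " path TEXT PRIMARY KEY, size INTEGER, mtime REAL, owner TEXT, xattrs TEXT)"
)

def replay(conn, change):
    """Apply one captured change to the predefined table."""
    if change["op"] == "unlink":
        conn.execute("DELETE FROM file_features WHERE path = ?", (change["path"],))
        return
    row = conn.execute(
        "SELECT size, mtime, owner, xattrs FROM file_features WHERE path = ?",
        (change["path"],),
    ).fetchone()
    current = dict(zip(("size", "mtime", "owner"), row[:3])) if row else {}
    xattrs = json.loads(row[3]) if row else {}
    current.update(change.get("attrs", {}))    # merge standard attributes
    xattrs.update(change.get("xattrs", {}))    # merge extended attributes
    conn.execute(
        "INSERT OR REPLACE INTO file_features VALUES (?, ?, ?, ?, ?)",
        (change["path"], current.get("size"), current.get("mtime"),
         current.get("owner"), json.dumps(xattrs)),
    )

changes = [
    {"op": "create", "path": "/a", "attrs": {"size": 0, "mtime": 100.0, "owner": "alice"}},
    {"op": "setattr", "path": "/a", "attrs": {"size": 2048, "mtime": 160.0}},
    {"op": "setxattr", "path": "/a", "xattrs": {"user.project": "ABC"}},
    {"op": "create", "path": "/b", "attrs": {"size": 7, "mtime": 120.0, "owner": "bob"}},
    {"op": "unlink", "path": "/b"},
]
for change in changes:               # replay strictly in journal order
    replay(conn, change)
rows = conn.execute("SELECT path, size, owner, xattrs FROM file_features").fetchall()
```

Replaying in journal order matters: the final row for `/a` reflects the last size written, and `/b` is absent because its removal was replayed after its creation.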
The data feature management and analysis subsystem sets query conditions over the contents of the PostgreSQL data feature store to organize and manage data, for example: picking out the largest file, finding all files updated within a given time period, and finding all files that share a given extended attribute value.
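The three example queries can be written as plain SQL. The sketch below again uses SQLite from the Python standard library as a stand-in for PostgreSQL, and the tables `file_features` and `file_xattrs`, their columns, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # SQLite stands in for PostgreSQL here
conn.execute("CREATE TABLE file_features (path TEXT PRIMARY KEY, size INTEGER, mtime REAL)")
conn.execute("CREATE TABLE file_xattrs (path TEXT, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO file_features VALUES (?, ?, ?)",
    [("/a", 2048, 160.0), ("/b", 7, 120.0), ("/c", 900, 300.0)],
)
conn.executemany(
    "INSERT INTO file_xattrs VALUES (?, ?, ?)",
    [("/a", "user.project", "ABC"), ("/b", "user.project", "XYZ"),
     ("/c", "user.project", "ABC")],
)

# The largest file.
largest = conn.execute(
    "SELECT path FROM file_features ORDER BY size DESC LIMIT 1"
).fetchone()[0]

# All files updated within a given time period.
updated_in_window = [r[0] for r in conn.execute(
    "SELECT path FROM file_features WHERE mtime BETWEEN ? AND ? ORDER BY path",
    (100.0, 200.0),
)]

# All files that share a given extended attribute value.
same_xattr = [r[0] for r in conn.execute(
    "SELECT path FROM file_xattrs WHERE name = ? AND value = ? ORDER BY path",
    ("user.project", "ABC"),
)]
```

None of these queries touches the file system itself, so they add no load to its I/O path.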
The data feature store can also be the search engine Elasticsearch, for example to query the files that have the extended attribute content ABC, or to search across all files for the files in which the extended attributes ABC and DEF occur together and the probability of that co-occurrence.
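The kind of lookup a search engine answers here can be illustrated with a small inverted index in plain Python. This is not Elasticsearch code and uses no Elasticsearch API; the sample files and the definition of the co-occurrence probability (the share of all files that carry both values) are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical documents: path -> set of extended attribute values.
files = {
    "/a": {"ABC", "DEF"},
    "/b": {"ABC"},
    "/c": {"DEF", "GHI"},
    "/d": {"ABC", "DEF", "GHI"},
}

# Build an inverted index: attribute value -> set of paths carrying it.
index = defaultdict(set)
for path, values in files.items():
    for value in values:
        index[value].add(path)

# Files whose extended attribute content is ABC.
with_abc = sorted(index["ABC"])

# Files in which ABC and DEF occur together, and how likely that is.
both = sorted(index["ABC"] & index["DEF"])
probability = len(both) / len(files)
```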
Obviously, the above embodiments of the present invention are merely examples given to explain the invention clearly and do not limit its embodiments. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is not possible to list all embodiments exhaustively here; any obvious change or variation derived from the technical solution of the present invention still falls within the scope of protection of the invention.

Claims (4)

1. A data management and analysis system based on a file system, characterized in that the system comprises: a journal subsystem of the file system, a data feature capturer, a data feature store adapter, a data feature store, and a data feature management and analysis subsystem;
The journal subsystem of the file system is provided with a client interface;
The data feature capturer reads journal entries from the journal subsystem of the file system through the client interface and extracts data features and their changes from the journal entries it reads;
The data feature store adapter, according to the specific data feature analysis requirements, converts the data features and their changes into retrieval entries and sets the store type and schema of the data feature store, and then replays the retrieval entries into the data feature store;
The data feature management and analysis subsystem, according to the specific data feature management or analysis requirements, sets search conditions and organizes, manages, and analyzes the data features in the data feature store;
The data feature capturer and the data feature store both work out of band.
2. The data management and analysis system based on a file system according to claim 1, characterized in that the journal reclamation policy of the journal subsystem of the file system is: a journal entry can be reclaimed, in order, only after the file system has applied the data feature operation and the data feature capturer has explicitly allowed the entry to be reclaimed.
3. The data management and analysis system based on a file system according to claim 1, characterized in that the data feature capturer updates the current journal cursor at the same time as it reads journal entries from the journal subsystem of the file system through the client interface.
4. The data management and analysis system based on a file system according to claim 1, characterized in that the types of the data feature store include RDBMS relational databases, distributed NoSQL databases, search engines, and related retrieval and search systems.
CN201610623825.8A 2016-08-02 2016-08-02 A kind of data management and analysis system based on file system Active CN106250494B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610623825.8A CN106250494B (en) 2016-08-02 2016-08-02 A kind of data management and analysis system based on file system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610623825.8A CN106250494B (en) 2016-08-02 2016-08-02 A kind of data management and analysis system based on file system

Publications (2)

Publication Number Publication Date
CN106250494A true CN106250494A (en) 2016-12-21
CN106250494B CN106250494B (en) 2019-04-09

Family

ID=57606374

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610623825.8A Active CN106250494B (en) 2016-08-02 2016-08-02 A kind of data management and analysis system based on file system

Country Status (1)

Country Link
CN (1) CN106250494B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6725392B1 (en) * 1999-03-03 2004-04-20 Adaptec, Inc. Controller fault recovery system for a distributed file system
CN1893370A (en) * 2005-06-29 2007-01-10 国际商业机器公司 Server cluster recovery and maintenance method and system
CN101578599A (en) * 2006-08-07 2009-11-11 米谋萨系统有限公司 Synthesis of fatty acids
CN101304360A (en) * 2007-05-08 2008-11-12 艾岩 System and method for virtualization of user digital terminal
CN103533023A (en) * 2013-07-25 2014-01-22 上海和辰信息技术有限公司 Cloud service application cluster synchronization system and synchronization method based on cloud service characteristics

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110297846A (en) * 2019-05-28 2019-10-01 北京奇艺世纪科技有限公司 A kind of log feature processing system, method, electronic equipment and storage medium
CN110297846B (en) * 2019-05-28 2021-08-20 北京奇艺世纪科技有限公司 Log feature processing system, method, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN106250494B (en) 2019-04-09

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant