CN104142980B - Metadata schema management system and management method based on big data - Google Patents


Info

Publication number
CN104142980B
Authority
CN
China
Prior art keywords
metadata
data
data source
schema
metadata schema
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410336111.XA
Other languages
Chinese (zh)
Other versions
CN104142980A (en)
Inventor
闵圣捷
谢朝阳
童晓渝
王慧
赵斌
靳永超
邹云
丁星
武静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CLP SECTION HUAYUN INFORMATION TECHNOLOGY Co Ltd
Original Assignee
CLP SECTION HUAYUN INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-07-15
Filing date
2014-07-15
Publication date
2017-11-17
Application filed by CLP SECTION HUAYUN INFORMATION TECHNOLOGY Co Ltd
Priority to CN201410336111.XA
Publication of CN104142980A
Application granted
Publication of CN104142980B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/907: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Abstract

The invention provides a metadata schema management system and management method based on big data. The management method comprises the following steps: Step 1, determine the structure type of the big-data data source; Step 2, extract metadata from a structured data source, then perform Step 4; Step 3, extract metadata from an unstructured data source, then perform Step 4; Step 4, define the relationships among the extracted metadata and form the corresponding metadata schema, then perform Step 5; Step 5, store the resulting metadata schema in a database in graph form, then perform Step 6; Step 6, according to the defined metadata schema, publish metadata according to business requirements so that external systems can use it. The invention manages data of different types: a unified metadata system can be built over heterogeneous data sources, with functions for storing, managing, and using that system.

Description

Metadata schema management system and management method based on big data
Technical field
The present invention relates to a metadata schema management system and management method in the field of telecommunication technology, and in particular to a metadata schema management system and management method based on big data.
Background art
People use the term "big data" to describe and define the massive volume of data produced in the age of information explosion, and to name the associated technological developments and innovations. Data is expanding rapidly, and it determines the future development of enterprises. Although enterprises may not yet recognize the hidden problems brought by explosive data growth, people will become increasingly aware of the importance of data to the enterprise as time goes on.
The big data era poses new challenges to humanity's ability to control data. As the Internet of Things and mobile terminals continuously produce massive amounts of data of increasingly varied types, managing these different types of data has become a difficult problem. The metadata schema management method based on big data of the present invention is adapted to this environment and solves the problem of managing the different data types found in big data.
Summary of the invention
In view of the defects in the prior art, the object of the present invention is to provide a metadata schema management system and management method based on big data that manages data of different types, can build a unified metadata system over heterogeneous data sources, and provides functions for storing, managing, and using that system.
According to one aspect of the present invention, a metadata schema management method based on big data is provided, characterized in that it comprises the following steps:
Step 1, determine the structure type of the big-data data source, that is, determine whether it is a structured or an unstructured data source; if it is a structured data source, perform Step 2; if it is an unstructured data source, perform Step 3;
Step 2, extract metadata from the structured data source, then perform Step 4;
Step 3, extract metadata from the unstructured data source, then perform Step 4;
Step 4, define the relationships among the extracted metadata and form the corresponding metadata schema, then perform Step 5;
Step 5, store the resulting metadata schema in a database in graph form, then perform Step 6;
Step 6, according to the defined metadata schema, publish metadata according to business requirements so that external systems can use it.
Preferably, the structured data sources include relational databases and file formats, and the unstructured data sources include NOSQL databases.
Preferably, in Step 2 and Step 3, user-defined metadata is extracted manually, and the metadata is converted into a format that conforms to the JSON data standard.
Preferably, Step 5 first parses the JSON data format of the metadata schema, converting it into a graph representation of nodes and node relationships, and then stores the nodes and node relationships in a graph database.
The present invention also provides a metadata schema management system based on big data, characterized in that it comprises:
A judging module, for determining the structure type of the big-data data source;
An extraction module, for extracting metadata from a structured data source or from an unstructured data source;
A model definition and formation module, for defining the relationships among the extracted metadata and forming the corresponding metadata schema;
A storage module, for storing in a database the metadata schema produced by the model definition and formation module;
A publishing module, for publishing metadata.
Compared with the prior art, the present invention has the following beneficial effects. First, according to business requirements, the invention directly extracts, fuses, and shares metadata information across databases of different types and at different geographic locations, and performs heterogeneous processing for metadata data modeling; this heterogeneous processing provides effective management over both structured and unstructured data sources. Second, the invention provides a basic unified data standard for mining and analyzing massive data, and lays a foundation for building industry semantic bases. Third, the invention provides users with a complete set of metadata management functions. Fourth, the invention provides fast, efficient, and accurate storage of metadata and metadata schemas for big data processing. Fifth, storing the metadata schema in graph form allows fast queries and a clear display, which clearly shows how the metadata data model is established and extended. Sixth, the invention establishes a unified and stable metadata data warehouse for big data processing.
Brief description of the drawings
Other features, objects, and advantages of the invention will become more apparent upon reading the detailed description of non-limiting embodiments made with reference to the following drawings:
Fig. 1 is a flow chart of the metadata schema management method based on big data of the present invention.
Fig. 2 is a block diagram of the metadata schema management system based on big data of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the present invention, but do not limit the invention in any way. It should be pointed out that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept. These all fall within the protection scope of the present invention.
As shown in Fig. 1, the metadata schema management method based on big data of the present invention comprises the following steps:
Step 1: determine the structure type of the big-data data source, that is, determine whether it is a structured or an unstructured data source. If it is a structured data source, perform Step 2; if it is an unstructured data source, perform Step 3. Structured data sources include relational databases, such as ORACLE, MYSQL, and DB2, and file formats, such as CSV and XLSX. Unstructured data sources include NOSQL (non-relational) databases. Step 1 is carried out by the judging module. For structured data sources, the data source semantic type standard is formulated from the defining characteristic of structured data, namely that the data is logically expressed in a two-dimensional table structure. For unstructured data sources, the data source semantic type standard is formulated from the characteristic that the data consists of documents, pictures, forms, images, audio, and the like.
Step 2: extract metadata from the structured data source, then perform Step 4. Step 2 is carried out by the extraction module, which extracts metadata from the structured data source.
Step 3: extract metadata from the unstructured data source, then perform Step 4. Step 3 is carried out by the extraction module, which extracts metadata from the unstructured data source.
Step 4: define the relationships among the extracted metadata and form the corresponding metadata schema, then perform Step 5. Specifically, Step 4 uses metadata data modeling to define the various relationships among the different extracted metadata; different relationships are established for different lines of business, and the corresponding metadata schema is formed from these metadata and their relationships. Step 4 is carried out by the model definition and formation module.
Step 5: store the resulting metadata schema in a database in graph form, then perform Step 6. Step 5 is carried out by the storage module.
Step 6: according to the defined metadata schema, publish metadata according to business requirements so that external systems can use it. Step 6 is carried out by the publishing module.
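The routing performed in Step 1 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the description names the source types (ORACLE, MYSQL, DB2, CSV, XLSX, NOSQL, and documents, pictures, images, audio) but not how they are represented, so the lowercase labels, the `SourceKind` enum, and the function names are hypothetical.

```python
from enum import Enum


class SourceKind(Enum):
    STRUCTURED = "structured"
    UNSTRUCTURED = "unstructured"


# Source types taken from the description; the lowercase labels are assumptions.
STRUCTURED_TYPES = {"oracle", "mysql", "db2", "csv", "xlsx"}
UNSTRUCTURED_TYPES = {"nosql", "document", "picture", "image", "audio"}


def classify_source(source_type: str) -> SourceKind:
    """Step 1: decide whether a data source is structured or unstructured."""
    key = source_type.strip().lower()
    if key in STRUCTURED_TYPES:
        return SourceKind.STRUCTURED
    if key in UNSTRUCTURED_TYPES:
        return SourceKind.UNSTRUCTURED
    raise ValueError(f"unknown data source type: {source_type!r}")


def next_step(source_type: str) -> int:
    """Return the step Step 1 routes to: 2 for structured, 3 for unstructured."""
    if classify_source(source_type) is SourceKind.STRUCTURED:
        return 2
    return 3
```

Under these assumptions, `next_step("MYSQL")` routes to Step 2 and `next_step("NOSQL")` routes to Step 3; an unrecognized type raises an error rather than being guessed.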
In Step 2 and Step 3, user-defined metadata is extracted manually and converted into a format conforming to the JSON (JavaScript Object Notation, a lightweight data interchange format) data standard. The benefit of this data standard is that it defines the semantic standard of the metadata and avoids semantic conflicts. Step 5 first parses the JSON data format of the metadata schema, converting it into a graph representation of nodes and node relationships, and then stores the nodes and node relationships in a graph database. Metadata is a kind of binary information: descriptive information about data and information resources.
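The JSON conversion and the parsing into nodes and node relationships might look like the following Python sketch. The patent does not specify the JSON layout or name a graph database, so the `metadata`/`relations` keys, the `id`/`from`/`type`/`to` fields, and the `InMemoryGraph` class (a stand-in for a real graph database) are all assumptions made for illustration.

```python
import json


def to_json(metadata_items, relations):
    """Serialize extracted metadata and their relationships as JSON (assumed layout)."""
    return json.dumps({"metadata": metadata_items, "relations": relations},
                      ensure_ascii=False)


def parse_model(model_json):
    """Parse a JSON metadata schema into nodes and (source, type, target) edges."""
    model = json.loads(model_json)
    nodes = {item["id"]: item for item in model["metadata"]}
    edges = []
    for rel in model["relations"]:
        # Reject relationships that point at metadata missing from the schema.
        if rel["from"] not in nodes or rel["to"] not in nodes:
            raise ValueError(f"relation refers to unknown metadata: {rel}")
        edges.append((rel["from"], rel["type"], rel["to"]))
    return nodes, edges


class InMemoryGraph:
    """Hypothetical stand-in for a graph database; keeps everything in memory."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def store(self, nodes, edges):
        self.nodes.update(nodes)
        self.edges.extend(edges)

    def neighbors(self, node_id):
        return [dst for src, _, dst in self.edges if src == node_id]


# Example: two metadata items from different sources and one relationship.
model_json = to_json(
    [{"id": "customer", "source": "MYSQL"}, {"id": "order", "source": "CSV"}],
    [{"from": "customer", "type": "places", "to": "order"}],
)
graph = InMemoryGraph()
graph.store(*parse_model(model_json))
```

Keeping the relationship type on each edge is one way to let different lines of business define different relationships over the same metadata nodes, as Step 4 describes.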
As shown in Fig. 2, the metadata schema management system based on big data of the present invention comprises:
A judging module, for determining the structure type of the big-data data source;
An extraction module, for extracting metadata from a structured data source or from an unstructured data source;
A model definition and formation module, for defining the relationships among the extracted metadata and forming the corresponding metadata schema;
A storage module, for storing in a database the metadata schema produced by the model definition and formation module;
A publishing module, for publishing metadata.
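As a rough illustration of how the five modules could fit together, the following Python sketch models each module as one method. It is not the patented implementation: the class and field names, the in-memory list standing in for the database, and the idea of expressing a business requirement as a set of wanted metadata names are all assumptions.

```python
from dataclasses import dataclass, field

# Structured source types named in the description; everything else is
# treated as unstructured in this sketch (an assumption).
STRUCTURED = {"ORACLE", "MYSQL", "DB2", "CSV", "XLSX"}


@dataclass
class MetadataModel:
    items: list
    relations: list


@dataclass
class MetadataSystem:
    store: list = field(default_factory=list)  # stands in for the database

    def judge(self, source):
        """Judging module: classify the data source."""
        return "structured" if source["type"] in STRUCTURED else "unstructured"

    def extract(self, source):
        """Extraction module: wrap the user-defined metadata of a source."""
        kind = self.judge(source)
        return [{"name": name, "source": source["name"], "kind": kind}
                for name in source["user_defined_metadata"]]

    def define_model(self, items, relations):
        """Model definition and formation module."""
        return MetadataModel(items, relations)

    def save(self, model):
        """Storage module."""
        self.store.append(model)

    def publish(self, wanted_names):
        """Publishing module: return only the metadata a consumer asks for."""
        return [item for model in self.store for item in model.items
                if item["name"] in wanted_names]


system = MetadataSystem()
crm = {"name": "crm", "type": "MYSQL",
       "user_defined_metadata": ["customer_id", "region"]}
items = system.extract(crm)
system.save(system.define_model(items, [("customer_id", "located_in", "region")]))
published = system.publish({"region"})
```

In this sketch an external system would receive only the `region` entry, which mirrors the idea of publishing metadata selectively according to business requirements.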
In summary, the present invention manages data of different types and can build a unified metadata system over heterogeneous data sources; this metadata system covers the extraction, modeling, storage, querying, and management of heterogeneous metadata.
Specific embodiments of the present invention have been described above. It is to be understood that the invention is not limited to the particular embodiments described above; those skilled in the art can make various variations or modifications within the scope of the claims without affecting the substance of the present invention.

Claims (1)

1. A metadata schema management method based on big data, characterized in that it comprises the following steps:
Step 1, determine the structure type of the big-data data source, that is, determine whether it is a structured or an unstructured data source; if it is a structured data source, perform Step 2; if it is an unstructured data source, perform Step 3;
Step 2, extract metadata from the structured data source, then perform Step 4;
Step 3, extract metadata from the unstructured data source, then perform Step 4;
Step 4, define the relationships among the extracted metadata and form the corresponding metadata schema, then perform Step 5;
Step 5, store the resulting metadata schema in a database in graph form, then perform Step 6;
Step 6, according to the defined metadata schema, publish metadata according to business requirements so that external systems can use it;
The structured data sources include relational databases and file formats, and the unstructured data sources include NOSQL databases;
In Step 2 and Step 3, user-defined metadata is extracted manually, and the metadata is converted into a format that conforms to the JSON data standard;
Step 5 first parses the JSON data format of the metadata schema, converting it into a graph representation of nodes and node relationships, and then stores the nodes and node relationships in a graph database.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410336111.XA CN104142980B (en) 2014-07-15 2014-07-15 Metadata schema management system and management method based on big data

Publications (2)

Publication Number Publication Date
CN104142980A CN104142980A (en) 2014-11-12
CN104142980B true CN104142980B (en) 2017-11-17

Family

ID=51852154

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410336111.XA Active CN104142980B (en) 2014-07-15 2014-07-15 Metadata schema management system and management method based on big data

Country Status (1)

Country Link
CN (1) CN104142980B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103886004B (en) * 2013-11-29 2017-06-09 北京吉威时代软件股份有限公司 A kind of data type data modeling processing method
CN104580474A (en) * 2015-01-13 2015-04-29 深圳市融创天下科技有限公司 Urban operation sign big data visualization multi-screen interaction display platform and method
CN105574086A (en) * 2015-12-10 2016-05-11 天津海量信息技术有限公司 Artificial intelligence extraction method of internet unstructured data fields
CN106886535A (en) * 2015-12-16 2017-06-23 大唐软件技术股份有限公司 A kind of data pick-up method and apparatus for being adapted to multiple data sources
CN105701181A (en) * 2016-01-06 2016-06-22 中电科华云信息技术有限公司 Dynamic heterogeneous metadata acquisition method and system
CN105912636B (en) * 2016-04-08 2020-04-07 金蝶软件(中国)有限公司 Map/Reduce-based ETL data processing method and device
CN106557569B (en) * 2016-11-14 2020-07-03 用友网络科技股份有限公司 Method and device for importing unstructured document based on meta-model
CN108320066A (en) * 2017-01-18 2018-07-24 重庆邮电大学 A kind of Explore of Unified Management Ideas for realizing different production lines based on metadata
CN114490630A (en) 2017-04-25 2022-05-13 华为技术有限公司 Query processing method, data source registration method and query engine
CN107291875B (en) * 2017-06-19 2019-12-06 华中科技大学 Metadata organization management method and system based on metadata graph
CN107633181B (en) * 2017-09-12 2021-01-26 复旦大学 Data model realization method facing data open sharing and operation system thereof
CN109242259B (en) * 2018-08-10 2020-12-11 华迪计算机集团有限公司 Data integration method and system based on basic data resource library
CN109542960B (en) * 2018-10-18 2023-03-14 国网内蒙古东部电力有限公司信息通信分公司 Data analysis domain system
CN109710602A (en) * 2018-12-26 2019-05-03 中科曙光国际信息产业有限公司 Data model detection method and device
CN109739893B (en) * 2018-12-28 2022-04-22 上海尚往网络科技有限公司 Metadata management method, equipment and computer readable medium
CN109871417A (en) * 2018-12-29 2019-06-11 国家开发银行 The metadata visualization map constructing method and system of knowledge based map
CN109857822A (en) * 2018-12-29 2019-06-07 国家开发银行 Meta-model conversion method and management system based on chart database
CN110209380B (en) * 2019-05-30 2020-11-03 上海直真君智科技有限公司 Unified dynamic metadata processing method oriented to big data heterogeneous model
US11703404B2 (en) 2019-06-17 2023-07-18 Colorado State University Research Foundation Device for automated crop root sampling
US11494611B2 (en) 2019-07-31 2022-11-08 International Business Machines Corporation Metadata-based scientific data characterization driven by a knowledge database at scale
CN112115183B (en) * 2020-09-18 2021-09-21 广州锦行网络科技有限公司 Honeypot system threat information analysis method based on graph

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070233680A1 (en) * 2006-03-31 2007-10-04 Microsoft Corporation Auto-generating reports based on metadata

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101908176A (en) * 2010-08-02 2010-12-08 国电南瑞科技股份有限公司 Method for modeling on basis of power information data and applying metadata management
CN103246753A (en) * 2013-05-30 2013-08-14 安徽皖通科技股份有限公司 Method for generating entity metadata model according to database structure

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
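JSON-based unstructured data extraction method for power enterprise business systems (基于 JSON 的电力企业业务系统非结构化数据抽取方法); 徐小天 et al.; 《华北电力技术》; 2013-11-30 (2013, No. 11); pp. 32-35 *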

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188887A (en) * 2018-09-26 2019-08-30 第四范式(北京)技术有限公司 Data management method and device for machine learning
CN110188887B (en) * 2018-09-26 2022-11-08 第四范式(北京)技术有限公司 Data management method and device for machine learning

Also Published As

Publication number Publication date
CN104142980A (en) 2014-11-12

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant