CN109739894A - Supplement method, apparatus, equipment and the storage medium of metadata description - Google Patents



Publication number
CN109739894A
Authority
CN
China
Prior art keywords
metadata
data
field
similarity
supplement
Legal status
Granted
Application number
CN201910014080.9A
Other languages
Chinese (zh)
Other versions
CN109739894B (en)
Inventor
司东华
陈文浩
周可
兰冲
杨东方
罗迎宣
沈蓉
王惠平
敖林
郭腾达
覃娆
张玲
张志杰
汪亚男
邱毅
Current Assignee
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Application filed by WeBank Co Ltd
Priority to CN201910014080.9A
Publication of CN109739894A
Application granted
Publication of CN109739894B
Legal status: Active

Landscapes

  • Medical Treatment And Welfare Office Work (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for supplementing metadata descriptions. The method comprises: obtaining pre-saved full metadata from each source database; performing data lineage analysis on the full metadata to construct a lineage graph between the different data tables of the full metadata; obtaining, from the full metadata, a target field whose description is to be supplemented, and determining the target table to which the target field belongs; determining the upstream tables of the target table according to the lineage graph; and analyzing the similarity between the metadata fields in the upstream tables and the target field, and supplementing the description of the target field according to the similarity. The invention also discloses an apparatus, a device and a storage medium for supplementing metadata descriptions. The invention reduces manual intervention and labor cost in the process of supplementing metadata descriptions and improves overall working efficiency.

Description

Technical field
The present invention relates to the field of data processing technology, and in particular to a method, apparatus, device and storage medium for supplementing metadata descriptions.
Background
Metadata is data that describes data. A field description is the part of the metadata that explains the meaning of a field in a metadata database; it helps data users understand the data better. On a big data platform, field descriptions are sometimes never created for various reasons, which leaves third-party data users unable to understand the data effectively and causes confusion.
Existing solutions for supplementing metadata field descriptions rely essentially on manual supplementation by the relevant developers, and fall into two schemes: 1) A developer modifies the metadata description information in the source database with a script; the information is then collected and processed by the metadata management system and presented to data users. This scheme requires modifying the table structure of the source database again, which increases development-process risk, costs more and takes effect slowly. 2) A new metadata management module is added to the metadata management system to support editing and modifying descriptions at the metadata level. This scheme does not require going through the development process again and has no impact on the existing source system databases, but supplementing the descriptions still requires full manual participation throughout, so labor cost is high and working efficiency is low.
Summary of the invention
The main object of the present invention is to provide a method, apparatus, device and storage medium for supplementing metadata descriptions, aiming to reduce manual intervention and labor cost in the process of supplementing metadata descriptions and to improve overall working efficiency.
To achieve the above object, the present invention provides a method for supplementing metadata descriptions, comprising the following steps:
Obtaining pre-saved full metadata from each source database;
Performing data lineage analysis on the full metadata to construct a lineage graph between the different data tables of the full metadata;
Obtaining, from the full metadata, a target field whose description is to be supplemented, and determining the target table to which the target field belongs;
Determining the upstream tables of the target table according to the lineage graph;
Analyzing the similarity between the metadata fields in the upstream tables and the target field, and supplementing the description of the target field according to the similarity.
Preferably, before the step of obtaining pre-saved full metadata from each source database, the method further comprises:
Pulling metadata from preset source databases;
Saving the pulled metadata into a preset metadata repository.
Preferably, the step of performing data lineage analysis on the full metadata to construct a lineage graph between the different data tables of the full metadata comprises:
Obtaining the data table types of the full metadata, the data table types including big data platform data tables and traditional relational data tables;
Calling preset lineage hook functions to obtain the lineage relationships between different big data platform data tables, and between big data platform data tables and traditional relational data tables;
Calling a preset graph database engine to build the obtained lineage relationships between different big data platform data tables, and between big data platform data tables and traditional relational data tables, into a lineage graph.
Preferably, the step of analyzing the similarity between the metadata fields in the upstream tables and the target field comprises:
Deduplicating the metadata fields in the upstream tables;
Removing preset invalid fields from the deduplicated metadata fields to obtain a metadata field set to be analyzed;
Calculating, according to a preset algorithm, the similarity between each field in the metadata field set and the target field.
Preferably, the step of calculating, according to a preset algorithm, the similarity between each field in the metadata field set and the target field comprises:
Calculating, according to an edit distance algorithm, the minimum number of edit operations required to convert each field in the metadata field set into the target field;
Calculating the similarity between each field in the metadata field set and the target field according to the minimum number of edit operations.
Preferably, the step of supplementing the description of the target field according to the similarity comprises:
Determining, from the metadata field set, the field with the highest similarity to the target field;
Obtaining the field description information of the field with the highest similarity to the target field, and using it as the supplementary description of the target field.
Preferably, the step of supplementing the description of the target field according to the similarity comprises:
Sorting the fields in the metadata field set in descending order of similarity to obtain a candidate list of supplementary descriptions for the target field;
Upon receiving a selection instruction triggered by a user based on the candidate list, determining the field selected by the user;
Obtaining the field description information of the field selected by the user, and using it as the supplementary description of the target field.
In addition, to achieve the above object, the present invention further provides an apparatus for supplementing metadata descriptions, comprising:
A first obtaining module, configured to obtain pre-saved full metadata from each source database;
A construction module, configured to perform data lineage analysis on the full metadata to construct a lineage graph between the different data tables of the full metadata;
A second obtaining module, configured to obtain, from the full metadata, a target field whose description is to be supplemented, and determine the target table to which the target field belongs;
A determining module, configured to determine the upstream tables of the target table according to the lineage graph;
A supplementary description module, configured to analyze the similarity between the metadata fields in the upstream tables and the target field, and supplement the description of the target field according to the similarity.
In addition, to achieve the above object, the present invention further provides a device for supplementing metadata descriptions, comprising a memory, a processor, and a program for supplementing metadata descriptions that is stored in the memory and executable on the processor; when executed by the processor, the program implements the steps of the method for supplementing metadata descriptions described above.
In addition, to achieve the above object, the present invention further provides a storage medium storing a program for supplementing metadata descriptions; when executed by a processor, the program implements the steps of the method for supplementing metadata descriptions described above.
The present invention obtains pre-saved full metadata from each source database; performs data lineage analysis on the full metadata to construct a lineage graph between the different data tables of the full metadata; obtains, from the full metadata, a target field whose description is to be supplemented, and determines the target table to which it belongs; determines the upstream tables of the target table according to the lineage graph; and analyzes the similarity between the metadata fields in the upstream tables and the target field, supplementing the description of the target field according to the similarity. Compared with the prior art, this approach automatically supplements metadata descriptions for target fields based on the lineage relationships between different data tables, thereby reducing manual intervention, lowering cost and improving overall working efficiency.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the device in the hardware operating environment involved in embodiments of the present invention;
Fig. 2 is a schematic flowchart of a first embodiment of the method for supplementing metadata descriptions according to the present invention;
Fig. 3 is a schematic diagram of the metadata management system pulling metadata from source databases in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the sub-steps of step S20 in Fig. 2;
Fig. 5 is a schematic diagram of the relationship between the target table and its upstream tables in an embodiment of the present invention;
Fig. 6 is a schematic diagram of the result of the similarity analysis performed on the target field Cust_name in target table table1 of Fig. 5.
The realization of the object, the functional features and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein are merely intended to explain the present invention, not to limit it.
Fig. 1 is a schematic structural diagram of the device in the hardware operating environment involved in embodiments of the present invention.
The device for supplementing metadata descriptions in embodiments of the present invention may be a PC or a server.
As shown in Fig. 1, the device may include: a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The communication bus 1002 implements connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be high-speed RAM or stable non-volatile memory, such as disk storage. Optionally, the memory 1005 may also be a storage device independent of the processor 1001.
Those skilled in the art will understand that the device structure shown in Fig. 1 does not limit the device, which may include more or fewer components than shown, combine certain components, or arrange components differently.
As shown in Fig. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a program for supplementing metadata descriptions.
In the device shown in Fig. 1, the network interface 1004 is mainly used to connect to data users and communicate with them; the user interface 1003 is mainly used to connect to a client (user terminal) and communicate with it; and the processor 1001 may be used to call the program for supplementing metadata descriptions stored in the memory 1005 and perform the operations in the following embodiments of the method.
Based on the above hardware structure, embodiments of the method for supplementing metadata descriptions according to the present invention are proposed.
Referring to Fig. 2, which is a schematic flowchart of a first embodiment of the method for supplementing metadata descriptions according to the present invention, the method comprises:
Step S10: obtaining pre-saved full metadata from each source database;
The method of this embodiment is applied to the metadata management system of a big data platform. The metadata management system runs on the above device for supplementing metadata descriptions and is mainly used to collect metadata from source databases and provide the collected metadata to metadata applications for data users.
The metadata management system pre-saves the full metadata from each source database. Before step S10, the method may include: pulling metadata from preset source databases; and saving the pulled metadata into a preset metadata repository, so that the pre-saved full metadata from each source database can subsequently be obtained from the metadata repository.
Referring to Fig. 3, which is a schematic diagram of the metadata management system pulling metadata from source databases in an embodiment of the present invention: the metadata management system includes a metadata collection module and a metadata repository, and the source databases include big data platform databases (such as Hive and HBase) and distributed traditional relational databases (such as MySQL and Oracle). The metadata collection module can periodically pull metadata from the big data platform databases and the traditional relational databases through ETL (Extract-Transform-Load, the process of extracting data from a source, transforming it and loading it into a destination), and then save the pulled metadata into the metadata repository.
It should be noted that the metadata pulled from the source databases includes, but is not limited to, metadata describing the database cluster information, database information, data table information and field information of the source systems (a source system being the system corresponding to each source database). The database cluster information of a source system may include the cluster code, cluster type, data source type, logical network zone, physical network zone, IP address, network port, etc.; the database information may include the database ID, database name, data source type, cluster code, database description, development owner, etc.; the data table information may include the table ID, table name, database ID, database name, table description, development owner, modification time, etc.; and the field information may include the field ID, field name, table name, sequence number, data type, length, whether nullable, whether indexed, whether a partition field, field description, etc.
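As an illustration of what the collection step might store for each field, the following Python sketch pulls field information from a SQLite source into a list that stands in for the metadata repository. SQLite, the `FieldMetadata` record and `pull_field_metadata` are assumptions for illustration only; the embodiment targets Hive, HBase, MySQL, Oracle and similar systems through ETL, and records many more attributes than the few kept here.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class FieldMetadata:
    """Hypothetical subset of the field information listed above."""
    table_name: str
    field_name: str
    seq_no: int            # column position within the table
    data_type: str
    nullable: bool
    description: str = ""  # field description; empty when the source has none


def pull_field_metadata(conn: sqlite3.Connection) -> list[FieldMetadata]:
    """Collect field metadata for every table in a SQLite source database."""
    repository: list[FieldMetadata] = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for cid, name, col_type, notnull, _default, _pk in conn.execute(
            f"PRAGMA table_info('{table}')"
        ):
            repository.append(
                FieldMetadata(table, name, cid, col_type, not notnull)
            )
    return repository


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE table2 (Custname TEXT NOT NULL, telephone TEXT)")
    for field in pull_field_metadata(src):
        print(field)
```

SQLite stores no column comments, so every pulled description is empty, which is exactly the kind of gap the later steps are meant to fill.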
In this embodiment, the pre-saved full metadata from each source database is first obtained, specifically from the above metadata repository.
Step S20: performing data lineage analysis on the full metadata to construct a lineage graph between the different data tables of the full metadata;
The full metadata includes a number of different data tables. In this step, data lineage analysis is performed on the full metadata, that is, the relationships between data are analyzed, including where the data in a table comes from (upstream lineage) and where it flows to (downstream lineage); finally, a lineage graph between the different data tables of the full metadata is constructed.
Specifically, referring to Fig. 4, which is a schematic diagram of the sub-steps of step S20 in Fig. 2, step S20 may further include:
Step S21: obtaining the data table types of the full metadata, the data table types including big data platform data tables and traditional relational data tables;
Step S22: calling preset lineage hook functions to obtain the lineage relationships between different big data platform data tables, and between big data platform data tables and traditional relational data tables;
Step S23: calling a preset graph database engine to build the obtained lineage relationships between different big data platform data tables, and between big data platform data tables and traditional relational data tables, into a lineage graph.
In a specific implementation, the data table types of the full metadata are first obtained; they may include big data platform data tables from big data platform databases and traditional relational data tables from traditional relational databases.
Then, preset lineage hook functions (Lineage Hooks) are called to obtain the lineage relationships between different big data platform data tables, and between big data platform data tables and traditional relational data tables. Data flow and processing are currently implemented mainly in three ways: Hive (a data warehouse tool based on Hadoop), Spark (a general-purpose engine for fast big data processing) and Sqoop (a tool for transferring data between Hadoop and relational database servers). Lineage can therefore be captured mainly by parsing the execution of Hive SQL, Spark SQL and Sqoop:
a) Hive Lineage Hook. The Hive lineage hook asynchronously captures the SQL (Structured Query Language) statements executed by Hive and calls an independently implemented Hive execution-analysis API (Application Programming Interface) to obtain the input data information, output data information and associated task information of each statement;
b) Spark-SQL Lineage Hook. The Spark lineage hook asynchronously captures the SQL statements executed by Spark SQL and calls an independently implemented Spark SQL execution-analysis API to obtain the input data information, output data information and associated task information of each statement;
c) Sqoop Lineage Hook. The Sqoop lineage hook asynchronously captures the commands executed by Sqoop, analyzes their parameters, and obtains information about each command's input data, output data and associated task.
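The hooks above rely on each engine's own execution analysis. Purely to illustrate what a captured (source table, target table) pair looks like, the following toy Python sketch extracts such pairs from a simple `INSERT ... SELECT` statement using regular expressions, and from a `sqoop import --hive-import` command line. The function names are hypothetical, and regex parsing is a deliberate simplification that ignores subqueries, CTEs and most real-world SQL; it is not how the described hooks work.

```python
import re
import shlex

# Toy patterns only: real lineage hooks use the engine's execution analysis.
_INSERT_TARGET = re.compile(
    r"insert\s+(?:overwrite|into)\s+(?:table\s+)?([\w.]+)", re.IGNORECASE
)
_SOURCE_TABLES = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def lineage_from_sql(sql: str) -> list[tuple[str, str]]:
    """Return (srcTable, dstTable) pairs for a simple INSERT ... SELECT."""
    target = _INSERT_TARGET.search(sql)
    if target is None:
        return []  # not a write statement, so there is no lineage to record
    dst = target.group(1)
    sources = dict.fromkeys(_SOURCE_TABLES.findall(sql))  # dedupe, keep order
    return [(src, dst) for src in sources if src != dst]


def lineage_from_sqoop(command: str) -> list[tuple[str, str]]:
    """Return (srcTable, dstTable) for a 'sqoop import --hive-import' command."""
    args = shlex.split(command)
    if args[:2] != ["sqoop", "import"]:
        return []
    opts: dict[str, str] = {}
    for flag, value in zip(args, args[1:]):
        if flag.startswith("--"):
            opts[flag] = value  # value-less flags simply get the next token
    src = opts.get("--table")        # table in the relational database
    dst = opts.get("--hive-table")   # table on the big data platform
    return [(src, dst)] if src and dst else []
```

For example, a Hive statement that writes `dw.table1` from a join of `ods.table2` and `ods.table4` yields two pairs, and a Sqoop import from relational table `customer` into Hive table `ods.table2` yields one pair linking the two worlds.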
The Lineage Hooks for Hive and Spark SQL capture the lineage relationships between different big data platform data tables; lineage can optionally be obtained through either or both of them. The Sqoop Lineage Hook captures the lineage relationships between big data platform data tables and traditional relational data tables. In this embodiment, after the lineage relationships have been obtained through the above hooks, the scheduling system of the big data platform can also periodically trigger the platform's data computation module (MapReduce) to process the lineage data obtained each day; the processed data is finally stored in a Hive table. Each processed lineage record contains information such as the source table and the target table, and may use the format: srcTable [source table], dstTable [target table], extral_info [related information].
Afterwards, the metadata management system calls a preset graph database engine to build the obtained lineage relationships between different big data platform data tables, and between big data platform data tables and traditional relational data tables, into a lineage graph. The construction works as follows: the target table of one lineage record can serve as the source table of other lineage records, forming a complete lineage chain; the different lineage chains together form the complete lineage graph.
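The chaining rule above (the target table of one record is the source table of another) can be sketched in Python with an in-memory adjacency map standing in for the graph database engine, which the text does not name. The record layout follows the srcTable, dstTable, extral_info format described earlier; `build_lineage_graph` and `lineage_chains` are hypothetical helpers.

```python
from collections import defaultdict


def build_lineage_graph(records):
    """records: iterable of (srcTable, dstTable, extral_info) tuples.
    Returns an adjacency map: srcTable -> set of dstTables it feeds."""
    graph = defaultdict(set)
    for src, dst, _extra in records:
        graph[src].add(dst)
    return graph


def lineage_chains(graph, start):
    """Enumerate every chain that starts at `start` and follows downstream
    edges until a table with no downstream consumers is reached."""
    chains = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        # Skip tables already on this path so that a cycle cannot loop forever.
        nexts = sorted(t for t in graph.get(path[-1], ()) if t not in path)
        if not nexts:
            chains.append(path)
        for nxt in nexts:
            stack.append(path + [nxt])
    return chains
```

A real deployment would load the same edges into the graph database and query it there; the in-memory map only shows how individual records link up into chains.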
Step S30: obtaining, from the full metadata, a target field whose description is to be supplemented, and determining the target table to which the target field belongs;
In this step, a target field whose description is to be supplemented is obtained from the full metadata. Specifically, the metadata management system may, periodically or upon receiving a fetch instruction from a user, pick out from the pre-saved full metadata the fields that have no description, or whose description is garbled, as target fields to be supplemented; the data table to which a target field belongs is then determined as the target table.
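The text does not say how a description is recognized as garbled. A minimal sketch, assuming two common symptoms of a bad character-set conversion (the Unicode replacement character U+FFFD, or a description consisting only of question marks), might select target fields as follows; `needs_description` and `find_target_fields` are hypothetical names.

```python
def needs_description(description):
    """Hypothetical heuristic for 'no description or a garbled description'."""
    if description is None or not description.strip():
        return True  # no description at all
    text = description.strip()
    if "\ufffd" in text:
        return True  # replacement character left behind by a failed decode
    if set(text) <= {"?"}:
        return True  # e.g. '????' where a wrong charset replaced every character
    return False


def find_target_fields(fields):
    """fields: iterable of (table, field_name, description) rows.
    Returns (target_table, target_field) pairs that need a description."""
    return [(table, name) for table, name, desc in fields if needs_description(desc)]
```

A production system would likely tune these rules to the encodings it actually encounters.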
Step S40: determining the upstream tables of the target table according to the lineage graph;
In this step, the upstream tables of the target table, i.e. all data tables that are data sources of the target table, are determined according to the lineage graph constructed above. Because a data table is closely related to the field metadata in its upstream lineage, and the meaning of data generally does not change as the data flows and is transformed, the description of a field that has none can be supplemented with recommendations drawn from its upstream lineage.
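Determining the upstream tables amounts to walking the lineage graph backwards from the target table. The sketch below assumes the graph is available as (srcTable, dstTable) pairs and follows the chain transitively; whether the embodiment uses only direct parents or the whole chain is not stated, and `upstream_tables` is a hypothetical helper.

```python
from collections import defaultdict, deque


def upstream_tables(edges, target):
    """edges: iterable of (srcTable, dstTable) pairs. Returns every table from
    which data flows, directly or transitively, into `target`."""
    parents = defaultdict(set)
    for src, dst in edges:
        parents[dst].add(src)
    seen = set()
    queue = deque([target])
    while queue:  # breadth-first search over reversed edges
        table = queue.popleft()
        for parent in parents.get(table, ()):
            if parent not in seen and parent != target:
                seen.add(parent)
                queue.append(parent)
    return seen
```

For example, with the hypothetical edges (table2, table1), (table3, table1), (table4, table2), (table5, table3) and (table6, table3), the upstream tables of table1 are table2 through table6. The exact topology of Fig. 5 is not given in the text.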
Step S50: analyzing the similarity between the metadata fields in the upstream tables and the target field, and supplementing the description of the target field according to the similarity.
The upstream tables generally contain a number of metadata fields. In this step, the similarity between each metadata field in the upstream tables and the target field is analyzed, and the description of the target field is then supplemented according to the similarity. For example, the field information of the field in the upstream tables with the highest similarity to the target field can be used as the supplementary description of the target field; alternatively, the field information of the fields in the upstream tables whose similarity to the target field is greater than or equal to a preset threshold can be recommended to the user, who selects one of them as the supplementary description of the target field. Either mode may be chosen in a specific implementation. Supplementing descriptions by field similarity in this way is relatively accurate, and developers no longer need to take part in the whole process; they only need to give the final confirmation, which greatly improves overall working efficiency.
This embodiment obtains pre-saved full metadata from each source database; performs data lineage analysis on the full metadata to construct a lineage graph between the different data tables of the full metadata; obtains, from the full metadata, a target field whose description is to be supplemented, and determines the target table to which it belongs; determines the upstream tables of the target table according to the lineage graph; and analyzes the similarity between the metadata fields in the upstream tables and the target field, supplementing the description of the target field according to the similarity. Compared with the prior art, this approach automatically supplements metadata descriptions for target fields based on the lineage relationships between different data tables, thereby reducing manual intervention, lowering cost and improving overall working efficiency.
Further, based on the first embodiment of the method for supplementing metadata descriptions according to the present invention, a second embodiment of the method is proposed.
In this embodiment, the step of analyzing the similarity between the metadata fields in the upstream tables and the target field may include: deduplicating the metadata fields in the upstream tables; removing preset invalid fields from the deduplicated metadata fields to obtain a metadata field set to be analyzed; and calculating, according to a preset algorithm, the similarity between each field in the metadata field set and the target field.
In this embodiment, because the metadata fields in the upstream tables may repeat (for example, one field may exist in several data tables at the same time), the metadata fields in the upstream tables can first be deduplicated to improve computational efficiency. Specifically, referring to Fig. 5, which is a schematic diagram of the relationship between the target table and its upstream tables in an embodiment of the present invention: table1 is the target table and contains the target field Cust_name; table2 to table6 are the upstream tables of table1; both table4 and table6 contain ID, so deduplication retains only one ID field.
Then, preset invalid fields, i.e. common fields without actual business meaning such as id, field and mark, are removed from the deduplicated metadata fields, finally yielding a metadata field set to be analyzed. After that, the similarity between each field in the metadata field set and the target field is calculated according to a preset algorithm.
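The deduplication and filtering described above might look like the following Python sketch. The stop-list contents and the `fields_to_analyze` helper are assumptions; the text only names id, field and mark as examples of invalid fields. Each surviving field keeps the upstream table it came from, so that its description can later be looked up in that table.

```python
# Hypothetical stop-list of common fields without business meaning;
# the actual preset list is not given in the text.
INVALID_FIELDS = {"id", "field", "mark"}


def fields_to_analyze(upstream_fields, invalid=INVALID_FIELDS):
    """upstream_fields: iterable of (table, field_name) pairs from the upstream
    tables. Keeps the first occurrence of each field name, then drops names on
    the invalid list (compared in lower case)."""
    seen = set()
    result = []
    for table, name in upstream_fields:
        if name in seen:
            continue  # duplicate, e.g. ID present in both table4 and table6
        seen.add(name)
        if name.lower() in invalid:
            continue  # no business meaning, so not worth comparing
        result.append((table, name))
    return result
```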
In one embodiment, the step of calculating, according to a preset algorithm, the similarity between each field in the metadata field set and the target field may include: calculating, according to an edit distance algorithm, the minimum number of edit operations required to convert each field in the metadata field set into the target field; and calculating the similarity between each field in the metadata field set and the target field according to the minimum number of edit operations.
Specifically, the similarity analysis uses the Levenshtein distance (edit distance) algorithm. The edit distance between two strings is the minimum number of edit operations required to change one string into the other, where an edit operation is replacing one character with another, inserting a character or deleting a character; the smaller the edit distance, the more similar the two strings. In this embodiment, the minimum number of edit operations required to convert each field in the metadata field set into the target field can be calculated with the edit distance algorithm, and the similarity between each field and the target field is then calculated from that number.
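A minimal Python sketch of the edit-distance computation follows. The Levenshtein recurrence is standard; how the distance is converted into a similarity is not stated in the text, so `similarity` assumes one minus the distance divided by the length of the longer name, with names compared case-insensitively. Both function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def similarity(field: str, target: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer name,
    comparing the names in lower case."""
    a, b = field.lower(), target.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty names are treated as identical
    return 1 - levenshtein(a, b) / longest


if __name__ == "__main__":
    for name in ["Cust_name", "Custname", "c_name", "prod_name", "Prod_code", "telephone"]:
        print(name, round(similarity(name, "Cust_name"), 2))
```

Under these assumptions the six fields of Fig. 6 score exactly the values reported there. The case-insensitive choice matters: comparing the raw names, c_name would score about 0.56 rather than 0.67.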
Of course, other algorithms may also be used in a specific implementation to calculate the similarity between each field in the metadata field set and the target field; this embodiment does not limit this.
Referring to Fig. 6, which is a schematic diagram of the result of the similarity analysis performed on the target field Cust_name in target table table1 of Fig. 5: the similarity analysis shows that the similarities between the fields Cust_name, Custname, c_name, prod_name, Prod_code and telephone in the metadata field set and the target field Cust_name are 1, 0.89, 0.67, 0.56, 0.22 and 0.11, respectively.
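The reported values can be checked by hand. Assuming the similarity is one minus the Levenshtein distance $d$ divided by the length of the longer name, with names compared in lower case (the text gives only the results, not the formula), each value in Fig. 6 follows from a small edit count:

```latex
\begin{aligned}
\operatorname{sim}(f, t) &= 1 - \frac{d(f, t)}{\max(|f|, |t|)}, \qquad t = \texttt{cust\_name},\ |t| = 9\\
\operatorname{sim}(\texttt{custname}, t) &= 1 - \tfrac{1}{9} \approx 0.89 && \text{(delete the underscore)}\\
\operatorname{sim}(\texttt{c\_name}, t) &= 1 - \tfrac{3}{9} \approx 0.67 && \text{(delete u, s, t)}\\
\operatorname{sim}(\texttt{prod\_name}, t) &= 1 - \tfrac{4}{9} \approx 0.56 && \text{(substitute p, r, o, d)}\\
\operatorname{sim}(\texttt{prod\_code}, t) &= 1 - \tfrac{7}{9} \approx 0.22 && \text{(7 substitutions)}\\
\operatorname{sim}(\texttt{telephone}, t) &= 1 - \tfrac{8}{9} \approx 0.11 && \text{(8 substitutions)}
\end{aligned}
```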
After the similarities have been calculated, the description of the target field is supplemented according to the similarity.
In one embodiment, the step of supplementing the description of the target field according to the similarity comprises: determining, from the metadata field set, the field with the highest similarity to the target field; and obtaining the field description information of that field and using it as the supplementary description of the target field.
For example, in Fig. 6 the field Cust_name, which has the highest similarity to the target field Cust_name, can be selected; its field description information is obtained from the corresponding upstream table and used as the supplementary description of the target field Cust_name. This mode matches supplementary description information quickly.
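Selecting the description of the most similar field could be sketched as below. The `Candidate` record and `best_description` are hypothetical; skipping upstream fields that themselves have no description is an added assumption, since copying an empty description would not help.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    table: str          # upstream table the field comes from
    field: str          # field name in that upstream table
    similarity: float   # similarity to the target field
    description: str    # field description stored for that upstream field


def best_description(candidates) -> Optional[str]:
    """Return the description of the most similar described field,
    or None when no candidate has a description to offer."""
    described = [c for c in candidates if c.description.strip()]
    if not described:
        return None
    # max() keeps the first of several equally similar candidates.
    return max(described, key=lambda c: c.similarity).description
```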
In another embodiment, the step of supplementing the description of the target field according to the similarity includes: sorting the fields in the metadata field set in descending order of similarity to obtain a supplementary description selection list for the target field; when a selection instruction triggered by the user based on the supplementary description selection list is received, determining the field selected by the user; and obtaining the field description information of the selected field and using it as the supplementary description of the target field.
For example, the fields in the metadata field set in Fig. 6 are sorted in descending order of similarity, and the resulting supplementary description selection list can be displayed to the user on the front end. The user selects one field from the list; the field description information of the selected field is then obtained from the corresponding upstream table and used as the supplementary description of the target field. In this way, the user can choose the supplementary description information freely, which makes supplementing metadata descriptions more flexible.
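The second embodiment can be sketched in the same style. As an assumption, the user's selection instruction is modeled as an index into the ranked list; the scores are Fig. 6 values supplied out of order to show the sorting, and the description text and function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def selection_list(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Fields ranked by similarity, highest first (the list shown to the user)."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def description_for_choice(ranked: List[Tuple[str, float]], choice: int,
                           descriptions: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Resolve the user's choice (an index into ranked) to (field, description)."""
    field, _score = ranked[choice]
    return field, descriptions.get(field)


if __name__ == "__main__":
    ranked = selection_list({"telephone": 0.11, "Cust_name": 1.0,
                             "c_name": 0.67, "Custname": 0.89})
    for position, (field, score) in enumerate(ranked):
        print(position, field, score)
    # Suppose the user picks the second entry in the front-end list
    print(description_for_choice(ranked, 1, {"Custname": "Customer name (legacy)"}))
```

Keeping the ranking and the lookup separate lets the front end display the list first and resolve the description only after the user has chosen.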
The present invention also provides a device for supplementing metadata descriptions. In an embodiment of the present invention, the device for supplementing metadata descriptions includes:
A first obtaining module, configured to obtain the pre-saved full metadata originating from each source database;
A construction module, configured to perform data lineage analysis on the full metadata to construct a lineage graph between the different data tables of the full metadata;
A second obtaining module, configured to obtain, from the full metadata, a target field whose description is to be supplemented, and to determine the target table to which the target field belongs;
A determining module, configured to determine the upstream tables of the target table according to the lineage graph;
A supplementary description module, configured to analyze the similarity between the metadata fields in the upstream tables and the target field, and to supplement the description of the target field according to the similarity.
Further, the device for supplementing metadata descriptions also includes:
A pulling module, configured to pull metadata from preset source databases;
A saving module, configured to save the pulled metadata into a preset metadata repository.
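As an illustration only, the pulling and saving modules might resemble the sketch below, which uses SQLite (Python's built-in `sqlite3`) as a stand-in for both a source database and the metadata repository. The repository table `field_metadata`, its columns and the source name are hypothetical. Real source databases, whether big data platforms or traditional relational databases, expose their catalogs differently, and SQLite has no column comments, so no field descriptions are pulled here.

```python
import sqlite3
from typing import Iterable, List, Tuple

Row = Tuple[str, str, str, str]  # (source, table_name, field_name, field_type)


def pull_metadata(source_name: str, conn: sqlite3.Connection) -> List[Row]:
    """Read table and column metadata from one SQLite source database."""
    rows: List[Row] = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        quoted = table.replace('"', '""')  # escape the identifier for the PRAGMA
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk)
        for _cid, column, col_type, *_rest in conn.execute(f'PRAGMA table_info("{quoted}")'):
            rows.append((source_name, table, column, col_type))
    return rows


def save_to_repository(repo: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Upsert pulled metadata into the (hypothetical) repository table."""
    repo.execute(
        "CREATE TABLE IF NOT EXISTS field_metadata ("
        "source TEXT, table_name TEXT, field_name TEXT, field_type TEXT, "
        "PRIMARY KEY (source, table_name, field_name))"
    )
    repo.executemany("INSERT OR REPLACE INTO field_metadata VALUES (?, ?, ?, ?)", rows)
    repo.commit()


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE customer (Cust_name TEXT, telephone TEXT)")
    repository = sqlite3.connect(":memory:")
    save_to_repository(repository, pull_metadata("crm_db", source))
    print(repository.execute("SELECT * FROM field_metadata").fetchall())
```

The primary key plus `INSERT OR REPLACE` makes repeated pulls idempotent, which is one reasonable choice when metadata is re-pulled on a schedule.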
Further, the construction module includes:
An acquiring unit, configured to obtain the data table types of the full metadata, the data table types including big data platform data tables and traditional relational data tables;
A first calling unit, configured to call a preset lineage hook function to obtain the lineage relationships between different big data platform data tables and the lineage relationships between the big data platform data tables and the traditional relational data tables;
A second calling unit, configured to call a preset graph database engine to build the obtained lineage relationships, both between different big data platform data tables and between the big data platform data tables and the traditional relational data tables, into a lineage graph.
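The sketch below assumes the lineage hook function has already reported table-level lineage as (source table, derived table) pairs, and uses an in-memory dictionary in place of the graph database engine. It also shows how the determining module might look up the upstream tables of a target table by breadth-first search. All table names are hypothetical, and because the source does not say whether only direct parents or all ancestors count as upstream tables, the search depth is a parameter.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Optional, Set, Tuple


def build_lineage_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each table to the set of tables it is directly derived from."""
    parents: Dict[str, Set[str]] = defaultdict(set)
    for source_table, derived_table in edges:
        parents[derived_table].add(source_table)
    return parents


def upstream_tables(parents: Dict[str, Set[str]], table: str,
                    max_depth: Optional[int] = None) -> Set[str]:
    """All ancestors of table (breadth-first); max_depth=1 gives direct parents only."""
    seen: Set[str] = set()
    queue = deque([(table, 0)])
    while queue:
        current, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    seen.discard(table)  # a lineage cycle should not make a table its own upstream
    return seen


if __name__ == "__main__":
    graph = build_lineage_graph([
        ("mysql.crm.customer", "hive.ods.customer"),  # relational -> big data platform
        ("hive.ods.customer", "hive.dw.table1"),      # big data platform -> big data platform
        ("hive.ods.product", "hive.dw.table1"),
    ])
    print(sorted(upstream_tables(graph, "hive.dw.table1")))
    print(sorted(upstream_tables(graph, "hive.dw.table1", max_depth=1)))
```

With the sample edges, the first call returns all three ancestor tables and the second returns only the two direct parents, `hive.ods.customer` and `hive.ods.product`.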
Further, the supplementary description module is also configured to:
Deduplicate the metadata fields in the upstream tables;
Remove preset invalid fields from the deduplicated metadata fields to obtain the metadata field set to be analyzed;
Compute, according to a preset algorithm, the similarity between each field in the metadata field set and the target field.
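The preparation of the metadata field set can be sketched as follows, assuming deduplication is by exact field name and the preset invalid fields are given as a set of names. The names `etl_time` and `ds` are hypothetical examples (for instance, technical columns added by a loading job), and the similarity function is passed in so that the edit distance algorithm or any other algorithm can be plugged in.

```python
from typing import Callable, Dict, Iterable, List, Set


def build_field_set(upstream_fields: Iterable[str], invalid_fields: Set[str]) -> List[str]:
    """Deduplicate field names (keeping first-seen order) and drop preset invalid ones."""
    seen: Set[str] = set()
    field_set: List[str] = []
    for name in upstream_fields:
        if name in seen or name in invalid_fields:
            continue
        seen.add(name)
        field_set.append(name)
    return field_set


def score_fields(field_set: Iterable[str], target: str,
                 similarity: Callable[[str, str], float]) -> Dict[str, float]:
    """Similarity of each remaining field to the target field."""
    return {name: similarity(name, target) for name in field_set}


if __name__ == "__main__":
    fields = build_field_set(["Cust_name", "etl_time", "Custname", "Cust_name", "ds"],
                             {"etl_time", "ds"})
    print(fields)
```

Here the script prints `['Cust_name', 'Custname']`. Whether near-duplicates that differ only in letter case should also be merged is not specified in the source; exact-name matching is the conservative choice.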
Further, the supplementary description module is also configured to:
Compute, according to the edit distance algorithm, the minimum number of edit operations needed to transform each field in the metadata field set into the target field;
Compute the similarity between each field in the metadata field set and the target field according to the minimum number of edit operations.
Further, the supplementary description module is also configured to:
Determine, from the metadata field set, the field with the highest similarity to the target field;
Obtain the field description information of the field with the highest similarity to the target field, and use that field description information as the supplementary description of the target field.
Further, the supplementary description module is also configured to:
Sort the fields in the metadata field set in descending order of similarity to obtain a supplementary description selection list for the target field;
When a selection instruction triggered by the user based on the supplementary description selection list is received, determine the field selected by the user;
Obtain the field description information of the field selected by the user, and use that field description information as the supplementary description of the target field.
For the operations performed by each of the above program modules, refer to the embodiments of the method for supplementing metadata descriptions of the present invention; they are not repeated here.
The present invention also provides a storage medium.
A program for supplementing metadata descriptions is stored on the storage medium of the present invention; when executed by a processor, the program implements the steps of the method for supplementing metadata descriptions described above.
For the method implemented when the program for supplementing metadata descriptions running on the processor is executed, refer to the embodiments of the method for supplementing metadata descriptions of the present invention; details are not repeated here.
It should be noted that, herein, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or further includes elements inherent to such a process, method, article or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or system that includes that element.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not thereby limit the patent scope of the present invention. Any equivalent structural or process transformation made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present invention.

Claims (10)

1. A method for supplementing metadata descriptions, characterized in that the method for supplementing metadata descriptions comprises the following steps:
Obtaining pre-saved full metadata originating from each source database;
Performing data lineage analysis on the full metadata to construct a lineage graph between different data tables of the full metadata;
Obtaining, from the full metadata, a target field whose description is to be supplemented, and determining the target table to which the target field belongs;
Determining upstream tables of the target table according to the lineage graph;
Analyzing the similarity between the metadata fields in the upstream tables and the target field, and supplementing the description of the target field according to the similarity.
2. The method for supplementing metadata descriptions according to claim 1, characterized in that, before the step of obtaining the pre-saved full metadata originating from each source database, the method further comprises:
Pulling metadata from preset source databases;
Saving the pulled metadata into a preset metadata repository.
3. The method for supplementing metadata descriptions according to claim 1, characterized in that the step of performing data lineage analysis on the full metadata to construct a lineage graph between different data tables of the full metadata comprises:
Obtaining the data table types of the full metadata, the data table types including big data platform data tables and traditional relational data tables;
Calling a preset lineage hook function to obtain the lineage relationships between different big data platform data tables and the lineage relationships between the big data platform data tables and the traditional relational data tables;
Calling a preset graph database engine to build the obtained lineage relationships, both between different big data platform data tables and between the big data platform data tables and the traditional relational data tables, into a lineage graph.
4. The method for supplementing metadata descriptions according to any one of claims 1 to 3, characterized in that the step of analyzing the similarity between the metadata fields in the upstream tables and the target field comprises:
Deduplicating the metadata fields in the upstream tables;
Removing preset invalid fields from the deduplicated metadata fields to obtain a metadata field set to be analyzed;
Computing, according to a preset algorithm, the similarity between each field in the metadata field set and the target field.
5. The method for supplementing metadata descriptions according to claim 4, characterized in that the step of computing, according to a preset algorithm, the similarity between each field in the metadata field set and the target field comprises:
Computing, according to the edit distance algorithm, the minimum number of edit operations needed to transform each field in the metadata field set into the target field;
Computing the similarity between each field in the metadata field set and the target field according to the minimum number of edit operations.
6. The method for supplementing metadata descriptions according to claim 4, characterized in that the step of supplementing the description of the target field according to the similarity comprises:
Determining, from the metadata field set, the field with the highest similarity to the target field;
Obtaining the field description information of the field with the highest similarity to the target field, and using that field description information as the supplementary description of the target field.
7. The method for supplementing metadata descriptions according to claim 4, characterized in that the step of supplementing the description of the target field according to the similarity comprises:
Sorting the fields in the metadata field set in descending order of similarity to obtain a supplementary description selection list for the target field;
When a selection instruction triggered by a user based on the supplementary description selection list is received, determining the field selected by the user;
Obtaining the field description information of the field selected by the user, and using that field description information as the supplementary description of the target field.
8. A device for supplementing metadata descriptions, characterized in that the device for supplementing metadata descriptions comprises:
A first obtaining module, configured to obtain pre-saved full metadata originating from each source database;
A construction module, configured to perform data lineage analysis on the full metadata to construct a lineage graph between different data tables of the full metadata;
A second obtaining module, configured to obtain, from the full metadata, a target field whose description is to be supplemented, and to determine the target table to which the target field belongs;
A determining module, configured to determine upstream tables of the target table according to the lineage graph;
A supplementary description module, configured to analyze the similarity between the metadata fields in the upstream tables and the target field, and to supplement the description of the target field according to the similarity.
9. Equipment for supplementing metadata descriptions, characterized in that the equipment for supplementing metadata descriptions comprises: a memory, a processor, and a program for supplementing metadata descriptions that is stored on the memory and executable on the processor, wherein the program for supplementing metadata descriptions, when executed by the processor, implements the steps of the method for supplementing metadata descriptions according to any one of claims 1 to 7.
10. A storage medium, characterized in that a program for supplementing metadata descriptions is stored on the storage medium, and the program for supplementing metadata descriptions, when executed by a processor, implements the steps of the method for supplementing metadata descriptions according to any one of claims 1 to 7.
CN201910014080.9A 2019-01-04 2019-01-04 Method, device, equipment and storage medium for supplementing metadata description Active CN109739894B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910014080.9A CN109739894B (en) 2019-01-04 2019-01-04 Method, device, equipment and storage medium for supplementing metadata description

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910014080.9A CN109739894B (en) 2019-01-04 2019-01-04 Method, device, equipment and storage medium for supplementing metadata description

Publications (2)

Publication Number Publication Date
CN109739894A true CN109739894A (en) 2019-05-10
CN109739894B CN109739894B (en) 2022-12-09

Family

ID=66363782

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910014080.9A Active CN109739894B (en) 2019-01-04 2019-01-04 Method, device, equipment and storage medium for supplementing metadata description

Country Status (1)

Country Link
CN (1) CN109739894B (en)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110347882A (en) * 2019-06-27 2019-10-18 北京明略软件系统有限公司 Consanguinity analysis method and device, storage medium and the electronic device of data
CN110399423A (en) * 2019-07-24 2019-11-01 北京明略软件系统有限公司 Processing method and processing device, storage medium and the electronic device of metadata genetic connection
CN110442604A (en) * 2019-07-11 2019-11-12 新华三大数据技术有限公司 Data flow querying method, abstracting method, processing method and relevant apparatus
CN110647564A (en) * 2019-08-14 2020-01-03 中国平安财产保险股份有限公司 Hive table establishing method, electronic device and computer readable storage medium
CN110827934A (en) * 2019-08-19 2020-02-21 医渡云(北京)技术有限公司 CRF (conditional random access memory) monitoring method and device
CN110968592A (en) * 2019-12-06 2020-04-07 深圳前海环融联易信息科技服务有限公司 Metadata acquisition method and device, computer equipment and computer-readable storage medium
CN111026736A (en) * 2019-12-13 2020-04-17 中盈优创资讯科技有限公司 Data blood margin management method and device and data blood margin analysis method and device
CN111046242A (en) * 2019-11-27 2020-04-21 支付宝(杭州)信息技术有限公司 Data processing method, device, equipment and medium
CN111125229A (en) * 2019-12-24 2020-05-08 杭州数梦工场科技有限公司 Data blood margin generation method and device and electronic equipment
CN111488363A (en) * 2020-06-28 2020-08-04 平安国际智慧城市科技股份有限公司 Data processing method, device, electronic equipment and medium
CN111782629A (en) * 2020-06-23 2020-10-16 京东数字科技控股有限公司 Feature processing script generation method and device
CN112100201A (en) * 2020-09-30 2020-12-18 东莞市盟大塑化科技有限公司 Data monitoring method, device, equipment and storage medium based on big data technology
CN112328599A (en) * 2020-11-12 2021-02-05 杭州数梦工场科技有限公司 Metadata-based field blood relationship analysis method and device
CN112395287A (en) * 2019-08-19 2021-02-23 北京国双科技有限公司 Table classification method, table creation method, device, equipment and medium
CN112711591A (en) * 2020-12-31 2021-04-27 天云融创数据科技(北京)有限公司 Data blood margin determination method and device based on field level of knowledge graph
CN112817984A (en) * 2021-02-22 2021-05-18 杭州数梦工场科技有限公司 Data processing method and device, and data source obtaining method and device
CN112988698A (en) * 2019-12-02 2021-06-18 阿里巴巴集团控股有限公司 Data processing method and device
CN113111636A (en) * 2021-05-17 2021-07-13 京东科技控股股份有限公司 Data uniqueness standard identification method and device
WO2021218021A1 (en) * 2020-04-28 2021-11-04 平安科技(深圳)有限公司 Data-based blood relationship analysis method, apparatus, and device and computer-readable storage medium
CN113626438A (en) * 2021-08-12 2021-11-09 深圳平安智汇企业信息管理有限公司 Data table management method and device, computer equipment and storage medium
WO2022143045A1 (en) * 2020-12-30 2022-07-07 中兴通讯股份有限公司 Method and apparatus for determining data blood relationship, and storage medium and electronic apparatus
CN114840563A (en) * 2021-02-01 2022-08-02 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for generating field description information
CN116484084A (en) * 2023-06-21 2023-07-25 广州信安数据有限公司 Metadata blood-margin analysis method, medium and system based on application information mining

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104169912A (en) * 2012-03-27 2014-11-26 株式会社东芝 Information processing terminal and method, and information management apparatus and method
CN104424269A (en) * 2013-08-30 2015-03-18 中国电信股份有限公司 Data linage analysis method and device
CN105787051A (en) * 2016-02-26 2016-07-20 广州品唯软件有限公司 Analysis method and device based on metadata model
CN106709024A (en) * 2016-12-28 2017-05-24 深圳市华傲数据技术有限公司 Data table source-tracing method and device based on consanguinity analysis
CN108197182A (en) * 2017-12-25 2018-06-22 百味云科技股份有限公司 A kind of data atlas analysis system and method


Cited By (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110347882A (en) * 2019-06-27 2019-10-18 北京明略软件系统有限公司 Consanguinity analysis method and device, storage medium and the electronic device of data
CN110442604A (en) * 2019-07-11 2019-11-12 新华三大数据技术有限公司 Data flow querying method, abstracting method, processing method and relevant apparatus
CN110442604B (en) * 2019-07-11 2022-03-11 新华三大数据技术有限公司 Data flow direction query method, data flow direction extraction method, data flow direction processing method and related devices
CN110399423A (en) * 2019-07-24 2019-11-01 北京明略软件系统有限公司 Processing method and processing device, storage medium and the electronic device of metadata genetic connection
CN110647564A (en) * 2019-08-14 2020-01-03 中国平安财产保险股份有限公司 Hive table establishing method, electronic device and computer readable storage medium
CN110647564B (en) * 2019-08-14 2023-11-24 中国平安财产保险股份有限公司 Hive table building method, electronic device and computer readable storage medium
CN112395287A (en) * 2019-08-19 2021-02-23 北京国双科技有限公司 Table classification method, table creation method, device, equipment and medium
CN110827934A (en) * 2019-08-19 2020-02-21 医渡云(北京)技术有限公司 CRF (conditional random access memory) monitoring method and device
CN111046242A (en) * 2019-11-27 2020-04-21 支付宝(杭州)信息技术有限公司 Data processing method, device, equipment and medium
CN111046242B (en) * 2019-11-27 2023-09-26 支付宝(杭州)信息技术有限公司 Data processing method, device, equipment and medium
CN112988698A (en) * 2019-12-02 2021-06-18 阿里巴巴集团控股有限公司 Data processing method and device
CN110968592A (en) * 2019-12-06 2020-04-07 深圳前海环融联易信息科技服务有限公司 Metadata acquisition method and device, computer equipment and computer-readable storage medium
CN110968592B (en) * 2019-12-06 2023-11-21 深圳前海环融联易信息科技服务有限公司 Metadata acquisition method, metadata acquisition device, computer equipment and computer readable storage medium
CN111026736B (en) * 2019-12-13 2024-03-12 中盈优创资讯科技有限公司 Data blood margin management method and device and data blood margin analysis method and device
CN111026736A (en) * 2019-12-13 2020-04-17 中盈优创资讯科技有限公司 Data blood margin management method and device and data blood margin analysis method and device
CN111125229A (en) * 2019-12-24 2020-05-08 杭州数梦工场科技有限公司 Data blood margin generation method and device and electronic equipment
WO2021218021A1 (en) * 2020-04-28 2021-11-04 平安科技(深圳)有限公司 Data-based blood relationship analysis method, apparatus, and device and computer-readable storage medium
CN111782629B (en) * 2020-06-23 2024-05-17 京东科技控股股份有限公司 Feature processing script generation method and device
CN111782629A (en) * 2020-06-23 2020-10-16 京东数字科技控股有限公司 Feature processing script generation method and device
CN111488363A (en) * 2020-06-28 2020-08-04 平安国际智慧城市科技股份有限公司 Data processing method, device, electronic equipment and medium
CN111488363B (en) * 2020-06-28 2020-10-02 平安国际智慧城市科技股份有限公司 Data processing method, device, electronic equipment and medium
CN112100201A (en) * 2020-09-30 2020-12-18 东莞市盟大塑化科技有限公司 Data monitoring method, device, equipment and storage medium based on big data technology
CN112100201B (en) * 2020-09-30 2024-02-06 东莞盟大集团有限公司 Data monitoring method, device, equipment and storage medium based on big data technology
CN112328599A (en) * 2020-11-12 2021-02-05 杭州数梦工场科技有限公司 Metadata-based field blood relationship analysis method and device
WO2022143045A1 (en) * 2020-12-30 2022-07-07 中兴通讯股份有限公司 Method and apparatus for determining data blood relationship, and storage medium and electronic apparatus
CN112711591B (en) * 2020-12-31 2021-10-08 天云融创数据科技(北京)有限公司 Data blood margin determination method and device based on field level of knowledge graph
CN112711591A (en) * 2020-12-31 2021-04-27 天云融创数据科技(北京)有限公司 Data blood margin determination method and device based on field level of knowledge graph
CN114840563A (en) * 2021-02-01 2022-08-02 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for generating field description information
CN114840563B (en) * 2021-02-01 2024-05-03 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for generating field description information
CN112817984B (en) * 2021-02-22 2023-10-20 杭州数梦工场科技有限公司 Data processing method and device, and data source acquisition method and device
CN112817984A (en) * 2021-02-22 2021-05-18 杭州数梦工场科技有限公司 Data processing method and device, and data source obtaining method and device
CN113111636B (en) * 2021-05-17 2024-04-12 京东科技控股股份有限公司 Data uniqueness standard identification method and device
CN113111636A (en) * 2021-05-17 2021-07-13 京东科技控股股份有限公司 Data uniqueness standard identification method and device
CN113626438A (en) * 2021-08-12 2021-11-09 深圳平安智汇企业信息管理有限公司 Data table management method and device, computer equipment and storage medium
CN113626438B (en) * 2021-08-12 2024-04-30 深圳平安智汇企业信息管理有限公司 Data table management method, device, computer equipment and storage medium
CN116484084B (en) * 2023-06-21 2023-11-17 广州信安数据有限公司 Metadata blood-margin analysis method, medium and system based on application information mining
CN116484084A (en) * 2023-06-21 2023-07-25 广州信安数据有限公司 Metadata blood-margin analysis method, medium and system based on application information mining

Also Published As

Publication number Publication date
CN109739894B (en) 2022-12-09

Similar Documents

Publication Publication Date Title
CN109739894A (en) Supplement method, apparatus, equipment and the storage medium of metadata description
US11526531B2 (en) Dynamic field data translation to support high performance stream data processing
US8065326B2 (en) System and method for building decision trees in a database
CN102033748B (en) Method for generating data processing flow codes
Ediger et al. Tracking structure of streaming social networks
US10102039B2 (en) Converting a hybrid flow
US10747958B2 (en) Dependency graph based natural language processing
CN104424018B (en) Distributed Calculation transaction methods and device
US11003649B2 (en) Index establishment method and device
KR101617696B1 (en) Method and device for mining data regular expression
CN107784026A (en) A kind of ETL data processing methods and device
US20150363467A1 (en) Performing an index operation in a mapreduce environment
US20060235879A1 (en) Techniques for specifying and collecting data aggregations
CN107256206A (en) The method and apparatus of character stream format conversion
WO2023134061A1 (en) Artificial intelligence-based method and apparatus for determining drug feature information
CN110149529A (en) Processing method, server and the storage medium of media information
CN109902126B (en) Loading system supporting HIVE automatic partition and implementation method thereof
CN114756629B (en) Multi-source heterogeneous data interaction analysis engine and method based on SQL
CN108073582B (en) Computing framework selection method and device
CN112182031B (en) Data query method and device, storage medium and electronic device
CN105573726B (en) A kind of rules process method and equipment
CN104462610B (en) Distributed RDF storages and enquiring and optimizing method with reference to body
CN109543772A (en) Data set automatic matching method, device, equipment and computer readable storage medium
CN110825453B (en) Data processing method and device based on big data platform
CN114328606A (en) Method, device and storage medium for improving SQL execution efficiency

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant