CN109739894A - Method, apparatus, device, and storage medium for supplementing metadata descriptions
- Publication number
- CN109739894A (application CN201910014080.9A)
- Authority
- CN
- China
- Prior art keywords
- metadata
- data
- field
- similarity
- supplement
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Medical Treatment And Welfare Office Work (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for supplementing metadata descriptions. The method comprises: obtaining the pre-saved full metadata from each source database; performing data lineage analysis on the full metadata to construct a lineage graph between the different data tables of the full metadata; obtaining, from the full metadata, a target field whose description is to be supplemented, and determining the target table to which the target field belongs; determining the upstream tables of the target table according to the lineage graph; and analyzing the similarity between the metadata fields in the upstream tables and the target field, and supplementing the description of the target field according to the similarity. The invention also discloses an apparatus, a device, and a storage medium for supplementing metadata descriptions. The invention reduces manual intervention and labor cost in the process of supplementing metadata descriptions and improves overall working efficiency.
Description
Technical field
The present invention relates to the field of data processing technology, and in particular to a method, apparatus, device, and storage medium for supplementing metadata descriptions.
Background art
Metadata is data that describes data. The field description information of metadata is the information in a metadata database that describes what a field means; it helps data users understand the data better. On a big data platform, the field description information of some metadata is never created, for various reasons. As a result, third parties that use the data cannot understand it effectively, which causes confusion.
Existing schemes for supplementing metadata field descriptions rely mainly on manual work by the relevant developers, and fall into two kinds. 1) A developer modifies the metadata description information in the source database with a script, and the result is shown to data users after the metadata management system has collected and processed it. This scheme requires modifying the table structure of the source database, which increases the risk of the development process, costs more, and takes effect slowly. 2) Based on the metadata management system, a metadata management module is added to the system to support editing and modifying descriptions at the metadata level. This scheme does not need to go through the development process again and has no impact on the existing source system databases. However, the descriptions are still supplemented entirely by manual intervention, so labor cost is high and working efficiency is low.
Summary of the invention
The main object of the present invention is to provide a method, apparatus, device, and storage medium for supplementing metadata descriptions, with the aim of reducing manual intervention and labor cost in the process of supplementing metadata descriptions and improving overall working efficiency.
To achieve the above object, the present invention provides a method for supplementing metadata descriptions, the method comprising the following steps:
Obtaining the pre-saved full metadata from each source database;
Performing data lineage analysis on the full metadata to construct a lineage graph between the different data tables of the full metadata;
Obtaining, from the full metadata, a target field whose description is to be supplemented, and determining the target table to which the target field belongs;
Determining the upstream tables of the target table according to the lineage graph;
Analyzing the similarity between the metadata fields in the upstream tables and the target field, and supplementing the description of the target field according to the similarity.
Preferably, before the step of obtaining the pre-saved full metadata from each source database, the method further comprises:
Pulling metadata from preset source databases;
Saving the pulled metadata into a preset metadata repository.
Preferably, the step of performing data lineage analysis on the full metadata to construct a lineage graph between the different data tables of the full metadata comprises:
Obtaining the data table types of the full metadata, the data table types including big data platform data tables and traditional relational data tables;
Calling a preset lineage hook function to obtain the lineage relationships between different big data platform data tables, and the lineage relationships between the big data platform data tables and the traditional relational data tables;
Calling a preset graph database engine to build the obtained lineage relationships between different big data platform data tables, and between the big data platform data tables and the traditional relational data tables, into a lineage graph.
Preferably, the step of analyzing the similarity between the metadata fields in the upstream tables and the target field comprises:
Deduplicating the metadata fields in the upstream tables;
Removing preset invalid fields from the deduplicated metadata fields to obtain a set of metadata fields to be analyzed;
Calculating, according to a preset algorithm, the similarity between each field in the metadata field set and the target field.
Preferably, the step of calculating, according to a preset algorithm, the similarity between each field in the metadata field set and the target field comprises:
Calculating, according to the edit distance algorithm, the minimum number of edit operations needed to convert each field in the metadata field set into the target field;
Calculating the similarity between each field in the metadata field set and the target field according to the minimum number of edit operations.
Preferably, the step of supplementing the description of the target field according to the similarity comprises:
Determining, from the metadata field set, the field with the highest similarity to the target field;
Obtaining the field description information of the field with the highest similarity to the target field, and using that field description information as the supplementary description of the target field.
Preferably, the step of supplementing the description of the target field according to the similarity comprises:
Sorting the fields in the metadata field set in descending order of similarity to obtain a supplementary description selection list for the target field;
When a selection instruction triggered by a user based on the supplementary description selection list is received, determining the field selected by the user;
Obtaining the field description information of the field selected by the user, and using that field description information as the supplementary description of the target field.
In addition, to achieve the above object, the present invention also provides an apparatus for supplementing metadata descriptions, the apparatus comprising:
A first obtaining module, configured to obtain the pre-saved full metadata from each source database;
A construction module, configured to perform data lineage analysis on the full metadata to construct a lineage graph between the different data tables of the full metadata;
A second obtaining module, configured to obtain, from the full metadata, a target field whose description is to be supplemented, and to determine the target table to which the target field belongs;
A determining module, configured to determine the upstream tables of the target table according to the lineage graph;
A supplementary description module, configured to analyze the similarity between the metadata fields in the upstream tables and the target field, and to supplement the description of the target field according to the similarity.
In addition, to achieve the above object, the present invention also provides a device for supplementing metadata descriptions, the device comprising: a memory, a processor, and a program for supplementing metadata descriptions that is stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the method for supplementing metadata descriptions described above.
In addition, to achieve the above object, the present invention also provides a storage medium on which a program for supplementing metadata descriptions is stored, wherein the program, when executed by a processor, implements the steps of the method for supplementing metadata descriptions described above.
The present invention obtains the pre-saved full metadata from each source database; performs data lineage analysis on the full metadata to construct a lineage graph between the different data tables of the full metadata; obtains, from the full metadata, a target field whose description is to be supplemented, and determines the target table to which the target field belongs; determines the upstream tables of the target table according to the lineage graph; and analyzes the similarity between the metadata fields in the upstream tables and the target field, and supplements the description of the target field according to the similarity. Compared with the prior art, this approach automatically supplements the metadata description of the target field based on the lineage relationships between data tables, which reduces manual intervention, lowers cost, and improves overall working efficiency.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the device in the hardware operating environment involved in embodiments of the present invention;
Fig. 2 is a flow diagram of the first embodiment of the method for supplementing metadata descriptions of the present invention;
Fig. 3 is a schematic diagram of the metadata management system pulling metadata from source databases in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the detailed sub-steps of step S20 in Fig. 2;
Fig. 5 is a schematic diagram of the relationship between the target table and its upstream tables in an embodiment of the present invention;
Fig. 6 is a schematic diagram of the result of the similarity analysis performed on the target field Cust_name in the target table table1 of Fig. 5.
The realization of the object, the functional features, and the advantages of the present invention are further described below with reference to the accompanying drawings and in conjunction with the embodiments.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein merely illustrate the present invention and are not intended to limit it.
As shown in Fig. 1, Fig. 1 is a schematic structural diagram of the device in the hardware operating environment involved in embodiments of the present invention.
The device for supplementing metadata descriptions in embodiments of the present invention may be a PC or a server.
As shown in Fig. 1, the device may include a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 implements connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may also include standard wired and wireless interfaces. The network interface 1004 may optionally include standard wired and wireless interfaces (such as a Wi-Fi interface). The memory 1005 may be high-speed RAM or non-volatile memory, such as disk storage. Optionally, the memory 1005 may also be a storage device independent of the processor 1001.
Those skilled in the art will understand that the device structure shown in Fig. 1 does not limit the device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
As shown in Fig. 1, the memory 1005, as a computer storage medium, may contain an operating system, a network communication module, a user interface module, and the program for supplementing metadata descriptions.
In the device shown in Fig. 1, the network interface 1004 is mainly used to connect to data users and exchange data with them; the user interface 1003 is mainly used to connect to a client (user terminal) and exchange data with the client; and the processor 1001 may be used to call the program for supplementing metadata descriptions stored in the memory 1005 and to perform the operations in the method embodiments described below.
Based on the above hardware structure, the embodiments of the method for supplementing metadata descriptions of the present invention are proposed.
Referring to Fig. 2, Fig. 2 is a flow diagram of the first embodiment of the method for supplementing metadata descriptions of the present invention. The method comprises:
Step S10: obtaining the pre-saved full metadata from each source database;
The method of this embodiment is applied to a metadata management system on a big data platform. The metadata management system runs on the device described above and is mainly used to collect metadata from source databases and to provide the collected metadata to metadata applications for use by data users.
The metadata management system pre-saves the full metadata from each source database. Before step S10, the method may include: pulling metadata from preset source databases; and saving the pulled metadata into a preset metadata repository, so that the pre-saved full metadata from each source database can later be obtained from the metadata repository.
Referring to Fig. 3, Fig. 3 is a schematic diagram of the metadata management system pulling metadata from source databases in an embodiment of the present invention. The metadata management system includes a metadata collection module and a metadata repository. The source databases include big data platform databases (such as Hive and HBase) and distributed traditional relational databases (such as MySQL and Oracle). The metadata collection module can use ETL (Extract-Transform-Load, the process of extracting data from a source, transforming it, and loading it into a destination) to periodically pull metadata from the big data platform databases and the traditional relational databases, and then save the pulled metadata into the metadata repository.
It should be noted that the metadata pulled from the source databases includes, but is not limited to, metadata describing the database cluster information, database information, data table information, and field information of each source system (the system corresponding to each source database). The database cluster information of a source system may include the cluster code, cluster type, data source type, logical network zone, physical network zone, IP address, network port, and so on. The database information may include the database ID, database name, data source type, cluster code, database description, responsible developer, and so on. The data table information may include the table ID, table name, database ID, database name, table description, responsible developer, modification time, and so on. The field information may include the field ID, field name, table name, ordinal position, data type, length, whether the field is nullable, whether it is indexed, whether it is a partition field, the field description, and so on.
In this embodiment, the pre-saved full metadata from each source database is obtained first. Specifically, it may be obtained from the metadata repository described above.
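For illustration only, the field-level metadata listed above could be modeled as a simple record. The following is a minimal Python sketch; the class name `FieldMetadata`, its attribute names, the `load_full_metadata` helper, and the in-memory `repository` list are all hypothetical stand-ins for the metadata repository and are not part of the disclosed system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FieldMetadata:
    """One field-level metadata record (attribute names are hypothetical)."""
    field_id: int
    field_name: str
    table_name: str
    data_type: str
    nullable: bool
    description: Optional[str]  # field description information; may be missing


def load_full_metadata(repository: List[dict]) -> List[FieldMetadata]:
    """Return every field record held in the in-memory repository."""
    return [FieldMetadata(**row) for row in repository]


# Hypothetical repository contents: table1.Cust_name has no description yet.
repository = [
    {"field_id": 1, "field_name": "Cust_name", "table_name": "table2",
     "data_type": "string", "nullable": False, "description": "Customer name"},
    {"field_id": 2, "field_name": "Cust_name", "table_name": "table1",
     "data_type": "string", "nullable": False, "description": None},
]
full_metadata = load_full_metadata(repository)
```

A real repository would hold the cluster, database, and table records as well; only the field level is sketched here because that is where descriptions are supplemented.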
Step S20: performing data lineage analysis on the full metadata to construct a lineage graph between the different data tables of the full metadata;
The full metadata covers a number of different data tables. In this step, performing data lineage analysis on the full metadata means analyzing the data relationships between the tables, including where the data in a table comes from (upstream lineage) and where it flows to (downstream lineage), and finally constructing a lineage graph between the different data tables of the full metadata.
Specifically, referring to Fig. 4, Fig. 4 is a schematic diagram of the detailed sub-steps of step S20 in Fig. 2. Step S20 may further include:
Step S21: obtaining the data table types of the full metadata, the data table types including big data platform data tables and traditional relational data tables;
Step S22: calling a preset lineage hook function to obtain the lineage relationships between different big data platform data tables, and the lineage relationships between the big data platform data tables and the traditional relational data tables;
Step S23: calling a preset graph database engine to build the obtained lineage relationships between different big data platform data tables, and between the big data platform data tables and the traditional relational data tables, into a lineage graph.
In a specific implementation, the data table types of the full metadata are obtained first. The data table types may include big data platform data tables, which come from big data platform databases, and traditional relational data tables, which come from traditional relational databases.
Then, a preset lineage hook function (Lineage Hook) is called to obtain the lineage relationships between different big data platform data tables and the lineage relationships between the big data platform data tables and the traditional relational data tables.
Data flow processing is currently implemented mainly in three ways: Hive (a Hadoop-based data warehouse tool), Spark (a general-purpose engine for fast big data processing), and Sqoop (a tool for transferring data between Hadoop and relational database servers). Lineage can therefore be captured mainly by parsing the execution of Hive SQL, Spark SQL, and Sqoop:
a) Hive Lineage Hook. The Hive lineage hook function asynchronously captures the SQL (Structured Query Language) statements executed by Hive and calls a self-implemented Hive execution-behavior analysis API (Application Programming Interface) to obtain the input data information, the output data information, and the associated task information of the SQL;
b) Spark-SQL Lineage Hook. The Spark lineage hook function asynchronously captures the SQL statements executed by Spark-SQL and calls a self-implemented Spark SQL execution-behavior analysis API to obtain the input data information, the output data information, and the associated task information of the SQL;
c) Sqoop Lineage Hook. The Sqoop lineage hook function asynchronously captures the commands executed by Sqoop and analyzes the parameters of each command to obtain information about its input data and output data and about the associated task.
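The text does not say how the Sqoop hook analyzes command parameters. As a rough sketch under stated assumptions, the example below tokenizes a captured `sqoop import` command line and reads the standard `--table` and `--hive-table` options to produce one lineage record. The function name `parse_sqoop_import`, the host name, and the table names are hypothetical; only imports into Hive are handled.

```python
import shlex


def parse_sqoop_import(command):
    """Derive a lineage record from a captured `sqoop import` command line.

    Returns None when the command is not an import or lacks the options
    this sketch relies on (--table and --hive-table).
    """
    tokens = shlex.split(command)
    options = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("--")
        if token.startswith("--") and has_value:
            options[token] = tokens[i + 1]  # option followed by its value
            i += 2
        else:
            i += 1  # program name, sub-command, or a value-less flag
    if "import" not in tokens:
        return None
    if "--table" not in options or "--hive-table" not in options:
        return None
    return {
        "srcTable": options["--table"],       # relational source table
        "dstTable": options["--hive-table"],  # Hive table being loaded
        "extral_info": "sqoop import",
    }


# Hypothetical captured command.
record = parse_sqoop_import(
    "sqoop import --connect jdbc:mysql://db.example.com/crm "
    "--table customers --hive-import --hive-table dw.customers"
)
```

The record keys follow the srcTable, dstTable, extral_info layout that the text gives for processed lineage records.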
The Lineage Hooks for Hive and Spark-SQL capture the lineage relationships between different big data platform data tables; either one or both of them may be used. The Sqoop Lineage Hook captures the lineage relationships between big data platform data tables and traditional relational data tables. In this embodiment, after the lineage relationships have been obtained through these hook functions, the scheduled task system of the big data platform can also periodically trigger the platform's data computation module (MapReduce) to process the lineage data obtained each day. The processed data is ultimately stored in a Hive table. Each processed lineage record contains information such as the source table and the destination table, and the record format may be: srcTable [source table], dstTable [destination table], extral_info [related information].
After that, the metadata management system calls a preset graph database engine to build the obtained lineage relationships between different big data platform data tables, and between big data platform data tables and traditional relational data tables, into a lineage graph. The construction works as follows: the destination table of one lineage record can be the source table of other lineage records, which forms a complete lineage chain; the different lineage chains together form the complete lineage relationship graph, that is, the lineage graph.
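The chaining of lineage records described above can be sketched without a graph database. The example below is a simplified stand-in, not the graph database engine the embodiment calls: it folds records in the srcTable / dstTable format into a plain dictionary that maps each destination table to the tables feeding it. The function name `build_lineage_graph` and the sample records are hypothetical.

```python
from collections import defaultdict


def build_lineage_graph(records):
    """Map each destination table to the set of source tables that feed it."""
    upstream = defaultdict(set)
    for record in records:
        upstream[record["dstTable"]].add(record["srcTable"])
    return dict(upstream)


# Hypothetical lineage records; table2 is both a destination and a source,
# which is what links individual records into a lineage chain.
records = [
    {"srcTable": "table2", "dstTable": "table1", "extral_info": "hive"},
    {"srcTable": "table3", "dstTable": "table1", "extral_info": "spark-sql"},
    {"srcTable": "table4", "dstTable": "table2", "extral_info": "sqoop"},
]
graph = build_lineage_graph(records)
```

Storing edges keyed by destination makes the later upstream lookup a direct dictionary access; a graph database would offer the same traversal through its own query interface.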
Step S30: obtaining, from the full metadata, a target field whose description is to be supplemented, and determining the target table to which the target field belongs;
In this step, the target field whose description is to be supplemented is obtained from the full metadata. Specifically, the metadata management system can, periodically or on receiving a fetch instruction from a user, pick out from the pre-saved full metadata the fields that have no description, or whose description is garbled, as target fields. It then determines the data table to which each target field belongs as the target table.
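A minimal sketch of this selection step follows. The text does not define how a garbled description is recognized, so the `looks_garbled` heuristic (a Unicode replacement character, or a description made only of question marks) is purely an assumption, as are the function names and the sample records.

```python
def looks_garbled(text):
    """Heuristic assumption: treat typical mis-decoding residue as garbled."""
    if "\ufffd" in text:  # Unicode replacement character
        return True
    stripped = text.strip()
    return bool(stripped) and all(ch == "?" for ch in stripped)


def find_target_fields(fields):
    """Return (table, field) pairs whose description is missing or garbled."""
    targets = []
    for field in fields:
        description = field.get("description")
        if description is None or not description.strip() or looks_garbled(description):
            targets.append((field["table_name"], field["field_name"]))
    return targets


# Hypothetical field records.
fields = [
    {"table_name": "table1", "field_name": "Cust_name", "description": None},
    {"table_name": "table1", "field_name": "Cust_id", "description": "Customer ID"},
    {"table_name": "table7", "field_name": "addr", "description": "\ufffd\ufffd\ufffd"},
    {"table_name": "table7", "field_name": "city", "description": "  "},
]
targets = find_target_fields(fields)
```

Each returned pair carries the table name, so the target table of a target field is available without a second lookup.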
Step S40: determining the upstream tables of the target table according to the lineage graph;
In this step, the upstream tables of the target table, that is, all the data tables that are data sources of the target table, are determined from the lineage graph constructed above. The field metadata in a table's upstream lineage is closely related to the fields of the table itself, and in general the meaning of data does not change as the data is moved. For field metadata that has no description, the upstream lineage can therefore be used to recommend a supplementary field description.
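As a sketch of the upstream lookup, the example below walks a lineage graph breadth-first from the target table. It assumes the graph is a dictionary from each table to its direct source tables, and that "upstream" is meant transitively; both are assumptions, and the edges shown are invented for illustration rather than taken from Fig. 5.

```python
from collections import deque


def upstream_tables(graph, target):
    """Collect every table reachable upstream of `target`.

    `graph` maps a table to the set of tables that feed it directly.
    """
    seen = set()
    queue = deque([target])
    while queue:
        table = queue.popleft()
        for source in graph.get(table, ()):
            if source != target and source not in seen:  # guard against cycles
                seen.add(source)
                queue.append(source)
    return seen


# Hypothetical lineage edges: table1 is fed by table2 and table3, and so on.
graph = {
    "table1": {"table2", "table3"},
    "table2": {"table4"},
    "table3": {"table5", "table6"},
}
upstream = upstream_tables(graph, "table1")
```

Restricting the walk to direct sources only would be a one-line change (return `graph.get(target, set())`) if a shallower candidate set is preferred.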
Step S50: analyzing the similarity between the metadata fields in the upstream tables and the target field, and supplementing the description of the target field according to the similarity.
The upstream tables generally contain a number of metadata fields. In this step, the similarity between each metadata field in the upstream tables and the target field is analyzed, and the description of the target field is then supplemented according to the similarity. For example, the field description information of the upstream field with the highest similarity to the target field can be obtained and used as the supplementary description of the target field. Alternatively, the field description information of the upstream fields whose similarity to the target field is greater than or equal to a preset threshold can be obtained and recommended to the user, who chooses one of them as the supplementary description of the target field. Different ways of supplementing the description can be chosen flexibly in a specific implementation. Because the candidate descriptions come from similar upstream fields, this approach tends to produce accurate supplementary descriptions, and developers no longer have to write the descriptions from start to finish; they only need to give the final confirmation, which greatly improves overall working efficiency.
This embodiment obtains the pre-saved full metadata from each source database; performs data lineage analysis on the full metadata to construct a lineage graph between the different data tables of the full metadata; obtains, from the full metadata, a target field whose description is to be supplemented, and determines the target table to which the target field belongs; determines the upstream tables of the target table according to the lineage graph; and analyzes the similarity between the metadata fields in the upstream tables and the target field, and supplements the description of the target field according to the similarity. Compared with the prior art, this approach automatically supplements the metadata description of the target field based on the lineage relationships between data tables, which reduces manual intervention, lowers cost, and improves overall working efficiency.
Further, based on the first embodiment of the method for supplementing metadata descriptions of the present invention, a second embodiment of the method is proposed.
In this embodiment, the step of analyzing the similarity between the metadata fields in the upstream tables and the target field may include: deduplicating the metadata fields in the upstream tables; removing preset invalid fields from the deduplicated metadata fields to obtain a set of metadata fields to be analyzed; and calculating, according to a preset algorithm, the similarity between each field in the metadata field set and the target field.
In this embodiment, the metadata fields in the upstream tables may repeat; for example, the same field may exist in several data tables at once. To improve computational efficiency, the metadata fields in the upstream tables can be deduplicated first. Specifically, referring to Fig. 5, Fig. 5 is a schematic diagram of the relationship between the target table and its upstream tables in an embodiment of the present invention. Here table1 is the target table and contains the target field Cust_name, and table2 to table6 are the upstream tables of table1. Both table4 and table6 contain an ID field, so after deduplication only one ID field is kept.
Then, preset invalid fields are removed from the deduplicated metadata fields. These are fields such as id, field, and mark that are common but carry no practical business meaning. The result is a set of metadata fields to be analyzed. After that, the similarity between each field in the metadata field set and the target field is calculated according to a preset algorithm.
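The deduplication and filtering described above might look like the following sketch. Comparing names case-insensitively and keeping the first table in which a name occurs are assumptions, as are the function name `candidate_fields` and the placement of fields in tables; the invalid names id, field, and mark are the examples given in the text.

```python
INVALID_FIELDS = {"id", "field", "mark"}  # example invalid names from the text


def candidate_fields(upstream_fields, invalid=INVALID_FIELDS):
    """Deduplicate (table, field) pairs by field name and drop invalid names.

    Names are compared case-insensitively (an assumption); the first table
    in which a name appears is kept so its description can be looked up later.
    """
    seen = set()
    result = []
    for table, name in upstream_fields:
        key = name.lower()
        if key in seen or key in invalid:
            continue
        seen.add(key)
        result.append((table, name))
    return result


# Hypothetical upstream fields; ID appears in both table4 and table6.
upstream_fields = [
    ("table2", "Cust_name"), ("table3", "Custname"), ("table4", "ID"),
    ("table5", "prod_name"), ("table6", "ID"), ("table6", "Cust_name"),
]
candidates = candidate_fields(upstream_fields)
```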
In one embodiment, the step of calculating, according to a preset algorithm, the similarity between each field in the metadata field set and the target field may include: calculating, according to the edit distance algorithm, the minimum number of edit operations needed to convert each field in the metadata field set into the target field; and calculating the similarity between each field in the metadata field set and the target field according to the minimum number of edit operations.
Specifically, the similarity analysis uses the Levenshtein distance (edit distance) algorithm. The edit distance between two strings is the minimum number of edit operations needed to transform one string into the other, where an edit operation replaces one character with another, inserts a character, or deletes a character. The smaller the edit distance, the greater the similarity of the two strings. In this embodiment, the minimum number of edit operations needed to convert each field in the metadata field set into the target field can be calculated according to the edit distance algorithm, and the similarity between each field and the target field is then calculated from that minimum number of edit operations.
Of course, in a specific implementation other algorithms can also be used to calculate the similarity between each field in the metadata field set and the target field; this embodiment does not limit this.
Referring to Fig. 6, Fig. 6 is a schematic diagram of the result of the similarity analysis performed on the target field Cust_name in the target table table1 of Fig. 5. The similarity analysis gives the similarities between the fields Cust_name, Custname, c_name, prod_name, Prod_code, and telephone in the metadata field set and the target field Cust_name as 1, 0.89, 0.67, 0.56, 0.22, and 0.11, respectively.
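The text does not give the formula that turns an edit distance into a similarity. The sketch below assumes one common normalization, 1 - d / max(len(a), len(b)), applied to lowercased names. This is an assumption rather than the disclosed formula, although with these choices it reproduces the six values listed for Fig. 6. The function names are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning string a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(field, target):
    """Assumed normalization: 1 - distance / longer length, case-insensitive."""
    a, b = field.lower(), target.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


names = ["Cust_name", "Custname", "c_name", "prod_name", "Prod_code", "telephone"]
scores = {name: round(similarity(name, "Cust_name"), 2) for name in names}
```

For example, Custname differs from Cust_name by one deleted character over a length of 9, giving 1 - 1/9, which rounds to 0.89.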
After the similarities have been calculated, the description of the target field is supplemented according to the similarity.
In one embodiment, the step of supplementing the description of the target field according to the similarity includes: determining, from the metadata field set, the field with the highest similarity to the target field; and obtaining the field description information of that field and using it as the supplementary description of the target field.
For example, the field in Fig. 6 with the highest similarity to the target field Cust_name, namely Cust_name, can be taken, its field description information can be obtained from the corresponding upstream table, and that field description information can be used as the supplementary description of the target field Cust_name. This approach matches supplementary description information quickly.
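A minimal sketch of this best-match variant, assuming the similarity scores are already available as a dictionary. The scores below are the values listed for Fig. 6; the description strings and the function name `supplement_with_best_match` are hypothetical.

```python
def supplement_with_best_match(scores, descriptions):
    """Return the most similar field and its description (None if it has none)."""
    best_field = max(scores, key=scores.get)
    return best_field, descriptions.get(best_field)


# Similarities to the target field Cust_name, as listed for Fig. 6.
scores = {"Cust_name": 1.0, "Custname": 0.89, "c_name": 0.67,
          "prod_name": 0.56, "Prod_code": 0.22, "telephone": 0.11}
# Hypothetical field descriptions taken from the upstream tables.
descriptions = {"Cust_name": "Customer name", "prod_name": "Product name"}

best_field, supplement = supplement_with_best_match(scores, descriptions)
```

If the best match itself has no description, the function returns None for it; falling back to the next candidate would be a natural extension, though the text does not describe one.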
In another embodiment, the step of supplementing the description of the target field according to the similarity includes: sorting the fields in the metadata field set in descending order of similarity to obtain a supplementary description selection list for the target field; when a selection instruction triggered by a user based on the supplementary description selection list is received, determining the field selected by the user; and obtaining the field description information of the field selected by the user and using it as the supplementary description of the target field.
For example, the fields in the metadata field set in Fig. 6 are sorted in descending order of similarity, and the resulting supplementary description selection list is displayed to the front-end user, who selects one field from it. The field description information of the selected field is then obtained from the corresponding upstream table and used as the supplementary description of the target field. In this way the user can choose the supplementary description information flexibly, which makes supplementing metadata descriptions more flexible.
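The list-based embodiment could look roughly like the following Python sketch. The helper names `build_selection_list` and `apply_user_choice` are hypothetical, the user's selection instruction is modelled as a plain list index, and the description strings are made up for illustration.

```python
def build_selection_list(similarities):
    """Sort candidate fields by similarity, highest first."""
    return sorted(similarities, key=similarities.get, reverse=True)


def apply_user_choice(selection_list, chosen_index, descriptions):
    """Resolve the user's pick to (field, field description information)."""
    chosen_field = selection_list[chosen_index]
    return chosen_field, descriptions[chosen_field]


# Hypothetical candidates and descriptions for the target field Cust_name.
similarities = {"telephone": 0.11, "Cust_name": 1.0, "prod_name": 0.56}
descriptions = {
    "Cust_name": "Customer name",
    "prod_name": "Product name",
    "telephone": "Contact phone number",
}

selection_list = build_selection_list(similarities)
# Assume the front end reports that the user picked the second entry.
chosen_field, supplement = apply_user_choice(selection_list, 1, descriptions)
```

Keeping the ranking and the selection in separate functions mirrors the two steps of the embodiment: the list can be shown to the user first, and the description is only looked up once a choice arrives.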
The present invention also provides a device for supplementing metadata descriptions. In an embodiment of the present invention, the device includes:
A first obtaining module, configured to obtain the pre-saved full metadata originating from each source database;
A construction module, configured to perform data lineage analysis on the full metadata so as to construct a lineage graph between the different data tables of the full metadata;
A second obtaining module, configured to obtain, from the full metadata, a target field whose description is to be supplemented, and to determine the target table to which the target field belongs;
A determining module, configured to determine the upstream tables of the target table according to the lineage graph;
A supplementary description module, configured to analyze the similarity between the metadata fields in the upstream tables and the target field, and to supplement the description of the target field according to the similarity.
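How the determining module and the supplementary description module fit together can be sketched as follows. All names and the data layout are assumptions made for this illustration, and `difflib.SequenceMatcher` is used purely as a stand-in similarity metric; it is not the edit distance calculation the specification describes.

```python
from difflib import SequenceMatcher


def stand_in_similarity(a, b):
    # Assumption: difflib's ratio is only a placeholder metric for this sketch.
    return SequenceMatcher(None, a, b).ratio()


def supplement_description(target_table, target_field, lineage, table_fields,
                           similarity=stand_in_similarity):
    """Return the description of the upstream field most similar to the target.

    lineage:      table name -> list of upstream table names.
    table_fields: table name -> {field name: field description information}.
    """
    # Determining module: look up the upstream tables of the target table.
    upstream_tables = lineage.get(target_table, [])
    # Supplementary description module: score every upstream field.
    best_score, best_description = -1.0, None
    for table in upstream_tables:
        for field, description in table_fields.get(table, {}).items():
            score = similarity(field, target_field)
            if score > best_score:
                best_score, best_description = score, description
    return best_description


# Hypothetical metadata: one target table with a single upstream table.
lineage = {"dm_customer": ["ods_customer"]}
table_fields = {
    "ods_customer": {
        "Cust_name": "Customer name",
        "telephone": "Contact phone number",
    }
}
result = supplement_description("dm_customer", "Cust_name", lineage, table_fields)
```

Passing the similarity function as a parameter keeps the wiring independent of the metric, so a different calculation can be swapped in without touching the lookup logic.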
Further, the device for supplementing metadata descriptions also includes:
A pulling module, configured to pull metadata from preset source databases;
A saving module, configured to save the pulled metadata into a preset metadata repository.
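As a rough illustration of pulling and saving metadata, the sketch below reads table and column information from an in-memory SQLite database and stores it in a plain dictionary. SQLite, the record layout, and the names `pull_metadata`, `save_metadata` and `crm` are all assumptions; the specification does not say which source databases or repository are used.

```python
import sqlite3


def pull_metadata(conn):
    """Collect one record per column from a SQLite source database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    records = []
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, column, col_type, *_rest in conn.execute(
                f'PRAGMA table_info("{table}")'):
            records.append({"table": table, "field": column,
                            "type": col_type, "description": None})
    return records


def save_metadata(repository, source_name, records):
    """Append pulled records to the repository entry for this source."""
    repository.setdefault(source_name, []).extend(records)


# Hypothetical source database with a single table.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customer (Cust_name TEXT, telephone TEXT)")

repository = {}  # stands in for the preset metadata repository
save_metadata(repository, "crm", pull_metadata(source))
```

The `description` slot is left empty on purpose: it is the piece of metadata that the later steps try to fill in.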
Further, the construction module includes:
An obtaining unit, configured to obtain the data table types of the full metadata, the data table types including big data platform data tables and traditional relational data tables;
A first calling unit, configured to call a preset lineage hook function to obtain the lineage relationships between different big data platform data tables and the lineage relationships between the big data platform data tables and the traditional relational data tables;
A second calling unit, configured to call a preset graph database engine to build the obtained lineage relationships between the different big data platform data tables, and between the big data platform data tables and the traditional relational data tables, into a lineage graph.
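The idea of feeding hook output into a lineage graph can be sketched as below. This is a simplification under stated assumptions: an in-memory adjacency map takes the place of the graph database engine, `lineage_hook` is a hypothetical callback invoked once per job with the tables it reads and writes, and the table names are invented.

```python
from collections import defaultdict


class LineageGraph:
    """In-memory stand-in for the graph database engine (assumption)."""

    def __init__(self):
        # Maps each table to the set of tables it is derived from.
        self._upstream = defaultdict(set)

    def add_edge(self, source_table, target_table):
        self._upstream[target_table].add(source_table)

    def upstream_tables(self, table):
        return sorted(self._upstream.get(table, ()))


def lineage_hook(graph, source_tables, target_table):
    """Hypothetical hook: record one edge per table a job reads from."""
    for source_table in source_tables:
        graph.add_edge(source_table, target_table)


graph = LineageGraph()
# A job that joins a relational table with a big data platform table.
lineage_hook(graph, ["mysql.crm.customer", "hive.ods.orders"],
             "hive.dm.customer_orders")

upstream = graph.upstream_tables("hive.dm.customer_orders")
```

Storing edges keyed by the downstream table makes the later "find the upstream tables of the target table" step a single lookup.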
Further, the supplementary description module is also configured to:
Deduplicate the metadata fields in the upstream tables;
Remove preset invalid fields from the deduplicated metadata fields to obtain the metadata field set to be analyzed;
Calculate, according to a preset algorithm, the similarity between each field in the metadata field set and the target field.
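The deduplication and filtering steps can be illustrated with a short Python sketch. The function name `prepare_field_set` and the example invalid fields (`id`, `create_time`) are assumptions; which fields count as invalid is a preset configuration that this passage does not spell out.

```python
def prepare_field_set(upstream_fields, invalid_fields):
    """Deduplicate upstream fields and drop preset invalid ones.

    The first occurrence of each field name is kept, so the original
    order of the upstream fields is preserved.
    """
    seen = set()
    field_set = []
    for name in upstream_fields:
        if name in seen or name in invalid_fields:
            continue
        seen.add(name)
        field_set.append(name)
    return field_set


# Fields gathered from several upstream tables (illustrative values).
upstream_fields = ["id", "Cust_name", "telephone", "Cust_name",
                   "create_time", "prod_name"]
# Assumed preset list of fields that carry no useful description.
invalid_fields = {"id", "create_time"}

field_set = prepare_field_set(upstream_fields, invalid_fields)
```

Shrinking the candidate set before scoring means the similarity calculation runs once per distinct, meaningful field name.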
Further, the supplementary description module is also configured to:
Calculate, according to an edit distance algorithm, the minimum number of edit operations required to convert each field in the metadata field set into the target field;
Calculate the similarity between each field in the metadata field set and the target field according to the minimum number of edit operations.
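A minimal sketch of the edit distance step is given below, using the standard Levenshtein dynamic program. How the operation count is turned into a similarity is not stated in this passage; the normalization 1 - d / max(len) used here is an assumption, chosen because it appears consistent with the values quoted for the example (0.56, 0.22 and 0.11 for prod_name, Prod_code and telephone against Cust_name).

```python
def edit_distance(source, target):
    """Minimum number of insertions, deletions and substitutions
    needed to turn source into target (Levenshtein distance)."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,          # delete s_char
                               current[j - 1] + 1,       # insert t_char
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def similarity(field, target):
    """Assumed normalization: 1 - distance / length of the longer name."""
    longest = max(len(field), len(target))
    if longest == 0:
        return 1.0
    return 1 - edit_distance(field, target) / longest


scores = {name: round(similarity(name, "Cust_name"), 2)
          for name in ["Cust_name", "prod_name", "Prod_code", "telephone"]}
```

The comparison is case-sensitive, so `Prod_code` and `prod_code` would score differently; whether names should be normalized first is a design choice this sketch leaves open.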
Further, the supplementary description module is also configured to:
Determine, from the metadata field set, the field with the highest similarity to the target field;
Obtain the field description information of the field with the highest similarity to the target field, and use that field description information as the supplementary description of the target field.
Further, the supplementary description module is also configured to:
Sort the fields in the metadata field set in descending order of similarity to obtain a supplementary description selection list for the target field;
Determine the field selected by the user when a selection instruction triggered by the user on the basis of the supplementary description selection list is received;
Obtain the field description information of the field selected by the user, and use that field description information as the supplementary description of the target field.
For the operations performed by each of the above program modules, refer to the method embodiments of the present invention for supplementing metadata descriptions; they are not repeated here.
The present invention also provides a storage medium.
A program for supplementing metadata descriptions is stored on the storage medium of the present invention, and when the program is executed by a processor, the steps of the method for supplementing metadata descriptions described above are implemented.
For the method implemented when the program for supplementing metadata descriptions running on the processor is executed, refer to the method embodiments of the present invention for supplementing metadata descriptions; the details are not repeated here.
It should be noted that, in this document, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or system that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or system that includes that element.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
From the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit the patent scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.
Claims (10)
1. A method for supplementing metadata descriptions, characterized in that the method for supplementing metadata descriptions includes the following steps:
Obtaining the pre-saved full metadata originating from each source database;
Performing data lineage analysis on the full metadata so as to construct a lineage graph between the different data tables of the full metadata;
Obtaining, from the full metadata, a target field whose description is to be supplemented, and determining the target table to which the target field belongs;
Determining the upstream tables of the target table according to the lineage graph;
Analyzing the similarity between the metadata fields in the upstream tables and the target field, and supplementing the description of the target field according to the similarity.
2. the method for supplement metadata description as described in claim 1, which is characterized in that described to obtain the source pre-saved
Before in the full dose metadata of each source database the step of, further includes:
Metadata is pulled from preset source database;
By the meta-data preservation pulled into preset metadata resource library.
3. The method for supplementing metadata descriptions according to claim 1, characterized in that the step of performing data lineage analysis on the full metadata so as to construct a lineage graph between the different data tables of the full metadata includes:
Obtaining the data table types of the full metadata, the data table types including big data platform data tables and traditional relational data tables;
Calling a preset lineage hook function to obtain the lineage relationships between different big data platform data tables and the lineage relationships between the big data platform data tables and the traditional relational data tables;
Calling a preset graph database engine to build the obtained lineage relationships between the different big data platform data tables, and between the big data platform data tables and the traditional relational data tables, into a lineage graph.
4. the method for supplement metadata description as claimed any one in claims 1 to 3, which is characterized in that the analysis institute
The step of stating the metadata fields and the similarity of the aiming field in the table of upstream include:
Duplicate removal processing is carried out to the metadata fields in the upstream table;
Preset invalid field is rejected from the metadata fields after duplicate removal processing, obtains metadata fields to be analyzed
Collection;
According to preset algorithm, the similarity of each field and the aiming field that the metadata fields are concentrated is calculated separately.
5. the method for supplement metadata description as claimed in claim 4, which is characterized in that it is described according to preset algorithm, respectively
The step of calculating the similarity of each field and the aiming field that the metadata fields are concentrated include:
According to editing distance algorithm, calculates separately each field that the metadata fields are concentrated and be converted into the aiming field institute
The minimum edit operation times needed;
The each field and the aiming field that the metadata fields are concentrated are calculated according to the minimum edit operation times
Similarity.
6. The method for supplementing metadata descriptions according to claim 4, characterized in that the step of supplementing the description of the target field according to the similarity includes:
Determining, from the metadata field set, the field with the highest similarity to the target field;
Obtaining the field description information of the field with the highest similarity to the target field, and using that field description information as the supplementary description of the target field.
7. The method for supplementing metadata descriptions according to claim 4, characterized in that the step of supplementing the description of the target field according to the similarity includes:
Sorting the fields in the metadata field set in descending order of similarity to obtain a supplementary description selection list for the target field;
Determining the field selected by the user when a selection instruction triggered by the user on the basis of the supplementary description selection list is received;
Obtaining the field description information of the field selected by the user, and using that field description information as the supplementary description of the target field.
8. A device for supplementing metadata descriptions, characterized in that the device for supplementing metadata descriptions includes:
A first obtaining module, configured to obtain the pre-saved full metadata originating from each source database;
A construction module, configured to perform data lineage analysis on the full metadata so as to construct a lineage graph between the different data tables of the full metadata;
A second obtaining module, configured to obtain, from the full metadata, a target field whose description is to be supplemented, and to determine the target table to which the target field belongs;
A determining module, configured to determine the upstream tables of the target table according to the lineage graph;
A supplementary description module, configured to analyze the similarity between the metadata fields in the upstream tables and the target field, and to supplement the description of the target field according to the similarity.
9. Equipment for supplementing metadata descriptions, characterized in that the equipment for supplementing metadata descriptions includes: a memory, a processor, and a program for supplementing metadata descriptions that is stored on the memory and can run on the processor, wherein, when the program for supplementing metadata descriptions is executed by the processor, the steps of the method for supplementing metadata descriptions according to any one of claims 1 to 7 are implemented.
10. A storage medium, characterized in that a program for supplementing metadata descriptions is stored on the storage medium, and when the program for supplementing metadata descriptions is executed by a processor, the steps of the method for supplementing metadata descriptions according to any one of claims 1 to 7 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910014080.9A CN109739894B (en) | 2019-01-04 | 2019-01-04 | Method, device, equipment and storage medium for supplementing metadata description |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109739894A (en) | 2019-05-10 |
CN109739894B (en) | 2022-12-09 |
Family
ID=66363782
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910014080.9A Active CN109739894B (en) | 2019-01-04 | 2019-01-04 | Method, device, equipment and storage medium for supplementing metadata description |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109739894B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||