CN114218301A - Metadata-driven data polling and version management method and device and electronic equipment - Google Patents

Info

Publication number
CN114218301A
Authority
CN
China
Prior art keywords
metadata
inspection
directory
external
data source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111617949.2A
Other languages
Chinese (zh)
Inventor
刘新辉
康定
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Clinbrain Information Technology Co Ltd
Original Assignee
Shanghai Clinbrain Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Clinbrain Information Technology Co Ltd
Priority to CN202111617949.2A
Publication of CN114218301A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474Sequence data queries, e.g. querying versioned data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2477Temporal data queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a metadata-driven data polling and version management method, a device and electronic equipment, wherein the method comprises the following steps: respectively establishing communication connection with each target data source based on the target data source, establishing an initial metadata set according to each target data source, and establishing a metadata version number for each metadata in the initial metadata set; establishing a storage directory according to external metadata in the initial metadata set, and establishing directory version numbers for directory items in the storage directory; polling external metadata associated with each directory entry based on the first polling instruction and the target data source, and updating the external metadata, the metadata version number of the external metadata and the directory version number of the directory entry corresponding to the updated external metadata based on the polling result; and polling the residual metadata based on the second polling instruction and the target data source, and updating the residual metadata and the metadata version number of the residual metadata based on the polling result, so that the polling speed and pertinence are improved.

Description

Metadata-driven data polling and version management method and device and electronic equipment
Technical Field
The embodiment of the invention relates to the technical field of data inspection, in particular to a metadata-driven data inspection and version management method, a metadata-driven data inspection and version management device and electronic equipment.
Background
Data has become an important asset across industries. Data quality directly determines the accuracy of information and also affects normal business operation and data integrity. It is therefore necessary to ensure the accuracy and consistency of all data.
At present, the accuracy and consistency of data are ensured by periodically performing a full inspection of the data. However, because the amount of data stored in a system is large, a full inspection is slow and untargeted, and it is difficult to determine what has changed in the data after the inspection.
In medical data, there is a large amount of backup data, intermediate data, temporary data, and the like; fully inspecting such data wastes resources and has no practical business significance. A full inspection therefore cannot inspect and update meaningful data in a targeted manner. Moreover, there is no unified and convenient way to access the data sources of different medical data, and the versions of the data acquired after access cannot be managed and displayed in a standardized way.
Disclosure of Invention
The embodiment of the invention provides a metadata-driven data polling and version management method, a metadata-driven data polling and version management device and electronic equipment, and aims to achieve the technical effects of improving the data polling speed and pertinence and recording differences in data polling.
In a first aspect, an embodiment of the present invention provides a metadata-driven data polling and version management method, where the method includes:
respectively establishing communication connection with each target data source based on at least one target data source, establishing an initial metadata set according to each target data source, and establishing a metadata version number for each metadata in the initial metadata set; wherein the initial metadata set comprises external metadata and residual metadata;
establishing a storage directory according to external metadata in the initial metadata set, and establishing directory version numbers for directory entries in the storage directory;
when a first inspection instruction is received, inspecting external metadata associated with each directory entry in the storage directory based on the first inspection instruction and the target data source, and updating the external metadata, the metadata version number of the external metadata and the directory version number of the directory entry corresponding to the updated external metadata based on inspection results;
when a second inspection instruction is received, inspecting the residual metadata in the initial metadata set based on the second inspection instruction and the target data source, and updating the residual metadata and the metadata version number of the residual metadata based on an inspection result.
In a second aspect, an embodiment of the present invention further provides a metadata-driven data inspection and version management apparatus, where the apparatus includes:
the initial metadata set establishing module is used for respectively establishing communication connection with each target data source based on at least one target data source, establishing an initial metadata set according to each target data source, and establishing a metadata version number for each metadata in the initial metadata set; wherein the initial metadata set comprises external metadata and residual metadata;
the storage directory establishing module is used for establishing a storage directory according to the external metadata in the initial metadata set and establishing directory version numbers for directory items in the storage directory;
the first inspection module is used for inspecting external metadata associated with each directory entry in the storage directory based on the first inspection instruction and the target data source when a first inspection instruction is received, and updating the external metadata, the metadata version number of the external metadata and the directory version number of the directory entry corresponding to the updated external metadata based on an inspection result;
and the second inspection module is used for inspecting the residual metadata in the initial metadata set based on the second inspection instruction and the target data source when receiving a second inspection instruction, and updating the residual metadata and the metadata version number of the residual metadata based on an inspection result.
In a third aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the metadata-driven data inspection and version management method according to any one of the embodiments of the invention.
In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the metadata-driven data inspection and version management method according to any one of the embodiments of the present invention.
According to the technical solution of the embodiment of the invention, a communication connection is established with each of at least one target data source, an initial metadata set is established from the target data sources, and a metadata version number is established for each metadata in the initial metadata set; in this way the metadata in each target data source is collected and assigned a metadata version number. A storage directory is then established according to the external metadata in the initial metadata set, and a directory version number is established for each directory entry in the storage directory. When a first inspection instruction is received, the external metadata associated with each directory entry in the storage directory is inspected based on the first inspection instruction and the target data source, and the external metadata, the metadata version number of the external metadata, and the directory version number of the directory entry corresponding to the updated external metadata are updated based on the inspection result. When a second inspection instruction is received, the residual metadata in the initial metadata set is inspected based on the second inspection instruction and the target data source, and the residual metadata and its metadata version number are updated based on the inspection result. This solves the problems that, with many target data sources, data inspection covers a large volume of data, takes a long time, and makes it difficult to determine how the data differs before and after inspection. Dividing the data elements by directory entry improves the speed and pertinence of data inspection, and the differences found during inspection are recorded.
Drawings
In order to more clearly illustrate the technical solutions of the exemplary embodiments of the present invention, a brief description is given below of the drawings used in describing the embodiments. It should be clear that the described figures are only views of some of the embodiments of the invention to be described, not all, and that for a person skilled in the art, other figures can be derived from these figures without inventive effort.
Fig. 1 is a flowchart illustrating a metadata-driven data inspection and version management method according to an embodiment of the present invention;
Fig. 2 is a flowchart illustrating a metadata-driven data inspection and version management method according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a metadata-driven data inspection and version management system according to a second embodiment of the present invention;
Fig. 4 is a schematic flowchart illustrating a process of examining a processing operation performed on a polling result according to a second embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a metadata-driven data inspection and version management apparatus according to a third embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a metadata-driven data inspection and version management method according to an embodiment of the present invention, where the present embodiment is applicable to data inspection and version management of metadata in an accessed data source, and the method may be executed by a metadata-driven data inspection and version management apparatus, and the apparatus may be implemented in a form of software and/or hardware, where the hardware may be an electronic device, and optionally, the electronic device may be a mobile terminal, a PC terminal, a server, and the like.
As shown in fig. 1, the method of this embodiment specifically includes the following steps:
S110, respectively establishing communication connection with each target data source based on at least one target data source, establishing an initial metadata set according to each target data source, and establishing a metadata version number for each metadata in the initial metadata set.
The target data source may be any of the data sources involved in data integration, that is, a device or original medium that provides the required data, such as a database or a data server. The initial metadata set may be a data set composed of the metadata in the respective target data sources. Metadata is data about data, for example information about databases, tables, columns, indexes, stored procedures, and views; metadata can provide information support for business functions such as data quality management. The metadata version numbers may be mutually independent version numbers established for different metadata. The initial metadata set includes external metadata and remaining metadata. The external metadata may be the metadata that each external platform needs to obtain, where an external platform can be understood as a third-party service platform or the like, for example a data quality platform or a Business Intelligence (BI) platform. The remaining metadata may be the metadata in the initial metadata set other than the external metadata, such as backup data, intermediate data, and temporary data.
Specifically, at least one target data source that needs to be connected is determined according to actual requirements, and a communication connection is established with each target data source to obtain all the metadata in it. All the obtained metadata is integrated to obtain the initial metadata set, and a metadata version number is established for each metadata in the initial metadata set. At this point, the metadata version number may be an initial metadata version number, for example v1.0.
For example, if the target data source is database A of a hospital's surgical anesthesia system, the data acquisition adaptation module may connect to database A and acquire its metadata to generate the initial metadata set, and an initial metadata version number, such as v1.0, is assigned to each metadata in the initial metadata set.
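Step S110 can be sketched as follows. This is only an illustration under stated assumptions: each target data source is represented as an in-memory mapping from metadata names to attribute dictionaries rather than a live connection, and the names `Metadata`, `build_initial_metadata_set`, and `anesthesia_db_a` are hypothetical.

```python
from dataclasses import dataclass, field

INITIAL_VERSION = "v1.0"  # initial version number used in the example above


@dataclass
class Metadata:
    source: str                      # name of the target data source
    name: str                        # e.g. "table.column"
    attributes: dict = field(default_factory=dict)
    version: str = INITIAL_VERSION   # each metadata carries its own version


def build_initial_metadata_set(sources: dict) -> dict:
    """Collect metadata from every source and assign the initial version number."""
    initial_set = {}
    for source_name, items in sources.items():
        for item_name, attributes in items.items():
            key = (source_name, item_name)
            initial_set[key] = Metadata(source_name, item_name, dict(attributes))
    return initial_set


# Hypothetical stand-in for metadata read from a connected database.
sources = {
    "anesthesia_db_a": {
        "patient.id": {"type": "int", "length": 11},
        "patient.name": {"type": "varchar", "length": 64},
    }
}
initial_set = build_initial_metadata_set(sources)
```

Keying the set by the pair of source name and metadata name keeps the version numbers of different metadata independent of one another, as the step requires.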
S120, establishing a storage directory according to the external metadata in the initial metadata set, and establishing directory version numbers for all directory items in the storage directory.
The storage directory may be a metadata directory constructed according to different hierarchies of metadata, where the hierarchies include, but are not limited to, data fields, data sources, data types, and the like. The directory entry may be the contents of each level of directory in the storage directory. The directory version numbers may be mutually independent version numbers established for different directory entries.
Specifically, external metadata are determined from the initial metadata set according to the external platform, and the storage directory is constructed according to the external metadata in a certain hierarchical structure. Further, directory version numbers may be established separately for different directory entries in the storage directory. At this time, the directory version number may be an initial directory version number, for example, v1.0 or the like.
For example, the data corresponding to each directory entry in the storage directory references the data set acquired by the acquisition adaptation module, and the data in the data set may be divided by directory entry. For example, the data may be divided by processing flow direction into data lake, data center, data domain, and data mart; flow directions can be added and removed dynamically according to medical requirements. The directory entries determined by flow direction can be further divided logically, for example into database-table-field, which corresponds to directory entries in a three-level hierarchy. It is understood that data elements belong to the field level. The attributes of a data element include, but are not limited to, the logical layer it belongs to, its data type, type length, and allowed values, and each data element has a version number. Likewise, databases and tables have attributes and version numbers, for example the database type, database connection string, and master/slave role of a database, and the table type, table resource identifier, and table description of a table. When a subsequent data inspection detects that any attribute has been modified, a new version number and a new version history record are generated accordingly.
For example, for data in the initial metadata set that is allowed to be introduced into the storage directory, the version number of the data referenced in the storage directory may directly inherit the version number obtained by the acquisition adaptation (the version number is not allowed to be unaffected). If data in the storage directory changes (including, but not limited to, manual changes and changes made by inspection), the version number of the changed data element in the acquisition adaptation module may change according to the number of modifications, for example to v1.1, v1.2, and so on, and the version number of the corresponding data in the storage directory can directly reference the modified version number.
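The attribute-level versioning described above can be sketched as follows. The sketch assumes a `vMAJOR.MINOR` version string whose minor part increases by one per modification; the class `DirectoryEntry`, its fields, and the sample path are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    path: tuple                  # (database, table, field); shorter tuples are higher levels
    attributes: dict
    version: str = "v1.0"
    history: list = field(default_factory=list)  # (old_version, old_attributes) records

    def update_attribute(self, key: str, value) -> bool:
        """Apply an attribute change; create a new version only if the value differs."""
        if self.attributes.get(key) == value:
            return False
        # Record the state being replaced so the difference can be reviewed later.
        self.history.append((self.version, dict(self.attributes)))
        major, minor = self.version.lstrip("v").split(".")
        self.version = f"v{major}.{int(minor) + 1}"
        self.attributes[key] = value
        return True


entry = DirectoryEntry(
    path=("cdr", "patient", "name"),
    attributes={"data_type": "varchar", "type_length": 64},
)
entry.update_attribute("type_length", 128)  # changed: v1.0 -> v1.1
entry.update_attribute("type_length", 128)  # unchanged: no new version
```

Storing the previous attributes alongside the previous version number is one way to obtain the version history record mentioned in the text; other storage layouts would serve equally well.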
S130, when the first polling instruction is received, polling is conducted on the external metadata associated with each directory entry in the storage directory based on the first polling instruction and the target data source, and the external metadata, the metadata version number of the external metadata and the directory version number of the directory entry corresponding to the updated external metadata are updated based on the polling result.
The first polling instruction may be a code or a command for performing data polling on each external metadata associated with the storage directory. The polling result may be a result of determining whether the data has a difference after polling the data, for example: the inspection result may include the presence and absence of a difference, and the like.
Specifically, when the first polling instruction is received, a data polling operation based on the storage directory can be triggered by the instruction, and the external metadata corresponding to each directory entry in the storage directory is polled against each target data source to obtain a polling result. If the polling result shows a difference, the external metadata that differs is updated, and the metadata version number corresponding to that external metadata is updated, for example from v1.0 to v2.0. Further, updating the external metadata changes the content associated with the corresponding directory entry, so the directory version number of the directory entry corresponding to the updated external metadata is also updated.
For example, the polling result of data polling for a database may be consistent or inconsistent. Consistent indicates that the contents of all tables under the database are consistent; inconsistent indicates that at least one table under the database differs. The polling result of data polling for a table or a field may be consistent, inconsistent, added, or deleted.
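A minimal sketch of step S130 follows. It makes several assumptions that the text does not specify: versions use a `vMAJOR.MINOR` string with a minor increment per change, newly found metadata starts at v1.0, deleted metadata is dropped from the stored set, and a directory entry is bumped once per polling run however many of its metadata changed. The function names and sample data are hypothetical.

```python
def bump(version: str) -> str:
    """Increment the minor part of a 'vMAJOR.MINOR' string (an assumed scheme)."""
    major, minor = version.lstrip("v").split(".")
    return f"v{major}.{int(minor) + 1}"


def inspect_external(stored: dict, live: dict, entry_of: dict, dir_versions: dict) -> dict:
    """Compare stored external metadata with the live source and update versions.

    stored:       name -> {"attrs": dict, "version": str}
    live:         name -> attributes currently read from the target data source
    entry_of:     name -> directory entry the metadata is associated with
    dir_versions: directory entry -> directory version number
    """
    results = {}
    touched_entries = set()
    for name in sorted(set(stored) | set(live)):
        if name not in stored:
            results[name] = "added"
            stored[name] = {"attrs": live[name], "version": "v1.0"}
        elif name not in live:
            results[name] = "deleted"
            del stored[name]
        elif stored[name]["attrs"] != live[name]:
            results[name] = "inconsistent"
            stored[name] = {"attrs": live[name], "version": bump(stored[name]["version"])}
        else:
            results[name] = "consistent"
            continue  # nothing changed, so the directory entry is unaffected
        if name in entry_of:
            touched_entries.add(entry_of[name])
    for entry in touched_entries:
        dir_versions[entry] = bump(dir_versions[entry])
    return results


stored = {
    "patient.id": {"attrs": {"type": "int"}, "version": "v1.0"},
    "patient.name": {"attrs": {"type": "varchar", "length": 64}, "version": "v1.0"},
}
live = {
    "patient.id": {"type": "int"},
    "patient.name": {"type": "varchar", "length": 128},
    "patient.sex": {"type": "char"},
}
entry_of = {name: "cdr/patient" for name in live}
dir_versions = {"cdr/patient": "v1.0"}
results = inspect_external(stored, live, entry_of, dir_versions)
```

The four result labels mirror the consistent, inconsistent, added, and deleted outcomes listed above for tables and fields.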
S140, when the second inspection instruction is received, inspecting the residual metadata in the initial metadata set based on the second inspection instruction and the target data source, and updating the residual metadata and the metadata version number of the residual metadata based on the inspection result.
The second polling instruction may be a code or a command for performing data polling on each remaining metadata other than the storage directory.
Specifically, when the second polling instruction is received, the data polling operation of the remaining metadata can be triggered based on the second polling instruction, and data polling is performed on the remaining metadata according to the target data sources to obtain polling results. If the polling result shows that the difference exists, updating the remaining metadata with the difference, and updating the metadata version number corresponding to the remaining metadata, for example: update from v1.0 to v2.0, etc.
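The split between the two instructions can be sketched as a single handler that polls only the subset selected by the instruction. The instruction values `"first"` and `"second"`, the `external` flag, and the minor-increment version scheme are assumptions made for illustration.

```python
def bump(version: str) -> str:
    """Increment the minor part of a 'vMAJOR.MINOR' string (an assumed scheme)."""
    major, minor = version.lstrip("v").split(".")
    return f"v{major}.{int(minor) + 1}"


def handle_instruction(instruction: str, metadata_set: dict, live: dict) -> list:
    """Poll only the subset of metadata selected by the instruction."""
    if instruction not in ("first", "second"):
        raise ValueError(f"unknown polling instruction: {instruction}")
    want_external = instruction == "first"
    changed = []
    for name, item in metadata_set.items():
        if item["external"] != want_external:
            continue  # belongs to the other polling pass
        if name in live and item["attrs"] != live[name]:
            item["attrs"] = live[name]
            item["version"] = bump(item["version"])
            changed.append(name)
    return changed


metadata_set = {
    "patient.name": {"external": True, "attrs": {"length": 64}, "version": "v1.0"},
    "tmp_backup.name": {"external": False, "attrs": {"length": 64}, "version": "v1.0"},
}
live = {"patient.name": {"length": 128}, "tmp_backup.name": {"length": 128}}
changed = handle_instruction("second", metadata_set, live)
```

In this run only the remaining metadata is examined, so the external metadata keeps its version even though the source has changed; it would be picked up by the next first-instruction pass.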
In a medical scenario, the technical solution of this embodiment can solve the problem that a full inspection takes a long time because of the large data volume of original business databases such as a HIS (Hospital Information System) or LIS (Laboratory Information Management System). The data is classified according to different directory entries, for example by business logic layer, with a corresponding business database such as an RDR (Research Data Repository) or CDR (Clinical Data Repository) under each business logic layer, and the data corresponding to the storage directory is then determined from the classification result. In use, hospital information department staff or department doctors can select the required storage directory according to actual business needs, so as to carry out data verification or other operations in a targeted manner.
According to the technical solution of the embodiment of the invention, a communication connection is established with each of at least one target data source, an initial metadata set is established from the target data sources, and a metadata version number is established for each metadata in the initial metadata set; in this way the metadata in each target data source is collected and assigned a metadata version number. A storage directory is then established according to the external metadata in the initial metadata set, and a directory version number is established for each directory entry in the storage directory. When a first inspection instruction is received, the external metadata associated with each directory entry in the storage directory is inspected based on the first inspection instruction and the target data source, and the external metadata, the metadata version number of the external metadata, and the directory version number of the directory entry corresponding to the updated external metadata are updated based on the inspection result. When a second inspection instruction is received, the residual metadata in the initial metadata set is inspected based on the second inspection instruction and the target data source, and the residual metadata and its metadata version number are updated based on the inspection result. This solves the problems that data inspection covers a large volume of data, takes a long time, and makes it difficult to determine how the data differs before and after inspection; it improves the speed and pertinence of data inspection and achieves the technical effect of recording the differences found during inspection.
Example two
Fig. 2 is a schematic flowchart of a metadata-driven data polling and version management method according to a second embodiment of the present invention. On the basis of the foregoing embodiments, this embodiment further describes how the storage directory is established and how data polling is specifically performed. Terms that are the same as or correspond to those in the above embodiments are not explained in detail again here.
As shown in fig. 2, the method of this embodiment specifically includes the following steps:
S210, respectively establishing communication connection with each target data source based on at least one target data source, establishing an initial metadata set according to each target data source, and establishing a metadata version number for each metadata in the initial metadata set.
Specifically, a communication connection is established with at least one target data source according to actual requirements to acquire all metadata in the target data source. And integrating all the obtained metadata to obtain an initial metadata set, and establishing a metadata version number for each metadata in the initial metadata set.
Optionally, the target data source may be determined and a communication connection may be established with the target data source by:
determining at least one target data source according to data source information in the data source connection information; and establishing communication connection with at least one target data source according to the configuration information in the data source connection information.
The data source connection information may be information indicating that a data source is connected. The data source information may be various basic information identifying the data source, such as: data source name, data source identification, storage location, etc. The configuration information may include an account number, a password, an IP (Internet Protocol) address, etc. required to connect the data source.
Specifically, the target data source may be located according to the data source name, the data source identifier, the storage location, and the like of the target data source mentioned in the data source connection information. And further, matching and logging in according to the configuration information in the target data source connection information and the positioned target data source, and establishing communication connection so as to obtain data in the target data source subsequently.
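The data source connection information described above can be sketched as two parts, data source information and configuration information. All class, field, and function names here are hypothetical, and `fake_connector` stands in for a real database driver so that the example runs without a network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSourceInfo:
    name: str
    identifier: str
    location: str   # storage location, e.g. a database name


@dataclass(frozen=True)
class ConnectionConfig:
    host: str       # IP address
    account: str
    password: str


@dataclass(frozen=True)
class DataSourceConnectionInfo:
    source: DataSourceInfo
    config: ConnectionConfig


def locate_targets(infos: list, registry: set) -> list:
    """Keep the connection infos whose identifier names a known data source."""
    return [info for info in infos if info.source.identifier in registry]


def connect(info: DataSourceConnectionInfo, connector):
    """Log in with the configuration information via a caller-supplied connector."""
    cfg = info.config
    return connector(cfg.host, cfg.account, cfg.password, info.source.location)


def fake_connector(host, account, password, location):
    # Stand-in for a real driver: returns a description instead of a connection.
    return f"{account}@{host}/{location}"


infos = [
    DataSourceConnectionInfo(
        DataSourceInfo("anesthesia", "src-001", "db_a"),
        ConnectionConfig("10.0.0.5", "reader", "secret"),
    ),
    DataSourceConnectionInfo(
        DataSourceInfo("unknown", "src-999", "db_x"),
        ConnectionConfig("10.0.0.9", "reader", "secret"),
    ),
]
registry = {"src-001", "src-002"}
targets = locate_targets(infos, registry)
connections = [connect(info, fake_connector) for info in targets]
```

Passing the connector in as a parameter is a design choice of this sketch: it lets the same locating logic serve different database types, in the spirit of the acquisition adaptation module.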
Alternatively, the initial set of metadata may be established and assigned a metadata version number by:
Step one, collecting metadata from each target data source, performing data inspection on the metadata according to preset inspection items and each target data source, and establishing the initial metadata set from the inspected metadata.
The inspection configuration can be used to inspect the collected metadata against the data in the target data source, and the preset inspection items include at least one of a data type, a data length, a data element definition, and a metadata model. The inspection content may include: abnormal database connections, tables added to a database, tables deleted from a database, fields added to a data table, fields deleted from a data table, field attribute changes, and the like.
Specifically, metadata is acquired from each target data source, and a full data inspection is performed on the acquired metadata to ensure the integrity and accuracy of the loaded metadata. The loaded metadata is then inspected item by item according to the preset inspection items, and the metadata that passes inspection is integrated to establish the initial metadata set.
Step two, establishing, for each metadata in the initial metadata set, the initial metadata version number corresponding to that metadata.
Wherein, the initial metadata version number may be a preset version number for identifying the original metadata, such as: v1.0, etc.
Specifically, an initial metadata version number is established for each metadata in the initial metadata set to identify it as metadata obtained by initial collection. If the metadata is updated subsequently, the metadata version number may be updated from v1.0 to v2.0, from v1.1 to v1.2, and so on.
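The inspection content listed for step one (added or deleted tables, added or deleted fields, and field attribute changes) can be sketched as a comparison of two schema snapshots. The snapshot layout `{table: {field: attributes}}` and the function name `diff_schema` are assumptions for illustration; detecting an abnormal database connection is outside this sketch.

```python
def diff_schema(old: dict, new: dict) -> dict:
    """Compare two schema snapshots of the form {table: {field: attributes}}."""
    report = {
        "tables_added": sorted(set(new) - set(old)),
        "tables_deleted": sorted(set(old) - set(new)),
        "fields_added": [],
        "fields_deleted": [],
        "fields_changed": [],
    }
    # Field-level comparison only makes sense for tables present in both snapshots.
    for table in sorted(set(old) & set(new)):
        old_fields, new_fields = old[table], new[table]
        for name in sorted(set(new_fields) - set(old_fields)):
            report["fields_added"].append((table, name))
        for name in sorted(set(old_fields) - set(new_fields)):
            report["fields_deleted"].append((table, name))
        for name in sorted(set(old_fields) & set(new_fields)):
            if old_fields[name] != new_fields[name]:
                report["fields_changed"].append((table, name))
    return report


old = {
    "patient": {"id": {"type": "int"}, "name": {"type": "varchar", "length": 64}},
    "visit": {"id": {"type": "int"}},
}
new = {
    "patient": {
        "id": {"type": "int"},
        "name": {"type": "varchar", "length": 128},
        "sex": {"type": "char"},
    },
    "order": {"id": {"type": "int"}},
}
report = diff_schema(old, new)
```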
S220, determining external metadata corresponding to the external platform from the initial metadata set.
Wherein, the external platform may be a platform using data in the initial metadata set, for example: a service platform, etc.
Specifically, according to the data requirements of each external platform, metadata matched with the data requirements are determined from the initial metadata set and serve as the external metadata.
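Step S220 can be sketched as a set partition, assuming each external platform declares the names of the metadata it needs. The function name, platform names, and metadata names are hypothetical.

```python
def split_by_platform_needs(initial_set: set, platform_needs: dict) -> tuple:
    """Return (external, remaining) given each platform's required metadata names."""
    required = set().union(*platform_needs.values())
    external = initial_set & required   # needed by at least one external platform
    remaining = initial_set - external  # e.g. backup, intermediate, temporary data
    return external, remaining


initial_set = {"patient.id", "patient.name", "tmp_backup.name"}
platform_needs = {
    "data_quality": {"patient.id"},
    "bi": {"patient.id", "patient.name"},
}
external, remaining = split_by_platform_needs(initial_set, platform_needs)
```

Metadata requested by a platform but absent from the initial metadata set is ignored here; a real system might report it instead.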
And S230, dividing each external metadata according to the data domain corresponding to each external metadata.
The data domain may be any of the stages in the data flow, such as: Data Lake (DL), Data Center (DC), Data Domain (Data Domain), Data Market (Data Market), and the like. Illustratively, the DL may be used to store the complete external metadata. The DC may be used to preliminarily integrate the external metadata in the DL. The Data Domain may be used to process the data in the DC by domain according to preset standards and to integrate data of the same domain together. The Data Market may be used to store data derived from the external metadata.
Specifically, according to the requirements of the external platform and the data flow direction, the external metadata can be divided according to different data domains, so that they can subsequently be matched with the directory entries in the storage directory.
It should be noted that dividing the external metadata according to the data domain is one metadata dividing manner in this embodiment, and the external metadata may also be divided in other manners, for example: according to project requirements, platform requirements, etc.
S240, determining each directory item of the storage directory according to the hierarchical structure of the data domain, establishing a corresponding relation between each external metadata and each directory item according to the division result of each external metadata, and establishing a directory version number for each directory item in the storage directory.
Specifically, directory entries of each hierarchy are constructed according to the hierarchical structure of the data domain, and the structure of the storage directory is constructed. And further, according to the dividing result of the external metadata, matching the external metadata with each directory entry, and establishing a corresponding relation between the external metadata and the directory entries. Also, each directory entry in the storage directory may be assigned an initial directory version number.
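The construction in S240 can be sketched as follows, under stated assumptions: the domain names are taken from the stages listed above, the hierarchy is flattened to a single level for brevity, and DirectoryEntry and build_storage_directory are hypothetical names rather than part of the embodiment.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed data-domain stages, in data-flow order.
DOMAIN_HIERARCHY = ["DL", "DC", "DOMAIN", "MARKET"]


@dataclass
class DirectoryEntry:
    name: str
    version: str = "v1.0"  # initial directory version number
    metadata_ids: List[str] = field(default_factory=list)


def build_storage_directory(external_metadata: Dict[str, str]) -> Dict[str, DirectoryEntry]:
    """Create one directory entry per data domain and attach each external
    metadata id to the entry of the domain it was divided into."""
    directory = {name: DirectoryEntry(name) for name in DOMAIN_HIERARCHY}
    for metadata_id, domain in external_metadata.items():
        if domain not in directory:
            raise KeyError(f"unknown data domain: {domain}")
        directory[domain].metadata_ids.append(metadata_id)
    return directory
```

For example, build_storage_directory({"visit.length_of_stay": "DOMAIN"}) attaches that metadata id to the DOMAIN entry, and every entry starts at directory version v1.0.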
And S250, when the first polling instruction is received, polling is carried out on the external metadata associated with each directory entry in the storage directory based on the first polling instruction and the target data source, and the external metadata, the metadata version number of the external metadata and the directory version number of the directory entry corresponding to the updated external metadata are updated based on the polling result.
Specifically, when the first routing inspection instruction is received, data routing inspection is performed on external metadata corresponding to each directory entry in the storage directory according to each target data source, and a routing inspection result is obtained. And if the routing inspection result shows that the external metadata has the difference, updating the external metadata with the difference, updating the metadata version number corresponding to the external metadata, and updating the directory version number of the directory entry corresponding to the updated external metadata.
Optionally, the polling content may include: abnormal database connection, database addition tables, database deletion tables, data table addition fields, data table deletion fields, field attribute changes and the like.
Optionally, when the subsequent data is patrolled, the external metadata may be partially or completely updated according to the polling result, and the specific data selected by updating may be selected according to the requirement, which is not specifically limited in this embodiment.
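The polling content listed above (added or deleted tables, added or deleted fields, field attribute changes) can be illustrated by a comparison of two schema snapshots. This is a minimal sketch: the Schema layout and the function diff_schema are assumptions, and database connection anomalies are not covered because they arise before any snapshot can be read.

```python
from typing import Dict, List

# Assumed snapshot layout: table name -> {field name -> attribute dict}.
Schema = Dict[str, Dict[str, dict]]


def diff_schema(stored: Schema, source: Schema) -> List[str]:
    """Compare stored metadata with metadata read from the data source and
    describe each difference found."""
    differences = []
    for table in sorted(source.keys() - stored.keys()):
        differences.append(f"table added: {table}")
    for table in sorted(stored.keys() - source.keys()):
        differences.append(f"table deleted: {table}")
    for table in sorted(stored.keys() & source.keys()):
        old_fields, new_fields = stored[table], source[table]
        for name in sorted(new_fields.keys() - old_fields.keys()):
            differences.append(f"field added: {table}.{name}")
        for name in sorted(old_fields.keys() - new_fields.keys()):
            differences.append(f"field deleted: {table}.{name}")
        for name in sorted(old_fields.keys() & new_fields.keys()):
            if old_fields[name] != new_fields[name]:
                differences.append(f"field attribute changed: {table}.{name}")
    return differences
```

An empty result corresponds to a polling result without differences, in which case nothing is updated and no version number changes.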
Optionally, the data inspection with the storage directory as a reference point may be performed based on the first inspection instruction by:
the method comprises the steps of firstly, analyzing a first inspection instruction, and determining an inspection directory item corresponding to the first inspection instruction.
The patrol checking directory entry may be a directory entry to be patrolled and checked, which is carried in the first patrol checking instruction, and may be at least one directory entry.
Specifically, the first inspection instruction is analyzed, the inspection directory item carried by the first inspection instruction is determined from the analysis result, and data inspection is conducted on external metadata according to the inspection directory item.
And secondly, determining a data source to be inspected from the target data source according to the data lineage of the external metadata corresponding to the inspection directory entry.
The data lineage may be used to describe the source and destination of the data, and may mainly include: data source, data processing mode, mapping relation, data outlet and the like. The data source to be inspected may be the target data source against which the inspection directory entry needs to be checked.
Illustratively, the data lineage is used to trace a data object back to its data source. For example: in the first case, there is a field named "length of stay" under the DOMAIN directory entry, which is derived from the "admission time" and "discharge time" of the upstream DC. In the second case, a downstream data table directly references upstream data, such as the patient identifier in the in-hospital primary service library.
Specifically, after the inspection directory entry is determined, the external metadata corresponding to the inspection directory entry can be determined, and the data source of the external metadata, namely the data source to be inspected, is then determined by tracing back along the data lineage of that external metadata.
And thirdly, carrying out data inspection on external metadata corresponding to the inspection directory entry according to the data source to be inspected.
Specifically, metadata corresponding to the external metadata corresponding to the patrol directory entry is obtained from the data source to be patrolled. And carrying out data inspection on external metadata corresponding to the inspection directory entry according to the metadata acquired from the data source to be inspected so as to obtain an inspection result.
And step four, if the external metadata corresponding to the polling directory entry is updated according to the polling result, updating the metadata version number of the external metadata to be updated, generating an external metadata updating record according to the external metadata before updating and the external metadata after updating, and updating the directory version number of the polling directory entry corresponding to the external metadata to be updated.
The external metadata update record may be a record of the external metadata before and after the update.
Specifically, if the polling result shows a difference and the external metadata corresponding to the polling directory entry is updated, the external metadata has changed, and the metadata version number of the updated external metadata can be updated. To record the change, the external metadata before updating and the external metadata after updating may be integrated to obtain an external metadata update record. For example: the external metadata in JSON format before updating and the external metadata in JSON format after updating are combined to serve as the external metadata update record of the current update. Since data polling is performed using the polling directory entry as a reference point, when the external metadata changes, the directory version number of the polling directory entry needs to be updated to indicate that the external metadata corresponding to the polling directory entry has changed.
For example, when the metadata version number of the external metadata is updated, it can be understood that: and taking the initial metadata version number of the external metadata as the current metadata version number, performing data inspection on the external metadata, updating the current metadata version number if the external metadata is updated according to the inspection result, taking the updated current metadata version number as a new current metadata version number, and returning to execute the operation of performing data inspection on the external metadata. It should be noted that, the update operations for the directory version number of the directory entry and the metadata version number of the remaining metadata mentioned later may also use a similar manner, and are not described herein again.
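The second of the four steps above, tracing the data lineage back to the data source to be inspected, can be sketched as a simple upstream traversal. The lineage map layout and the name trace_sources are assumptions; identifiers without an upstream entry are treated here as residing directly in a data source.

```python
from typing import Dict, List, Set

# Hypothetical lineage map: metadata id -> the upstream ids it is derived from.
Lineage = Dict[str, List[str]]


def trace_sources(metadata_id: str, lineage: Lineage) -> Set[str]:
    """Follow the lineage upstream and return the ids that have no further
    upstream, i.e. the objects to be checked in the data source."""
    sources: Set[str] = set()
    seen: Set[str] = set()
    stack = [metadata_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles in the lineage map
        seen.add(current)
        upstream = lineage.get(current, [])
        if not upstream:
            sources.add(current)
        stack.extend(upstream)
    return sources
```

Applied to the "length of stay" example, the traversal passes through the admission time and discharge time in the DC and ends at whatever source fields those two are derived from.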
And S260, when the second inspection instruction is received, inspecting the residual metadata in the initial metadata set based on the second inspection instruction and the target data source, and updating the residual metadata and the metadata version number of the residual metadata based on the inspection result.
Specifically, when the second polling instruction is received, the data polling operation of the remaining metadata can be triggered based on the second polling instruction, and data polling is performed on the remaining metadata according to the target data sources to obtain polling results. And if the routing inspection result shows that the difference exists, updating the residual metadata with the difference, and updating the metadata version number corresponding to the residual metadata.
Optionally, data inspection with the remaining metadata as a reference point may be performed based on the second inspection instruction by:
step one, determining a data source to be patrolled and examined from a target data source according to the residual metadata in the initial metadata set.
The data source to be inspected may be the target data source against which the remaining metadata needs to be checked.
Specifically, for each remaining metadata in the initial metadata set, a data source to be inspected corresponding to the remaining metadata is determined, for example: the data source to be inspected is determined from the target data source through the data lineage of the remaining metadata.
And step two, performing data inspection on the remaining metadata according to the data source to be inspected.
Specifically, data information corresponding to the remaining metadata is collected from the data source to be patrolled and examined, and the determined data information is compared with the remaining metadata to carry out data patrolling and examining.
And step three, if the residual metadata are updated according to the routing inspection result, updating the metadata version number of the updated residual metadata, and generating a residual metadata updating record according to the residual metadata before updating and the residual metadata after updating.
Wherein the remaining metadata update record may be a record of the remaining metadata before and after the update.
Specifically, if the polling result shows a difference and the remaining metadata is updated, the remaining metadata has changed, and the metadata version number of the updated remaining metadata can be updated. To record the change, the remaining metadata before updating and the remaining metadata after updating may be integrated to obtain a remaining metadata update record. For example: the remaining metadata in JSON format before updating and the remaining metadata in JSON format after updating are combined to serve as the remaining metadata update record of the current update.
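The update record described in step three, which combines the metadata before and after the update in JSON format, might look like the following sketch. The record layout (the keys id, version, before and after) and the name make_update_record are assumptions; the embodiment only states that the two JSON documents are combined.

```python
import json


def make_update_record(metadata_id: str, before: dict, after: dict, new_version: str) -> str:
    """Combine the metadata before and after an update into one JSON record
    (hypothetical layout)."""
    record = {
        "id": metadata_id,
        "version": new_version,  # metadata version number after the update
        "before": before,
        "after": after,
    }
    return json.dumps(record, ensure_ascii=False, sort_keys=True)
```

The same record shape would serve for external metadata, with the directory version number of the affected directory entry updated alongside it.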
On the basis of the above embodiments, if the metadata is updated whenever the polling result shows a difference, the metadata may be updated frequently, and metadata that does not need updating may also be updated, which wastes time and resources. Therefore, an auditing operation can be added after data inspection and before metadata updating, which specifically includes:
step one, if the routing inspection result shows a difference, generating an audit instruction to obtain an audit result.
The audit instruction may be an instruction requesting that the difference of the metadata be audited to determine whether to perform a synchronous update, and may carry the identifier of the metadata having the difference and the difference itself. The audit result may be the result, obtained by auditing the difference of the metadata, of whether to update the metadata, for example: the audit result may include audit passed and audit not passed.
Specifically, if the routing inspection result shows a difference, an audit instruction is generated based on the metadata having the difference, and the audit instruction is sent to the terminal equipment of the auditor to obtain the audit result.
It should be noted that the audit instruction may indicate the location information of the metadata having the difference, in which case an auditor retrieves the metadata according to the location information and performs a manual audit to determine whether a synchronous update is required. The audit instruction may also directly carry the differing metadata, such as name and nameCn, so that the auditor can judge directly from the information carried in the audit instruction whether a synchronous update is required.
And step two, if the audit result is that the audit is not passed, keeping the external metadata and the residual metadata unchanged.
Wherein, audit not passed is the audit result indicating that the difference is ignored, that is, the metadata having the difference does not need to be updated.
Specifically, if the audit result is that the audit is not passed, it indicates that the difference of the metadata of the data polling is not necessary to be updated, and can be ignored, so that the external metadata and the remaining metadata can be kept unchanged for continuous use.
It should be noted that each piece of metadata with a difference may be respectively audited, and then, the audit results of different pieces of metadata with a difference may be different, and subsequent operations may be respectively performed according to different audit results.
And step three, if the audit result is that the audit is passed, updating the external metadata and executing the operation of updating the metadata version number of the external metadata and the directory version number of each directory item based on the inspection result, and/or executing the operation of updating the residual metadata and the metadata version number of the residual metadata based on the inspection result.
Wherein, audit passed is the audit result indicating that the difference is to be synchronized, that is, the metadata having the difference needs to be synchronously updated.
Specifically, if the audit result is that the audit is passed, the metadata corresponding to the audit result needs to be updated synchronously. And if the synchronously updated metadata is external metadata, after the external metadata is updated, executing the operation of updating the metadata version number of the external metadata and the directory version number of each directory item based on the inspection result. And if the synchronously updated metadata is the residual metadata, after updating the residual metadata, executing the operation of updating the residual metadata and the metadata version number of the residual metadata based on the polling result.
Illustratively, auditing may be performed at the library, table and field levels, and the following functions may be implemented: ignoring a specified difference, synchronizing a specified difference, one-key batch ignoring and synchronizing, and version publishing, which publishes all events at the current level. The approval process may also be managed and controlled for inspection results with differences, for example: a one-key override operation takes effect only after approval by an approver.
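The three audit steps above can be summarized in a short sketch in which each difference is handled according to its own audit result. The names AuditResult, Difference and apply_audit, and the use of an integer revision counter in place of a version string, are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class AuditResult(Enum):
    PASSED = "synchronize difference"
    NOT_PASSED = "ignore difference"


@dataclass
class Difference:
    metadata_id: str
    source: dict  # metadata as currently found in the data source


def apply_audit(store: Dict[str, dict], revisions: Dict[str, int],
                diff: Difference, result: AuditResult) -> bool:
    """Apply one audit result; return True if the stored metadata changed."""
    if result is AuditResult.NOT_PASSED:
        return False  # difference ignored, metadata kept unchanged
    store[diff.metadata_id] = dict(diff.source)
    # Assumed counter standing in for the metadata version number.
    revisions[diff.metadata_id] = revisions.get(diff.metadata_id, 1) + 1
    return True
```

Because each difference carries its own audit result, some differences may be synchronized while others found in the same inspection are ignored.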
On the basis of the above embodiments, in order to select different data inspection reference points according to business requirements for data inspection, a first inspection instruction and/or a second inspection instruction may be generated in the following manner:
and if the current moment reaches a preset inspection period, generating a first inspection instruction and/or a second inspection instruction.
Wherein, the preset polling period may be a preset period for data polling, for example: 1 day, and the like. The preset polling period can be manually set and modified according to data polling requirements, which is not specifically limited in this embodiment.
Specifically, if the current time reaches a preset data polling period, it indicates that data polling needs to be performed on the stored metadata, and at this time, a first polling instruction and/or a second polling instruction may be generated to perform data polling on the external metadata and/or the remaining metadata.
It should be noted that the preset inspection period may include a first inspection period and a second inspection period, and if the current time reaches the first inspection period, a first inspection instruction is generated; and if the current moment reaches a second inspection period, generating a second inspection instruction.
And if the polling trigger action aiming at the at least one directory entry is detected, generating a second polling instruction according to the at least one directory entry.
Wherein, the polling trigger behavior may be an operation behavior of performing data polling on external metadata corresponding to the directory entry, for example: the polling trigger behavior may be a behavior of clicking a polling control of the directory entry, executing a code for polling the directory entry, or the like.
Specifically, when a polling trigger action directed at at least one directory entry is detected, the at least one directory entry targeted by the polling trigger action is determined, and a second polling instruction is then generated according to the at least one directory entry so as to carry out data polling on the external metadata corresponding to the directory entries.
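The period-based generation of inspection instructions, with separate first and second inspection periods, can be sketched as below. The function name due_instructions, the instruction labels, and the bookkeeping of last run times are assumptions; generation triggered by a user action on a directory entry is not covered by this sketch.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def due_instructions(now: datetime,
                     last_run: Dict[str, datetime],
                     periods: Dict[str, timedelta]) -> List[str]:
    """Return the instructions whose inspection period has elapsed and
    record `now` as their most recent run time."""
    due = []
    for instruction, period in periods.items():
        if now - last_run[instruction] >= period:
            due.append(instruction)
            last_run[instruction] = now
    return due
```

With a first period of one day and a second period of seven days, a check one day after the last run would yield only the first inspection instruction.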
As an optional implementation of the foregoing embodiments, fig. 3 is a schematic structural diagram of a metadata-driven data inspection and version management system according to a second embodiment of the present invention, where explanations of terms that are the same as or correspond to those in the foregoing embodiments are not repeated herein.
As shown in fig. 3, the metadata-driven data patrol and version management system includes a data collection adaptation module and a storage directory module.
The technical scheme of the embodiment of the invention is executed based on an execution main body, namely the data polling and version management system driven by metadata. Based on a data acquisition adaptation module in the metadata-driven data routing inspection and version management system, a communication connection between the system and each data source (target data source) can be established to acquire data from each data source.
The data acquisition adaptation module can collect various types of database information, as well as specific table and field contents, from a data source by using basic information such as IP, account numbers, passwords and the like, associate the corresponding collected libraries with applications, and provide the data basis and lineage analysis and positioning for the later-stage storage directory.
Specifically, the metadata-driven data routing and version management system loads data from the designated external data source into the acquisition adaptation module. In the first data loading process, the whole data inspection can be simultaneously carried out, and the inspection items (preset inspection items) include but are not limited to: data type, data type length, data element definition, meta model, etc., and simultaneously records an initial version of this data (initial metadata version number). When the data is firstly included in the data acquisition adaptation module, an initial version, such as the version number of v1.0, is allocated to the data, and then the initial version is updated on the basis of the subsequent manual modification or routing inspection automatic modification, and meanwhile, the modification history of the data version is recorded.
And after the data is loaded through the acquisition adaptation module, establishing reference to the acquired data through the storage directory module to obtain a storage directory.
The data source of the storage directory is the acquisition adaptation module, but the storage directory does not necessarily reference all the data acquired in the acquisition adaptation module. The storage directory corresponds to all data used by each platform (external platform), such as library, table and field information, lineage information, a meta-model and the like.
It should be noted that the version update does not need to be manually set, and there is an operation approval (an audit instruction) when sensitive operations such as editing and deleting are performed, and after the approval is passed, the version is automatically recorded and difference records before and after modification (external metadata update records and/or remaining metadata update records) are generated.
In the data polling process, data polling can be carried out in the following two ways:
Mode 1: if data inspection is performed on metadata that has been loaded into the data acquisition adaptation module but is not referenced by the storage directory (remaining metadata), the data in the data source is taken as the reference point for the inspection.
Mode 2: if data inspection is performed on metadata that has been loaded into the data acquisition adaptation module, is referenced by the storage directory and is associated with a related application (external metadata), the data in the storage directory is taken as the reference point, and the data source is inspected against it.
For the routing inspection result, only inconsistent routing inspection results (with differences) are recorded, and version recording modes of the two modes are consistent: and recording the difference before and after the data updating.
Reference may be made to the following example regarding the process of operation approval. Illustratively, suppose data inspection finds that a field name in a table of library A and library B in the data source has been changed from name to nameCn. At this point, the reviewer has two options: 1. ignore the difference (audit not passed); 2. synchronize the difference (audit passed). If ignoring is selected, the difference is disregarded and the metadata is kept unchanged; if synchronizing is selected, the metadata in the data source is synchronized into the system, that is, name is changed to nameCn, and two version records, one for name and one for nameCn, are created in the synchronization process. Because a unique identifier is assigned to each data in the system, all the version records of the data can be queried through the unique identifier.
Optionally, the processing operation performed on the inspection result may itself be audited, and the specific flow is shown in fig. 4. Specifically, the processing operations for an inspection result with a difference are of two types, namely an ignore operation and a synchronize operation. The inspection result and the processing operation selected for it are submitted for audit. If the audit is passed, the system executes the ignore operation or the synchronize operation according to the selected processing operation; if the audit is not passed, the audit process ends, and the flow may return to selecting a processing operation for the inspection result with the difference.
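The name to nameCn example, in which all version records of one piece of data are queried through its unique identifier, can be sketched as follows. The class VersionHistory, its method names, and the use of a UUID as the unique identifier are assumptions made for illustration.

```python
import uuid
from typing import Dict, List


class VersionHistory:
    """Keeps every version record of each piece of metadata under a
    unique identifier (hypothetical structure)."""

    def __init__(self) -> None:
        self._records: Dict[str, List[dict]] = {}

    def register(self, value: dict) -> str:
        """Store the initial record and return its unique identifier."""
        uid = uuid.uuid4().hex
        self._records[uid] = [dict(value)]
        return uid

    def synchronize(self, uid: str, value: dict) -> None:
        """Append a new version record when a difference is synchronized."""
        self._records[uid].append(dict(value))

    def history(self, uid: str) -> List[dict]:
        """Return all version records for the identifier, oldest first."""
        return list(self._records.get(uid, []))
```

After registering a field named name and synchronizing it to nameCn, history(uid) returns both records in order, which mirrors the two version records described in the example.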
Optionally, a timing task may be set in the system to perform full-database data scanning at a fixed time, and a timing period may be preset to be daily, and meanwhile, manual modification of the timing period is supported.
Through the metadata-driven data inspection and version management system, data differences relative to different reference points can be seen directly on a page, which makes it easier to discuss and resolve data differences and to determine versions from the perspective of each reference point. In addition, two inspection modes are provided, namely timed inspection and manual inspection: timed inspection covers a large data volume and is comprehensive, while manual inspection covers a small range and is targeted. Inspecting according to different requirements in this way greatly facilitates data management.
According to the technical scheme of the embodiment of the invention, communication connection with each target data source is respectively established based on at least one target data source, an initial metadata set is established according to each target data source, a metadata version number is established for each metadata in the initial metadata set, so that metadata in each target data source is acquired and obtained, and the metadata version number is distributed. And then, determining external metadata corresponding to the external platform from the initial metadata set, dividing each external metadata according to a data domain corresponding to each external metadata, determining each directory item of the storage directory according to a hierarchical structure of the data domain, establishing a corresponding relation between each external metadata and each directory item according to a dividing result of each external metadata, establishing a directory version number for each directory item in the storage directory, and establishing the storage directory and distributing the directory version number. 
When a first inspection instruction is received, inspecting external metadata associated with each directory entry in the storage directory based on the first inspection instruction and a target data source, and updating the external metadata, the metadata version number of the external metadata and the directory version number of the directory entry corresponding to the updated external metadata based on an inspection result; when a second inspection instruction is received, based on the second inspection instruction and a target data source, inspection is conducted on the residual metadata in the initial metadata set, the residual metadata and the metadata version number of the residual metadata are updated based on an inspection result, the problems that data inspection is large in data size, long in time consumption and difficult to determine data difference before and after data inspection are solved, the data inspection speed and pertinence are improved, and the technical effect of recording the difference in data inspection is achieved.
EXAMPLE III
Fig. 5 is a schematic structural diagram of a metadata-driven data inspection and version management apparatus according to a third embodiment of the present invention, where the apparatus includes: an initial metadata set establishing module 310, a storage directory establishing module 320, a first inspection module 330, and a second inspection module 340.
The initial metadata set establishing module 310 is configured to respectively establish communication connections with target data sources based on at least one target data source, establish an initial metadata set according to the target data sources, and establish a metadata version number for each metadata in the initial metadata set; wherein the initial metadata set comprises external metadata and residual metadata; the storage directory establishing module 320 is configured to establish a storage directory according to the external metadata in the initial metadata set, and establish a directory version number for each directory entry in the storage directory; the first inspection module 330 is configured to, when a first inspection instruction is received, inspect external metadata associated with each directory entry in the storage directory based on the first inspection instruction and the target data source, and update the external metadata, a metadata version number of the external metadata, and a directory version number of a directory entry corresponding to the updated external metadata based on an inspection result; and the second inspection module 340 is configured to, when a second inspection instruction is received, inspect the remaining metadata in the initial metadata set based on the second inspection instruction and the target data source, and update the remaining metadata and a metadata version number of the remaining metadata based on an inspection result.
Optionally, the initial metadata set creating module 310 is further configured to determine at least one target data source according to data source information in the data source connection information; and establishing communication connection with the at least one target data source according to configuration information in the data source connection information.
Optionally, the initial metadata set establishing module 310 is further configured to collect metadata in each target data source, perform data inspection on the metadata according to a preset inspection item and each target data source, and establish an initial metadata set according to the metadata after data inspection; the preset routing inspection item comprises at least one of a data type, a data length, a data element definition and a metadata model; establishing an initial metadata version number corresponding to the metadata for each metadata in the initial set of metadata.
Optionally, the storage directory establishing module 320 is further configured to determine external metadata corresponding to the external platform from the initial metadata set; dividing each external metadata according to a data field corresponding to each external metadata; determining each directory entry of the storage directory according to the hierarchical structure of the data domain, and establishing a corresponding relation between each external metadata and each directory entry according to the division result of each external metadata.
Optionally, the first inspection module 330 is further configured to analyze the first inspection instruction, and determine an inspection directory entry corresponding to the first inspection instruction; determine a data source to be inspected from the target data source according to the data lineage of the external metadata corresponding to the inspection directory entry; perform data inspection on the external metadata corresponding to the inspection directory entry according to the data source to be inspected; and if the external metadata corresponding to the inspection directory entry is updated according to the inspection result, update the metadata version number of the updated external metadata, generate an external metadata update record according to the external metadata before updating and the external metadata after updating, and update the directory version number of the inspection directory entry corresponding to the updated external metadata.
Optionally, the second inspection module 340 is further configured to determine a data source to be inspected from the target data source according to the remaining metadata in the initial metadata set; performing data inspection on the data source to be inspected and the residual metadata; and if the residual metadata are updated according to the polling result, updating the metadata version number of the updated residual metadata, and generating a residual metadata updating record according to the residual metadata before updating and the residual metadata after updating.
Optionally, the apparatus further comprises: an auditing module, configured to: generate an audit request instruction and obtain an audit result if the inspection result indicates a difference; keep the external metadata and the remaining metadata unchanged if the audit result is that the audit is not passed; and, if the audit result is that the audit is passed, perform the operation of updating the external metadata, the metadata version number of the external metadata and the directory version number of each directory entry based on the inspection result, and/or the operation of updating the remaining metadata and the metadata version number of the remaining metadata based on the inspection result.
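The audit step can be sketched as a gate placed in front of the update. In the following illustration the `approve` callable stands in for the audit request and its result; it, the function name, and the data layout are assumptions and not part of the embodiment.

```python
def apply_with_audit(store, proposed, approve):
    """Apply proposed metadata changes only if the audit passes.

    store:    metadata id -> {"version": int, "definition": dict}
    proposed: metadata id -> definition found by the inspection
    approve:  callable taking the differing items and returning bool
              (hypothetical stand-in for the audit request)
    Returns True if the store was updated, otherwise False.
    """
    diffs = {mid: new for mid, new in proposed.items()
             if store[mid]["definition"] != new}
    if not diffs:
        return False  # no difference found, so no audit is requested
    if not approve(diffs):
        return False  # audit not passed: metadata stays unchanged
    for mid, new in diffs.items():
        store[mid]["definition"] = new
        store[mid]["version"] += 1
    return True
```

Separating detection of the difference from its application means a rejected audit leaves both the metadata and its version numbers exactly as they were.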
Optionally, the apparatus further comprises: an inspection instruction generating module, configured to generate the first inspection instruction and/or the second inspection instruction if the current time reaches a preset inspection period; or, if an inspection trigger action on at least one directory entry is detected, to generate the second inspection instruction according to the at least one directory entry.
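As a rough sketch of the two triggering conditions, the following function returns the instructions to issue at a given time. The instruction representation, the use of seconds, and the choice to issue both instructions when the period elapses are assumptions; the text permits either or both.

```python
def generate_instructions(now, last_run, period_seconds, triggered_entries=()):
    """Return the inspection instructions to issue at time `now` (seconds)."""
    instructions = []
    if now - last_run >= period_seconds:
        # Preset inspection period reached: issue both instructions here,
        # although the text allows the first and/or the second.
        instructions.append({"kind": "first"})
        instructions.append({"kind": "second"})
    elif triggered_entries:
        # Trigger action on directory entries: per the text, the second
        # inspection instruction is generated for those entries.
        instructions.append({"kind": "second",
                             "entries": list(triggered_entries)})
    return instructions
```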
According to the technical solution of this embodiment of the invention, a communication connection is established with each of at least one target data source, an initial metadata set is established from the target data sources, and a metadata version number is established for each item of metadata in the initial metadata set, so that the metadata in each target data source is collected and assigned a metadata version number. A storage directory is then established according to the external metadata in the initial metadata set, and a directory version number is established for each directory entry in the storage directory. When a first inspection instruction is received, the external metadata associated with each directory entry in the storage directory is inspected based on the first inspection instruction and the target data sources, and the external metadata, its metadata version number and the directory version number of the directory entry corresponding to the updated external metadata are updated based on the inspection result. When a second inspection instruction is received, the remaining metadata in the initial metadata set is inspected based on the second inspection instruction and the target data sources, and the remaining metadata and its metadata version number are updated based on the inspection result. This addresses the problems that data inspection involves a large data volume, takes a long time, and makes it difficult to determine how the data differs before and after inspection; it improves the speed and pertinence of data inspection and achieves the technical effect of recording the differences found during inspection.
The metadata-driven data polling and version management device provided by the embodiment of the invention can execute the metadata-driven data polling and version management method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
It should be noted that, the units and modules included in the apparatus are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the embodiment of the invention.
Example four
Fig. 6 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention. FIG. 6 illustrates a block diagram of an exemplary electronic device 40 suitable for use in implementing embodiments of the present invention. The electronic device 40 shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 6, electronic device 40 is embodied in the form of a general purpose computing device. The components of electronic device 40 may include, but are not limited to: one or more processors or processing units 401, a system memory 402, and a bus 403 that couples the various system components (including the system memory 402 and the processing unit 401).
Bus 403 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Electronic device 40 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 40 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 402 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)404 and/or cache 405. The electronic device 40 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 406 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, commonly referred to as a "hard drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 403 by one or more data media interfaces. System memory 402 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 408 having a set (at least one) of program modules 407 may be stored, for example, in system memory 402, such program modules 407 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 407 generally perform the functions and/or methods of the described embodiments of the invention.
The electronic device 40 may also communicate with one or more external devices 409 (e.g., keyboard, pointing device, display 410, etc.), with one or more devices that enable a user to interact with the electronic device 40, and/or with any devices (e.g., network card, modem, etc.) that enable the electronic device 40 to communicate with one or more other computing devices. Such communication may be through input/output (I/O) interface 411. Also, the electronic device 40 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 412. As shown, the network adapter 412 communicates with the other modules of the electronic device 40 over the bus 403. It should be appreciated that although not shown in FIG. 6, other hardware and/or software modules may be used in conjunction with electronic device 40, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 401 executes various functional applications and data processing, such as implementing the metadata-driven data inspection and version management method provided by embodiments of the present invention, by running a program stored in the system memory 402.
Example five
An embodiment of the present invention further provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform a metadata-driven data inspection and version management method, the method including:
establishing a communication connection with each of at least one target data source, establishing an initial metadata set according to the target data sources, and establishing a metadata version number for each item of metadata in the initial metadata set; wherein the initial metadata set comprises external metadata and remaining metadata;
establishing a storage directory according to external metadata in the initial metadata set, and establishing directory version numbers for directory entries in the storage directory;
when a first inspection instruction is received, inspecting external metadata associated with each directory entry in the storage directory based on the first inspection instruction and the target data source, and updating the external metadata, the metadata version number of the external metadata and the directory version number of the directory entry corresponding to the updated external metadata based on inspection results;
when a second inspection instruction is received, inspecting the remaining metadata in the initial metadata set based on the second inspection instruction and the target data source, and updating the remaining metadata and the metadata version number of the remaining metadata based on an inspection result.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for embodiments of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A metadata-driven data inspection and version management method, characterized by comprising:
establishing a communication connection with each of at least one target data source, establishing an initial metadata set according to the target data sources, and establishing a metadata version number for each item of metadata in the initial metadata set; wherein the initial metadata set comprises external metadata and remaining metadata;
establishing a storage directory according to external metadata in the initial metadata set, and establishing directory version numbers for directory entries in the storage directory;
when a first inspection instruction is received, inspecting external metadata associated with each directory entry in the storage directory based on the first inspection instruction and the target data source, and updating the external metadata, the metadata version number of the external metadata and the directory version number of the directory entry corresponding to the updated external metadata based on inspection results;
when a second inspection instruction is received, inspecting the remaining metadata in the initial metadata set based on the second inspection instruction and the target data source, and updating the remaining metadata and the metadata version number of the remaining metadata based on an inspection result.
2. The method of claim 1, wherein the establishing communication connection with each target data source based on at least one target data source respectively comprises:
determining at least one target data source according to data source information in the data source connection information;
and establishing communication connection with the at least one target data source according to configuration information in the data source connection information.
3. The method of claim 1, wherein establishing an initial set of metadata from each of the target data sources, establishing a metadata version number for each metadata in the initial set of metadata, comprises:
collecting metadata in each target data source, performing data inspection on the metadata according to preset inspection items and each target data source, and establishing an initial metadata set according to the metadata after data inspection; wherein the preset inspection items comprise at least one of a data type, a data length, a data element definition and a metadata model;
establishing an initial metadata version number corresponding to the metadata for each metadata in the initial set of metadata.
4. The method of claim 1, wherein the creating a storage directory according to the external metadata in the initial metadata set comprises:
determining external metadata corresponding to an external platform from the initial metadata set;
dividing the external metadata according to the data domain to which each item of external metadata belongs;
determining each directory entry of the storage directory according to the hierarchical structure of the data domain, and establishing a corresponding relation between each external metadata and each directory entry according to the division result of each external metadata.
5. The method of claim 1, wherein inspecting the external metadata associated with each directory entry in the storage directory based on the first inspection instruction and the target data source, and updating the external metadata, the metadata version number of the external metadata, and the directory version number of the directory entry corresponding to the updated external metadata based on the inspection result comprises:
parsing the first inspection instruction, and determining an inspection directory entry corresponding to the first inspection instruction;
determining a data source to be inspected from the target data source according to the data lineage of the external metadata corresponding to the inspection directory entry;
performing data inspection on the external metadata corresponding to the inspection directory entry against the data source to be inspected;
and if the inspection result indicates that external metadata corresponding to the inspection directory entry is to be updated, updating the metadata version number of that external metadata, generating an external metadata update record from the external metadata before and after the update, and updating the directory version number of the inspection directory entry corresponding to the updated external metadata.
6. The method of claim 1, wherein inspecting the remaining metadata in the initial metadata set based on the second inspection instruction and the target data source, and updating the remaining metadata and the metadata version number of the remaining metadata based on the inspection result comprises:
determining a data source to be inspected from the target data source according to the remaining metadata in the initial metadata set;
performing data inspection on the remaining metadata against the data source to be inspected;
and if the inspection result indicates that remaining metadata is to be updated, updating the metadata version number of the updated remaining metadata, and generating a remaining metadata update record from the remaining metadata before and after the update.
7. The method of claim 1, further comprising:
if the inspection result indicates a difference, generating an audit request instruction to obtain an audit result;
if the audit result is that the audit is not passed, keeping the external metadata and the remaining metadata unchanged;
and if the audit result is that the audit is passed, performing the operation of updating the external metadata, the metadata version number of the external metadata and the directory version number of each directory entry based on the inspection result, and/or the operation of updating the remaining metadata and the metadata version number of the remaining metadata based on the inspection result.
8. The method of claim 1, further comprising:
if the current time reaches a preset inspection period, generating the first inspection instruction and/or the second inspection instruction; or,
if an inspection trigger action on at least one directory entry is detected, generating the second inspection instruction according to the at least one directory entry.
9. A metadata-driven data inspection and version management apparatus, comprising:
an initial metadata set establishing module, configured to establish a communication connection with each of at least one target data source, establish an initial metadata set according to the target data sources, and establish a metadata version number for each item of metadata in the initial metadata set; wherein the initial metadata set comprises external metadata and remaining metadata;
the storage directory establishing module is used for establishing a storage directory according to the external metadata in the initial metadata set and establishing directory version numbers for directory items in the storage directory;
the first inspection module is used for inspecting external metadata associated with each directory entry in the storage directory based on the first inspection instruction and the target data source when a first inspection instruction is received, and updating the external metadata, the metadata version number of the external metadata and the directory version number of the directory entry corresponding to the updated external metadata based on an inspection result;
and a second inspection module, configured to, when a second inspection instruction is received, inspect the remaining metadata in the initial metadata set based on the second inspection instruction and the target data source, and update the remaining metadata and the metadata version number of the remaining metadata based on an inspection result.
10. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the metadata-driven data inspection and version management method of any one of claims 1-8.
CN202111617949.2A 2021-12-27 2021-12-27 Metadata-driven data polling and version management method and device and electronic equipment Pending CN114218301A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111617949.2A CN114218301A (en) 2021-12-27 2021-12-27 Metadata-driven data polling and version management method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111617949.2A CN114218301A (en) 2021-12-27 2021-12-27 Metadata-driven data polling and version management method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN114218301A true CN114218301A (en) 2022-03-22

Family

ID=80706347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111617949.2A Pending CN114218301A (en) 2021-12-27 2021-12-27 Metadata-driven data polling and version management method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114218301A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination