CN115481105A - Data management method, device, electronic equipment and storage medium - Google Patents

Publication number
CN115481105A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211110462.XA
Other languages
Chinese (zh)
Inventor
郭永东
万月亮
程强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Ruian Technology Co Ltd
Original Assignee
Beijing Ruian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Ruian Technology Co Ltd
Priority to CN202211110462.XA
Publication of CN115481105A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395 Quality analysis or management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a data management method, which comprises the following steps: acquiring original data from each data source, recording source information of the original data, and evaluating the quality of the original data to obtain quality information of the original data; processing and fusing the original data to obtain target data, and recording the data fusion process; distributing the target data to each target end, recording destination information of the target data, and evaluating the quality of the target data to obtain quality information of the target data; constructing a data blood relationship according to the source information of the original data, the data fusion process and the destination information of the target data; and displaying the original data, the quality information of the original data, the data fusion process, the target data and the quality information of the target data according to the data blood relationship. In this scheme, the data blood relationship is constructed according to the source and the destination of the data, and the data and its quality are displayed according to the constructed data blood relationship, so that data tracing and value evaluation can be carried out conveniently and the data management effect is improved.

Description

Data management method, device, electronic equipment and storage medium
Technical Field
The present invention relates to data processing technologies, and in particular, to a data management method and apparatus, an electronic device, and a storage medium.
Background
In the big data era, data comes from an ever wider range of sources and grows exponentially in content and volume, while data management becomes increasingly difficult. In the course of implementing the invention, it was found that existing data management suffers from two problems: data sources are difficult to trace, and data value is difficult to evaluate.
Disclosure of Invention
The embodiment of the invention provides a data management method, a data management device, electronic equipment and a storage medium, which can be used for conveniently tracing data and evaluating value and improving the data management effect.
In a first aspect, an embodiment of the present invention provides a data management method, including:
acquiring original data from each data source, recording source information of the original data, and evaluating the quality of the original data to obtain quality information of the original data;
processing and fusing the original data to obtain target data, and recording a data fusion process;
distributing the target data to each target end, recording the destination information of the target data, and evaluating the quality of the target data to obtain the quality information of the target data;
constructing a data blood relationship according to the source information of the original data, the data fusion process and the destination information of the target data;
and displaying the original data, the quality information of the original data, the data fusion process, the target data and the quality information of the target data according to the data blood relationship.
In a second aspect, an embodiment of the present invention provides a data management apparatus, including:
the acquisition module is used for acquiring original data from each data source, recording source information of the original data, evaluating the quality of the original data and obtaining the quality information of the original data;
the processing module is used for processing and fusing the original data to obtain target data and recording a data fusion process;
the distribution module is used for distributing the target data to each target end, recording the destination information of the target data, evaluating the quality of the target data and obtaining the quality information of the target data;
the construction module is used for constructing a data blood relationship according to the source information of the original data, the data fusion process and the destination information of the target data;
and the display module is used for displaying the original data, the quality information of the original data, the data fusion process, the target data and the quality information of the target data according to the data blood relationship.
In a third aspect, an embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the data management method according to any one of the embodiments of the present invention.
In a fourth aspect, embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the data management method according to any one of the embodiments of the present invention.
In the embodiment of the invention, the original data can be obtained from each data source, the source information of the original data is recorded, and the quality of the original data is evaluated to obtain the quality information of the original data; the original data is processed and fused to obtain target data, and the data fusion process is recorded; the target data is distributed to each target end, the destination information of the target data is recorded, and the quality of the target data is evaluated to obtain the quality information of the target data; a data blood relationship is constructed according to the source information of the original data, the data fusion process and the destination information of the target data; and the original data, the quality information of the original data, the data fusion process, the target data and the quality information of the target data are displayed according to the data blood relationship. The invention constructs the data blood relationship according to the source and the destination of the data and displays the data and its quality according to the constructed data blood relationship, so that data tracing and value evaluation can be carried out conveniently and the data management effect is improved.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
FIG. 1 is a flow chart of a data management method according to an embodiment of the present invention;
FIG. 2 is another schematic flow chart of a data management method according to an embodiment of the present invention;
FIG. 3 is a block diagram of a data management apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some structures related to the present invention are shown in the drawings, not all of them.
Fig. 1 is a schematic flowchart of a data management method provided in an embodiment of the present invention, where the method may be executed by a data management apparatus provided in an embodiment of the present invention, and the apparatus may be implemented in software and/or hardware. In a particular embodiment, the apparatus may be integrated in an electronic device, which may be, for example, a computer. The following embodiments will be described by taking as an example that the apparatus is integrated in an electronic device, and referring to fig. 1, the method may specifically include the following steps:
Step 101, acquiring original data from each data source, recording source information of the original data, and evaluating the quality of the original data to obtain quality information of the original data.
A data source is where original data comes from, and different services can have different data sources. For example, for an enterprise data collection service, the data sources may include various departments of the enterprise, such as the finance department, sales department, production department, and so forth; for a data collection service for a province, the data sources may include the cities, districts, counties, etc. within the jurisdiction of the province. The original data may be stored in database tables at the corresponding data source, and thus the recorded source information of the original data may include the database, table, and field information of the original data source, and the like.
When the quality of the original data is evaluated, a plurality of quality evaluation indexes are considered together and a score quantization standard is defined for them. The corresponding original data is then checked against the evaluation indexes, and the check results are quantized according to the score quantization standard to obtain specific scores, which form the quality information of the original data. For example, if a certain item of data meets the corresponding evaluation index, the quality score of the item is recorded as 1; otherwise, it is recorded as 0.
When the recorded source information of the original data includes the database, table and field information of the original data source, the quality of the original data can be evaluated from two dimensions, the table and the field, and the evaluation indexes used for the table dimension can differ from those used for the field dimension. By way of example, the evaluation indexes for the field dimension may include correctness, completeness, validity, normalization, etc. of the data, and the evaluation indexes for the table dimension may include uniqueness, timeliness, etc. of the data. Evaluating the original data from the table dimension yields table-level quality information of the original data, which can include quality evaluation information for all tables of the original data source; evaluating the original data from the field dimension yields field-level quality information of the original data, which may include quality evaluation information for all fields of the original data source.
One field can correspond to one or more evaluation indexes, and when one field corresponds to a plurality of evaluation indexes, the final quality evaluation information of the field can be determined by combining the plurality of evaluation indexes; for example, the evaluation scores of the evaluation indexes of the field may be added or averaged to obtain the final quality evaluation information of the field; one table can also correspond to one or more evaluation indexes, and when one table corresponds to a plurality of evaluation indexes, the final quality evaluation information of the table can be determined by combining the plurality of evaluation indexes; for example, the evaluation scores of the multiple evaluation indexes of the table may be added or averaged to obtain the final quality evaluation information of the table.
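The combination of several evaluation index scores into one final score can be sketched as follows. This is a minimal illustration only: the function name `combine_scores`, the `method` parameter and the sample scores are assumptions made for the example and do not come from the patent text.

```python
from statistics import mean


def combine_scores(index_scores, method="average"):
    """Combine per-index scores (for example 0 or 1 each) into one final score.

    `index_scores` maps an evaluation index name to its score. The patent text
    only says the scores "may be added or averaged"; both options are shown.
    """
    if not index_scores:
        raise ValueError("at least one evaluation index score is required")
    values = list(index_scores.values())
    if method == "sum":
        return sum(values)
    if method == "average":
        return mean(values)
    raise ValueError(f"unknown method: {method}")


# Hypothetical scores for one field evaluated against four indexes.
field_scores = {"correctness": 1, "completeness": 0, "validity": 1, "normalization": 1}

total = combine_scores(field_scores, method="sum")
average = combine_scores(field_scores)
print(total, average)
```

The same helper applies unchanged to a table evaluated against several table-level indexes.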
Specifically, the evaluation indexes can be applied as follows. Correctness reflects whether a specified item is filled in with the right kind of content; for example, a name should contain only characters or letters, and if digits or special characters appear, the correctness score is reduced. Completeness reflects whether data with a fixed number of digits is missing digits; for example, an identity card number has 15 or 18 digits and a mobile phone number has 11 digits, and if either has too few digits, the completeness score is reduced. Validity reflects whether a specified item contradicts common sense; for example, a person's age cannot be 300 or 500, and such a value reduces the validity score. Normalization reflects whether a specified item is provided in the required format; for example, age should be provided in Arabic numerals, and any other format reduces the normalization score. Uniqueness reflects whether a specified item is unique and non-repeated within the table; for example, each person's identity card number is unique, and a repeated number reduces the uniqueness score. Timeliness reflects whether a specified item is updated as promptly as required; for example, if the number of participants in an activity should be provided and updated every day, the timeliness score is reduced when the data has not been updated for several days.
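The checks described above might be sketched as simple functions that return 1 when the index is met and 0 otherwise. All function names, the `max_age` limit of 150, the `max_gap_days` default and the decision to allow spaces in names are assumptions for illustration; the patent text does not specify them.

```python
import re
from datetime import date


def check_name_correctness(name: str) -> int:
    # Correctness: only letters or characters; digits and special symbols fail.
    # Allowing spaces between name parts is an assumption of this sketch.
    stripped = name.strip()
    ok = bool(stripped) and all(ch.isalpha() or ch.isspace() for ch in stripped)
    return 1 if ok else 0


def check_id_completeness(id_number: str) -> int:
    # Completeness: an identity card number has 15 or 18 digits.
    return 1 if len(id_number) in (15, 18) else 0


def check_phone_completeness(phone: str) -> int:
    # Completeness: a mobile phone number has 11 digits.
    return 1 if len(phone) == 11 else 0


def check_age_validity(age: int, max_age: int = 150) -> int:
    # Validity: an age such as 300 or 500 contradicts common sense.
    return 1 if 0 <= age <= max_age else 0


def check_age_normalization(raw_age: str) -> int:
    # Normalization: age must be written in Arabic numerals.
    return 1 if re.fullmatch(r"[0-9]+", raw_age) else 0


def uniqueness_score(values) -> int:
    # Uniqueness (table level): no value may repeat within the column.
    values = list(values)
    return 1 if len(values) == len(set(values)) else 0


def timeliness_score(last_update: date, today: date, max_gap_days: int = 1) -> int:
    # Timeliness (table level): data must have been refreshed recently enough.
    return 1 if (today - last_update).days <= max_gap_days else 0


print(check_name_correctness("Zhang San"), check_age_validity(300))
```

Each function covers a single index; a real deployment would presumably let users configure which checks apply to which field or table.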
Step 102, processing and fusing the original data to obtain target data, and recording a data fusion process.
Illustratively, processing and fusing the data may include cleaning, extracting and converting the data. Specifically, data cleaning refers to finding and correcting recognizable errors in a data file, including checking data consistency and handling invalid and missing values; data extraction refers to extracting the required information from an original document for a given purpose so that it can be further stored, converted and analyzed; data conversion is the process of changing data from one representation to another. For example, a person's age should be provided in Arabic numerals, so if the age item in the original data is written out in words, it needs to be converted into the corresponding Arabic numerals, for example, converting "twenty" into "20".
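The conversion example above ("twenty" into "20") can be sketched as a small normalization function. This is only an illustration under stated assumptions: the function name `normalize_age` is hypothetical, and the sketch handles English number words from zero to ninety-nine, which is a stand-in for whatever non-numeric representation the original data actually uses.

```python
_UNIT_WORDS = (
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen"
).split()
UNITS = {word: value for value, word in enumerate(_UNIT_WORDS)}

_TENS_WORDS = "twenty thirty forty fifty sixty seventy eighty ninety".split()
TENS = {word: 10 * (i + 2) for i, word in enumerate(_TENS_WORDS)}


def normalize_age(raw: str) -> str:
    """Return the age as an Arabic-numeral string, converting words if needed."""
    text = raw.strip().lower()
    if text.isascii() and text.isdigit():
        return str(int(text))  # already in the required format
    parts = text.replace("-", " ").split()
    if len(parts) == 1 and parts[0] in UNITS:
        return str(UNITS[parts[0]])
    if len(parts) == 1 and parts[0] in TENS:
        return str(TENS[parts[0]])
    if len(parts) == 2 and parts[0] in TENS and 1 <= UNITS.get(parts[1], 0) <= 9:
        return str(TENS[parts[0]] + UNITS[parts[1]])
    raise ValueError(f"cannot convert age value: {raw!r}")


print(normalize_age("twenty"))
```

Values that cannot be converted raise an error rather than being guessed, so they can be handled by the cleaning step (for example, treated as invalid values).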
Step 103, distributing the target data to each target end, recording the destination information of the target data, and evaluating the quality of the target data to obtain the quality information of the target data.
For example, the target end may include one or more target databases for receiving the target data, and the destination information of the target data may include the database, table, field, and the like of the destination of the target data. Similarly, to evaluate the quality of the target data, a plurality of quality evaluation indexes can be considered together and a score quantization standard defined for them; the corresponding target data is then checked against the evaluation indexes, and the check results are quantized according to the score quantization standard to obtain specific scores, which form the quality information of the target data. The quality of the target data can also be evaluated from the two dimensions of table and field, and the evaluation indexes of each dimension can be the same as those used when evaluating the original data, so they are not described again here.
Step 104, constructing a data blood relationship according to the source information of the original data, the data fusion process and the destination information of the target data.
Specifically, the data blood relationship (commonly called data lineage) is the relationship formed between data as it is generated, processed and circulated until it is retired, analogous to blood relationships in human society. In a specific implementation, the original data and its source information, the data fusion process, and the target data obtained after processing and fusion together with its destination information can be recorded in one-to-one correspondence to obtain the data blood relationship.
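One way to picture the one-to-one recording described above is a list of records, each tying a source location to its fusion steps and its destination location. The class names `Location` and `LineageRecord`, the function `build_lineage` and the sample database, table and field names are all hypothetical; the patent does not prescribe a storage structure.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Location:
    database: str
    table: str
    field: str


@dataclass(frozen=True)
class LineageRecord:
    source: Location                # where the original data came from
    fusion_steps: Tuple[str, ...]   # recorded processing and fusion steps
    destination: Location           # where the target data was distributed


def build_lineage(
    sources: Sequence[Location],
    fusion_log: Sequence[Sequence[str]],
    destinations: Sequence[Location],
) -> List[LineageRecord]:
    """Pair each source with its fusion steps and destination, one-to-one."""
    if not (len(sources) == len(fusion_log) == len(destinations)):
        raise ValueError("sources, fusion_log and destinations must align")
    return [
        LineageRecord(src, tuple(steps), dst)
        for src, steps, dst in zip(sources, fusion_log, destinations)
    ]


lineage = build_lineage(
    sources=[Location("source_db", "table_A", "field_1")],
    fusion_log=[["cleaning", "conversion"]],
    destinations=[Location("target_db", "table_H", "field_1")],
)
print(lineage[0])
```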
Step 105, displaying the original data, the quality information of the original data, the data fusion process, the target data and the quality information of the target data according to the data blood relationship.
Specifically, the data blood relationship is displayed so that the user can clearly observe the original data, the quality information of the original data, the data fusion process, the target data and the quality information of the target data in each data blood relationship. By way of example, in one possible implementation, the data blood relationship may be presented as a graph or a table. For example, data items representing the data blood relationship (such as a current storage location, a source location, a destination location, and the like) may be created in a table, and the original data, the quality information of the original data, the target data, and the quality information of the target data are stored in the corresponding data items of the table; alternatively, a directed graph may be created, and the original data, the quality information of the original data, the data fusion process, the target data and the quality information of the target data are displayed in the directed graph.
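The directed-graph presentation might be backed by a structure like the following sketch, in which nodes carry quality scores and edges carry the recorded processing step. The class `LineageGraph`, its methods and the node names are assumptions for illustration, not the patent's implementation.

```python
from collections import defaultdict


class LineageGraph:
    """Directed graph: an edge points from upstream data to downstream data."""

    def __init__(self):
        self.upstream = defaultdict(set)  # node -> direct predecessors
        self.edge_process = {}            # (src, dst) -> recorded fusion step
        self.quality = {}                 # node -> quality score

    def add_node(self, node, quality_score):
        self.quality[node] = quality_score

    def add_edge(self, src, dst, process):
        self.upstream[dst].add(src)
        self.edge_process[(src, dst)] = process

    def trace_sources(self, node):
        """Return every node that `node` was derived from, directly or not."""
        seen = set()
        stack = [node]
        while stack:
            current = stack.pop()
            # .get avoids creating empty entries while traversing
            for parent in self.upstream.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


graph = LineageGraph()
graph.add_node("table_A.field_1", 0.5)   # original data and its quality score
graph.add_node("table_1.field_1", 1.0)   # data held on the intermediate device
graph.add_node("table_H.field_1", 1.0)   # target data and its quality score
graph.add_edge("table_A.field_1", "table_1.field_1", "cleaning")
graph.add_edge("table_1.field_1", "table_H.field_1", "distribution")

ancestors = graph.trace_sources("table_H.field_1")
print(sorted(ancestors))
```

Walking the graph upstream from a target node is what makes source tracing convenient: every ancestor, together with its quality score, is reachable from the node being inspected.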
In the embodiment of the invention, the original data can be obtained from each data source, the source information of the original data is recorded, and the quality of the original data is evaluated to obtain the quality information of the original data; the original data is processed and fused to obtain target data, and the data fusion process is recorded; the target data is distributed to each target end, the destination information of the target data is recorded, and the quality of the target data is evaluated to obtain the quality information of the target data; a data blood relationship is constructed according to the source information of the original data, the data fusion process and the destination information of the target data; and the original data, the quality information of the original data, the data fusion process, the target data and the quality information of the target data are displayed according to the data blood relationship. According to the scheme provided by the invention, the data blood relationship is constructed according to the source and the destination of the data, and the data and its quality are displayed according to the constructed data blood relationship, so that data tracing and value evaluation can be carried out conveniently and the data management effect is improved.
The following further describes the data management method provided in the embodiment of the present invention, and as shown in fig. 2, the method may include the following steps:
Step 201, obtaining original data from each data source, recording table information and field information of the original data source, performing table-level quality evaluation on the original data according to a first preset index to obtain table-level quality information of the original data, and performing field-level quality evaluation on the original data according to a second preset index to obtain field-level quality information of the original data.
Here, table information can be understood as describing an object in a database that is used to store data, and field information indicates the variables associated with that object or class. Specifically, the first preset index refers to an evaluation index that is set by a user in advance according to requirements and evaluates quality from the table dimension; it may include uniqueness, timeliness and the like of the data. The second preset index refers to an evaluation index that is set by a user in advance according to requirements and evaluates quality from the field dimension; it may include, for example, correctness, completeness, validity, normalization, and the like of the data. For example, the specific field-level quality information in a table may be as shown in table 1 below:
TABLE 1 (provided as an image in the original publication; not reproduced here)
As shown in table 1, it can be understood as quality information of each field data (field A1, field A2, etc.) in table a.
For example, a specific table-level quality information may be as shown in table 2 below:
Data sheet A | Preset index K1 | K1 score | Preset index K2 | K2 score
Data sheet B | Preset index K1 | K1 score | Preset index K2 | K2 score
TABLE 2
Step 202, processing and fusing the original data to obtain target data, and recording the data fusion process.
Step 203, distributing the target data to each target end and recording the table information and the field information of the destination of the target data, performing table-level quality evaluation on the target data according to a first preset index to obtain the table-level quality information of the target data, and performing field-level quality evaluation on the target data according to a second preset index to obtain the field-level quality information of the target data.
In other words, in this embodiment, the evaluation indexes used for performing the table-level quality evaluation on the target data and the table-level quality evaluation on the original data are consistent; and performing field-level quality evaluation on the target data and performing field-level quality evaluation on the original data, wherein the adopted evaluation indexes are consistent.
Step 204, constructing a data blood relationship according to the source information of the original data, the data fusion process and the destination information of the target data.
The data blood relationship reflects the source and the destination of the current data.
Step 205, displaying the original data, the quality information of the original data, the data fusion process, the target data and the quality information of the target data according to the data blood relationship.
In a possible implementation, the data blood relationship can be presented as a graph or a table. Illustratively, one specific example of a table showing data blood relationships may be as shown in table 3 below:
TABLE 3 (provided as images in the original publication; not reproduced here)
In the solution of this embodiment, it can be understood that an Extract-Transform-Load (ETL) task is executed on an intermediate device, and while the ETL task runs, the data processing context is recorded to form the data blood relationship used to display the data. Thus, each row of Table 3 shows the source and destination information for a piece of data stored on the intermediate device. For example, the first row of Table 3 shows data 1, stored in field 1 of data table 1 of service database 1 on the intermediate device, whose source is original data A in field 1 of data table A and whose destination is target data H1 in field 1 of data table H.
Step 206, determining first abnormal data from the original data according to the quality information of the original data, and determining second abnormal data from the target data according to the quality information of the target data.
In a specific implementation, the first abnormal data may be original data whose score, obtained by evaluating the quality of the original data according to the first preset index, does not meet a requirement, for example, original data whose score is lower than a score preset by the user; illustratively, the preset score may be 0.6, 0.5, etc. Similarly, the second abnormal data may be understood as target data whose score, obtained by evaluating the quality of the target data according to the second preset index, does not meet the requirement, for example, target data whose score is lower than the score preset by the user.
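The threshold comparison described above can be sketched as a filter over quality scores. The function name `find_abnormal`, the item names and the sample scores are assumptions for illustration; only the example threshold of 0.6 comes from the text.

```python
def find_abnormal(quality_info, threshold=0.6):
    """Return the names of items whose quality score is below the threshold.

    `quality_info` maps an item name (a table or a field) to its score.
    """
    return sorted(name for name, score in quality_info.items() if score < threshold)


# Hypothetical quality scores for original data and target data.
original_quality = {"table_A": 0.5, "table_B": 1.0}
target_quality = {"table_H.field_1": 0.75, "table_H.field_2": 0.25}

first_abnormal = find_abnormal(original_quality)
second_abnormal = find_abnormal(target_quality)
print(first_abnormal, second_abnormal)
```

A score exactly equal to the threshold is treated as normal here, matching the "lower than a preset score" wording.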
Step 207, displaying the first abnormal data and the second abnormal data in a specified mode.
The specified mode may be understood as a mode in which content is highlighted or otherwise displayed differently from normal data, where the data in the original data other than the first abnormal data is normal data, and the data in the target data other than the second abnormal data is normal data. The specified mode may be, for example, adding highlighting to the data, using a special font color, or enlarging the font size. Displaying the first abnormal data and the second abnormal data in the specified mode lets the user clearly observe the original data, the quality information of the original data, the data fusion process, the target data and the quality information of the target data in each data blood relationship involving the first abnormal data or the second abnormal data, so that the user can conveniently find the abnormality.
Step 208, feeding back a first data quality evaluation result to the first terminal according to the source information of the first abnormal data.
The first terminal may be understood as a terminal that provides the first abnormal data source information, and may be, for example, a terminal used by a data reporter; the first data quality evaluation result may include quality information of the first abnormal data.
Step 209, feeding back a second data quality evaluation result to the second terminal according to the destination information of the second abnormal data.
The second terminal may be understood as a terminal that receives the second abnormal data, and may be, for example, a terminal that is used by an information database manager; the second data quality evaluation result may include quality information of the second abnormal data.
Step 210, receiving a data query request sent by a target terminal, where the data query request includes data identification information of data to be queried.
Specifically, data identification information may be understood as information used to distinguish one piece of data from another.
Step 211, querying the data blood relationship based on the data identification information to obtain the data to be queried.
Step 212, feeding back the data to be queried to the target terminal.
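A possible shape for steps 210 to 212 is a lookup in a stored data blood relationship keyed by the data identification information. In this sketch the lineage is a plain dictionary, and the function `query_lineage`, the `quality` and `parents` keys, and the identifiers are assumptions made for illustration rather than structures defined by the description.

```python
def query_lineage(lineage, data_id):
    """Return the record for data_id plus all of its upstream ancestors.

    lineage maps a data identifier to a dict with "quality" and "parents".
    Returns None when the identifier is unknown.
    """
    if data_id not in lineage:
        return None
    ancestors = []
    stack = list(lineage[data_id]["parents"])
    seen = set()
    while stack:
        parent = stack.pop()
        if parent in seen:
            continue
        seen.add(parent)
        ancestors.append(parent)
        # Walk further upstream if the parent is itself recorded.
        stack.extend(lineage.get(parent, {}).get("parents", []))
    return {
        "id": data_id,
        "quality": lineage[data_id]["quality"],
        "upstream": sorted(ancestors),
    }


lineage = {
    "raw.county_a": {"quality": 0.9, "parents": []},
    "raw.county_b": {"quality": 0.6, "parents": []},
    "target.merged": {"quality": 0.8, "parents": ["raw.county_a", "raw.county_b"]},
}
print(query_lineage(lineage, "target.merged"))
```

The returned dictionary corresponds to the data to be queried that is fed back to the target terminal.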
According to the scheme provided by the embodiment of the invention, the data blood relationship is constructed according to the source and the destination of the data, and the data and the data quality are displayed according to the constructed data blood relationship, so that data tracing and value evaluation can be carried out conveniently and the data governance effect is improved. In addition, abnormal data are identified according to the data quality and displayed in a designated mode, so that related personnel can find the abnormal data conveniently. The quality evaluation result of the abnormal data is fed back to the corresponding terminal, which facilitates subsequent improvement of the data acquisition quality. Furthermore, a data query function is provided, so that the governed data can be fully utilized and the data value is improved.
The data management method of the present invention is illustrated below in a specific application scenario, which takes as an example the collection of certain detection data of all people nationwide. Specifically, the personnel detection data reported by each province, city and county can be collected. The personnel detection data can include a personnel identification, detection time, detection items, detection results and other data, and these data are the original data. Table-level quality evaluation is performed on the original data by using a first preset index to obtain table-level quality information of the original data (for example, the table-level quality information may include scores such as the uniqueness and timeliness of each table's data), and field-level quality evaluation is performed on the original data by using a second preset index to obtain field-level quality information of the original data (for example, the field-level quality information may include scores for the correctness, completeness, validity or normalization of each of the personnel identification, detection time, detection items and detection results). After the original data are cleaned, extracted, converted and so on, the corresponding target data are obtained. The target data are evaluated in the same way as the original data to obtain the table-level quality information and the field-level quality information of the target data. The target data are then distributed to each target end, and the destination information of the target data is recorded. Finally, a data blood relationship is constructed according to the source information and the destination information of the data, and the personnel detection data and the quality information of the corresponding data are displayed by using the data blood relationship.
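The table-level and field-level evaluation in this scenario can be illustrated with two simple scores. The snippet is a sketch only: the description does not define how uniqueness or completeness are computed, so the formulas, the function names `table_uniqueness` and `field_completeness`, and the sample records are assumptions.

```python
def table_uniqueness(rows, key_fields):
    """Table-level score: fraction of rows with a distinct key."""
    if not rows:
        return 1.0
    keys = {tuple(row.get(f) for f in key_fields) for row in rows}
    return len(keys) / len(rows)


def field_completeness(rows, field):
    """Field-level score: fraction of rows where the field is non-empty."""
    if not rows:
        return 1.0
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)


# Hypothetical personnel detection records; the second row is a duplicate
# and the third row has an empty detection result.
records = [
    {"person_id": "p1", "time": "2022-09-01", "item": "x", "result": "negative"},
    {"person_id": "p1", "time": "2022-09-01", "item": "x", "result": "negative"},
    {"person_id": "p2", "time": "2022-09-02", "item": "x", "result": ""},
    {"person_id": "p3", "time": "2022-09-02", "item": "x", "result": "positive"},
]

table_quality = {
    "uniqueness": table_uniqueness(records, ["person_id", "time", "item"])
}
field_quality = {
    f: field_completeness(records, f) for f in ["person_id", "result"]
}
print(table_quality, field_quality)
```

Because the same functions accept any list of records, they could be applied unchanged to the target data, which matches the requirement that target data be evaluated in the same way as the original data.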
In addition, abnormal data (such as low-scoring data) can be identified from the original personnel detection data and from the target personnel detection data respectively, and highlighted so that the abnormal data can be found easily. For example, if the quality of data from a certain county is poor, a corresponding data quality evaluation result may be sent to a terminal of a manager in that county, so as to prompt the county to improve the quality of the data it collects. The governed data can also support a data query service. For example, if relevant personnel need to check the data quality or the detection results of a certain county, the data quality or the detection results of that county can be fed back to their terminal, which improves the use value of the data.
Fig. 3 is a structural diagram of a data management apparatus according to an embodiment of the present invention, which is adapted to execute the data management method according to the embodiment of the present invention. As shown in fig. 3, the apparatus may specifically include:
an obtaining module 301, configured to obtain raw data from each data source, record source information of the raw data, and evaluate quality of the raw data to obtain quality information of the raw data;
the processing module 302 is configured to process and fuse the original data to obtain target data, and record a data fusion process;
the distribution module 303 is configured to distribute the target data to each target end, record destination information of the target data, and evaluate the quality of the target data to obtain quality information of the target data;
the construction module 304 is used for constructing a data blood relationship according to the source information of the original data, the data fusion process and the destination information of the target data;
and a display module 305, configured to display the original data, the quality information of the original data, the data fusion process, the target data, and the quality information of the target data according to the data blood relationship.
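One way to picture what the construction module 304 produces is a list of directed edges linking data sources, original tables, target tables and target ends. The following sketch assumes a simple edge-list representation; the function `build_lineage`, its argument shapes, and the edge labels are hypothetical and are not prescribed by the description.

```python
def build_lineage(sources, fusion_steps, destinations):
    """Build a list of directed edges (upstream, downstream, kind).

    sources: {raw_table: data_source}
    fusion_steps: list of (input_raw_tables, target_table)
    destinations: {target_table: [target_end, ...]}
    """
    edges = []
    # Source information: which data source each original table came from.
    for raw_table, data_source in sources.items():
        edges.append((data_source, raw_table, "source"))
    # Data fusion process: which original tables fed each target table.
    for inputs, target_table in fusion_steps:
        for raw_table in inputs:
            edges.append((raw_table, target_table, "fusion"))
    # Destination information: where each target table was distributed.
    for target_table, ends in destinations.items():
        for end in ends:
            edges.append((target_table, end, "destination"))
    return edges


edges = build_lineage(
    sources={"raw_a": "county_a", "raw_b": "county_b"},
    fusion_steps=[(["raw_a", "raw_b"], "merged")],
    destinations={"merged": ["info_db"]},
)
for edge in edges:
    print(edge)
```

Quality information could be attached to each node in a separate mapping so that the display module can show it alongside the corresponding segment of the relationship.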
In an embodiment, the obtaining module 301 records source information of the original data, which specifically includes:
recording table information and field information of the original data source;
the distributing module 303 records destination information of the target data, and specifically includes:
and recording table information and field information of the destination of the target data.
In an embodiment, the obtaining module 301 evaluates the quality of the raw data to obtain the quality information of the raw data, and specifically includes:
performing table-level quality evaluation on the original data to obtain table-level quality information of the original data, and performing field-level quality evaluation on the original data to obtain field-level quality information of the original data;
the distributing module 303 evaluates the quality of the target data to obtain quality information of the target data, and specifically includes:
and performing table-level quality evaluation on the target data to obtain table-level quality information of the target data, and performing field-level quality evaluation on the target data to obtain field-level quality information of the target data.
In an embodiment, the obtaining module 301 performs table-level quality evaluation on the raw data to obtain table-level quality information of the raw data, and performs field-level quality evaluation on the raw data to obtain field-level quality information of the raw data, specifically including:
performing table-level quality evaluation on the original data according to a first preset index to obtain table-level quality information of the original data, and performing field-level quality evaluation on the original data according to a second preset index to obtain field-level quality information of the original data;
the distributing module 303 performs table-level quality evaluation on the target data to obtain table-level quality information of the target data, and performs field-level quality evaluation on the target data to obtain field-level quality information of the target data, including: and performing table-level quality evaluation on the target data according to the first preset index to obtain table-level quality information of the target data, and performing field-level quality evaluation on the target data according to the second preset index to obtain field-level quality information of the target data.
In one embodiment, the apparatus further comprises:
the determining module is used for determining first abnormal data from the original data according to the quality information of the original data and determining second abnormal data from the target data according to the quality information of the target data;
the display module 305 is also used to: and displaying the first abnormal data and the second abnormal data according to a specified mode.
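The determining module could, for instance, compare quality scores against a threshold. The description does not specify the criterion, so the threshold value of 0.6, the function name `find_abnormal`, and the score dictionaries below are assumptions for illustration.

```python
def find_abnormal(quality_info, threshold=0.6):
    """Return identifiers whose quality score falls below the threshold."""
    return sorted(
        data_id for data_id, score in quality_info.items() if score < threshold
    )


original_quality = {"raw_a": 0.9, "raw_b": 0.4}
target_quality = {"merged": 0.55, "summary": 0.8}

first_abnormal = find_abnormal(original_quality)   # from the original data
second_abnormal = find_abnormal(target_quality)    # from the target data
print(first_abnormal, second_abnormal)
```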
In one embodiment, the apparatus further comprises:
the quality feedback module is used for feeding back a first data quality evaluation result to the first terminal according to the source information of the first abnormal data; and feeding back a second data quality evaluation result to the second terminal according to the destination information of the second abnormal data.
In one embodiment, the apparatus further comprises:
the query module is used for receiving a data query request sent by a target terminal, wherein the data query request comprises data identification information of data to be queried; and querying the data blood relationship based on the data identification information to obtain the data to be queried;
and the data feedback module is used for feeding back the data to be queried to the target terminal.
It is obvious to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions. For the specific working process of the functional module, reference may be made to the corresponding process in the foregoing method embodiment, which is not described herein again.
The device of the embodiment of the invention can acquire the original data from each data source, record the source information of the original data, and evaluate the quality of the original data to obtain the quality information of the original data; process and fuse the original data to obtain target data, and record the data fusion process; distribute the target data to each target end, record destination information of the target data, and evaluate the quality of the target data to obtain quality information of the target data; construct a data blood relationship according to the source information of the original data, the data fusion process and the destination information of the target data; and display the original data, the quality information of the original data, the data fusion process, the target data and the quality information of the target data according to the data blood relationship. The invention can thus construct the data blood relationship according to the source and the destination of the data and display the data and the data quality according to the constructed data blood relationship, which makes it convenient to trace the source of the data and evaluate its value and improves the data governance effect.
The embodiment of the invention also provides electronic equipment which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein when the processor executes the program, the data management method provided by any embodiment is realized.
The embodiment of the invention also provides a computer readable medium, on which a computer program is stored, and the computer program is executed by a processor to implement the data management method provided by any of the above embodiments.
Referring now to FIG. 4, a block diagram of a computer system 400 suitable for use with the electronic device implementing an embodiment of the invention is shown. The electronic device shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 4, the computer system 400 includes a Central Processing Unit (CPU) 401 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for the operation of the system 400 are also stored. The CPU 401, ROM 402, and RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a display device such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card, a modem, or the like. The communication section 409 performs communication processing via a network such as the internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 410 as necessary, so that a computer program read out therefrom is installed into the storage section 408 as necessary.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 409, and/or installed from the removable medium 411. The computer program performs the above-described functions defined in the system of the present invention when executed by a Central Processing Unit (CPU) 401.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules and/or units described in the embodiments of the present invention may be implemented by software, and may also be implemented by hardware. The described modules and/or units may also be provided in a processor, and may be described as: a processor includes an obtaining module, a processing module, a distribution module, a construction module, and a display module. The names of these modules do not, in some cases, constitute a limitation of the modules themselves.
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform:
acquiring original data from each data source, recording source information of the original data, and evaluating the quality of the original data to obtain quality information of the original data; processing and fusing the original data to obtain target data, and recording the data fusion process; distributing target data to each target end, recording destination information of the target data, and evaluating the quality of the target data to obtain quality information of the target data; constructing a data blood relationship according to the source information of the original data, the data fusion process and the destination information of the target data; and displaying the original data, the quality information of the original data, the data fusion process, the target data and the quality information of the target data according to the data blood relationship.
According to the technical scheme of the embodiment of the invention, the original data can be obtained from each data source, the source information of the original data is recorded, and the quality of the original data is evaluated to obtain the quality information of the original data; the original data are processed and fused to obtain target data, and the data fusion process is recorded; the target data are distributed to each target end, destination information of the target data is recorded, and the quality of the target data is evaluated to obtain quality information of the target data; a data blood relationship is constructed according to the source information of the original data, the data fusion process and the destination information of the target data; and the original data, the quality information of the original data, the data fusion process, the target data and the quality information of the target data are displayed according to the data blood relationship. The invention can thus construct the data blood relationship according to the source and the destination of the data and display the data and the data quality according to the constructed data blood relationship, which makes it convenient to trace the source of the data and evaluate its value and improves the data governance effect.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for managing data, comprising:
acquiring original data from each data source, recording source information of the original data, and evaluating the quality of the original data to obtain quality information of the original data;
processing and fusing the original data to obtain target data, and recording a data fusion process;
distributing the target data to each target end, recording the destination information of the target data, and evaluating the quality of the target data to obtain the quality information of the target data;
constructing a data blood relationship according to the source information of the original data, the data fusion process and the destination information of the target data;
and displaying the original data, the quality information of the original data, the data fusion process, the target data and the quality information of the target data according to the data blood relationship.
2. The data management method according to claim 1,
the recording source information of the original data comprises: recording table information and field information of the original data source;
the recording destination information of the target data comprises: and recording table information and field information of the destination of the target data.
3. The data management method according to claim 2,
the evaluating the quality of the original data to obtain the quality information of the original data includes: performing table-level quality evaluation on the original data to obtain table-level quality information of the original data, and performing field-level quality evaluation on the original data to obtain field-level quality information of the original data;
the evaluating the quality of the target data to obtain the quality information of the target data includes: and performing table-level quality evaluation on the target data to obtain table-level quality information of the target data, and performing field-level quality evaluation on the target data to obtain field-level quality information of the target data.
4. The data management method according to claim 3,
the performing table-level quality evaluation on the original data to obtain table-level quality information of the original data, and performing field-level quality evaluation on the original data to obtain field-level quality information of the original data includes: performing table-level quality evaluation on the original data according to a first preset index to obtain table-level quality information of the original data, and performing field-level quality evaluation on the original data according to a second preset index to obtain field-level quality information of the original data;
the performing table-level quality evaluation on the target data to obtain table-level quality information of the target data, and performing field-level quality evaluation on the target data to obtain field-level quality information of the target data includes: and performing table-level quality evaluation on the target data according to the first preset index to obtain table-level quality information of the target data, and performing field-level quality evaluation on the target data according to the second preset index to obtain field-level quality information of the target data.
5. The data management method of claim 1, wherein the method further comprises:
determining first abnormal data from the original data according to the quality information of the original data, and determining second abnormal data from the target data according to the quality information of the target data;
and displaying the first abnormal data and the second abnormal data according to a specified mode.
6. The data management method of claim 5, wherein the method further comprises:
feeding back a first data quality evaluation result to a first terminal according to the source information of the first abnormal data; and
feeding back a second data quality evaluation result to a second terminal according to the destination information of the second abnormal data.
7. The method of claim 1, further comprising:
receiving a data query request sent by a target terminal, wherein the data query request comprises data identification information of data to be queried;
querying the data blood relationship based on the data identification information to obtain the data to be queried;
and feeding back the data to be queried to the target terminal.
8. A data management apparatus, comprising:
the acquisition module is used for acquiring original data from each data source, recording source information of the original data, evaluating the quality of the original data and obtaining the quality information of the original data;
the processing module is used for processing and fusing the original data to obtain target data and recording a data fusion process;
the distribution module is used for distributing the target data to each target end, recording the destination information of the target data, evaluating the quality of the target data and obtaining the quality information of the target data;
the construction module is used for constructing a data blood relationship according to the source information of the original data, the data fusion process and the destination information of the target data;
and the display module is used for displaying the original data, the quality information of the original data, the data fusion process, the target data and the quality information of the target data according to the data blood relationship.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the data management method of any of claims 1 to 7 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the data management method according to any one of claims 1 to 7.
CN202211110462.XA 2022-09-13 2022-09-13 Data management method, device, electronic equipment and storage medium Pending CN115481105A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211110462.XA CN115481105A (en) 2022-09-13 2022-09-13 Data management method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211110462.XA CN115481105A (en) 2022-09-13 2022-09-13 Data management method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115481105A true CN115481105A (en) 2022-12-16

Family

ID=84423630

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211110462.XA Pending CN115481105A (en) 2022-09-13 2022-09-13 Data management method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115481105A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117151438A (en) * 2023-10-31 2023-12-01 思创数码科技股份有限公司 Data sharing quality analysis method, system, computer and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination