CN117033360A - Bank data restoration method, device, equipment and medium based on blood margin analysis - Google Patents

Info

Publication number
CN117033360A
Authority
CN
China
Prior art keywords
data
blood
data source
bank
inconsistent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310993023.6A
Other languages
Chinese (zh)
Inventor
贾俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Bank Co Ltd
Original Assignee
Ping An Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Bank Co Ltd filed Critical Ping An Bank Co Ltd
Priority to CN202310993023.6A priority Critical patent/CN117033360A/en
Publication of CN117033360A publication Critical patent/CN117033360A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a bank data restoration method, device, equipment and medium based on blood margin analysis. The method comprises the following steps: acquiring the inconsistent data among the bank data in a banking system; inputting the data into a blood-edge relationship prediction model, which outputs the blood-edge relationship between the data source of the data and the data; acquiring, according to the data source and the blood-edge relationship, the data source node at which the data is inconsistent; and replacing all of the data in the bank data, and the data of the data source node, with the data of a preset reference data source. The method restores inconsistent data in the banking system, keeps the data in the banking system consistent, reduces manual intervention when repairing data inconsistencies, achieves high restoration efficiency, ensures that banking business can be handled normally, and improves customer service level and satisfaction.

Description

Bank data restoration method, device, equipment and medium based on blood margin analysis
Technical Field
The invention relates to the technical field of big data, in particular to a bank data restoration method, device, equipment and medium based on blood margin analysis.
Background
With the development of big data technology, databases are applied in more and more industries and fields. A database contains multiple data tables, and each data table contains multiple fields. Different data tables have different association relationships, as do different fields. In a banking system, to meet business requirements, a table is often generated by associating several tables, temporary tables and intermediate tables. When a problem occurs in business table data, the data must be traced back to its root, or business data must be intelligently generated, through the blood-edge relationship of the data; the blood-edge relationship among the data is therefore particularly important.
As systems become more deeply interconnected, banks adopt more and more application systems, the volume of generated data rises sharply, and the relationships among the data grow increasingly complex. Differing data sources, data entry errors, data migration errors, system vulnerabilities, improper data processing and similar problems cause abnormal conditions such as inconsistent or incomplete data in the banking system, which affect the normal handling of banking business.
Disclosure of Invention
The embodiment of the invention provides a bank data restoration method, device, equipment and medium based on blood margin analysis, which solve the problem that business cannot be handled due to inconsistent data in a bank system.
The embodiment of the invention provides a bank data restoration method based on blood margin analysis, which comprises the following steps:
acquiring the inconsistent data among the bank data in a banking system;
inputting the data into a blood-edge relationship prediction model, wherein the blood-edge relationship prediction model outputs the blood-edge relationship between a data source of the data and the data;
acquiring, according to the data source and the blood-edge relationship, a data source node at which the data is inconsistent;
replacing all of the data in the bank data, and the data of the data source node, with the data of a preset reference data source.
Further, acquiring, according to the data source and the blood-edge relationship, the data source node at which the data is inconsistent includes:
acquiring the data fields and data attributes of the data source;
acquiring the association and dependency relationships of the data source according to the blood-edge relationship;
acquiring the data source node at which the data is inconsistent according to the data fields, the data attributes, and the association and dependency relationships.
Further, acquiring the data source node at which the data is inconsistent according to the data fields, the data attributes, and the association and dependency relationships includes:
acquiring the transmission process of the data according to the association and dependency relationships;
comparing the data fields and data attributes of the expected data output by the blood-edge relationship prediction model with the data on every data source node in the transmission process;
if the data fields are inconsistent or the data attributes have changed at a data source node, that data source node is a data source node at which the data is inconsistent.
Further, before replacing all of the data in the bank data, and the data of the data source node, with the data of the preset reference data source, the method further includes:
obtaining the preset reference data source, specifically:
acquiring, according to the bank data, the banking business corresponding to the bank data;
acquiring the business rules of the banking business;
determining the priority of each data source according to the business rules, and taking the data source with the highest priority as the preset reference data source.
Further, after replacing all of the data in the bank data, and the data of the data source node, with the data of the preset reference data source, the method further includes:
monitoring all of the replaced data in the bank data and the data of the data source node, and sending abnormality alarm information if an abnormality occurs.
Further, before inputting the data into the blood-edge relationship prediction model, the method further includes:
collecting the bank's own historical data in the banking system;
labeling the blood-edge relationship between the data source of the historical data and the historical data;
inputting the data source of the historical data and the blood-edge relationship into an initial blood-edge relationship model for training, optimization and testing to obtain the blood-edge relationship prediction model.
Further, after collecting the bank's own historical data in the banking system, and before inputting the data source of the historical data and the blood-edge relationship into the initial blood-edge relationship model for training, optimization and testing, the method further includes:
labeling the expected data of the historical data;
inputting the historical data and the corresponding expected data into the initial blood-edge relationship model for training, optimization and testing.
The embodiment of the invention also provides a bank data restoration device based on blood margin analysis, which comprises:
a first acquisition module, used for acquiring the inconsistent data among the bank data in a banking system;
a prediction module, used for inputting the data into a blood-edge relationship prediction model, wherein the blood-edge relationship prediction model outputs the blood-edge relationship between a data source of the data and the data;
a second acquisition module, used for acquiring, according to the data source and the blood-edge relationship, a data source node at which the data is inconsistent;
a repair module, used for replacing all of the data in the bank data, and the data of the data source node, with the data of a preset reference data source.
The embodiment of the invention also provides bank data restoration equipment based on blood margin analysis, comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to execute the following steps:
acquiring the inconsistent data among the bank data in a banking system;
inputting the data into a blood-edge relationship prediction model, wherein the blood-edge relationship prediction model outputs the blood-edge relationship between a data source of the data and the data;
acquiring, according to the data source and the blood-edge relationship, a data source node at which the data is inconsistent;
replacing all of the data in the bank data, and the data of the data source node, with the data of a preset reference data source.
The embodiment of the invention also provides a computer readable storage medium storing a computer program, which when being executed by a processor, causes the processor to execute the following steps:
acquiring the inconsistent data among the bank data in a banking system;
inputting the data into a blood-edge relationship prediction model, wherein the blood-edge relationship prediction model outputs the blood-edge relationship between a data source of the data and the data;
acquiring, according to the data source and the blood-edge relationship, a data source node at which the data is inconsistent;
replacing all of the data in the bank data, and the data of the data source node, with the data of a preset reference data source.
The embodiment of the invention has the following beneficial effects:
According to the bank data restoration method based on blood margin analysis, inconsistent data is acquired and input into a blood-edge relationship prediction model to obtain the blood-edge relationship between the data source of the data and the data; the data source node that causes the inconsistency is then located, and the data at that data source node and in the banking system is replaced with the data of a reference data source. The method restores inconsistent data in the banking system, keeps the data in the banking system consistent, reduces manual intervention when repairing data inconsistencies, achieves high restoration efficiency, ensures that banking business can be handled normally, and improves customer service level and satisfaction. The bank data restoration device, equipment and medium based on blood margin analysis achieve the same effects.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and a person skilled in the art may obtain other drawings from them without inventive effort.
Wherein:
FIG. 1 is a diagram of an application environment for bank data restoration based on blood margin analysis provided in one embodiment;
FIG. 2 is a schematic flow chart of a bank data restoration method based on blood margin analysis according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of acquiring a data source node with inconsistent data according to an embodiment of the present invention;
FIG. 4 is another schematic flow chart of acquiring a data source node with inconsistent data according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a bank data restoration device based on blood margin analysis according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of bank data restoration equipment based on blood margin analysis according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a computer readable storage medium according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The embodiments described are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of the invention.
For ease of understanding, the relevant terms to which the present application relates are first described below.
(1) Blood margin analysis (commonly known as data lineage analysis) is a means of safeguarding data fusion (aggregation); it makes the data fusion process traceable. It comprehensively tracks the data processing process so as to find, starting from a given data object, all related metadata objects and the relationships between them. The relationships between metadata objects specifically represent the input-output relationships of the data flows among these metadata objects.
(2) The banking system, also called the bank system, refers to the composition, organization and functions of the financial institutions themselves and of their management bodies; here it is a project that uses Python and an SQLite3 database.
(3) The bank data refers to the collection of data collected by the bank system in the business handling process.
FIG. 1 shows an application environment diagram of bank data restoration based on blood margin analysis in one embodiment. Referring to FIG. 1, the bank data restoration method based on blood margin analysis is applied to a banking system. The banking system comprises a server 200 and a cluster of terminal devices. The cluster may comprise one or more terminal devices; the number of terminal devices is not limited in this embodiment. As shown in FIG. 1, the terminal devices may specifically include terminal device 1, terminal device 2, terminal device 3, …, terminal device n, all of which are connected to the server 200 through the network 300, so that each terminal device can exchange data with the server 200 through the network 300.
Terminal device 1, terminal device 2, terminal device 3, …, terminal device n may be intelligent terminals such as smartphones, tablet computers, notebook computers, desktop computers and smart televisions. The server 200 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The network 300 may be a wired network or a wireless network. In some embodiments of the present invention, the wired or wireless network uses standard communication technologies and/or protocols. The network may be the Internet, but may also be any other network, including but not limited to a wide area network, a metropolitan area network, a regional network, a 3rd Generation Partnership Project (3GPP) network, a Long Term Evolution (LTE) network, a Worldwide Interoperability for Microwave Access (WiMAX) network, or a computer network communicating on the basis of the TCP/IP protocol suite, the User Datagram Protocol (UDP), and the like.
Terminal device 1, terminal device 2, terminal device 3, …, terminal device n can collect data in real time and transmit the collected data to the server 200. The server 200 inputs the data into the blood-edge relationship prediction model, which outputs the data source of the data and the blood-edge relationship between the data; acquires, according to the data source and the blood-edge relationship, the data source node at which the data is inconsistent; and replaces all of the data in the bank data, and the data of the data source node, with the data of the preset reference data source.
FIG. 2 is a schematic flow chart of a bank data restoration method based on blood margin analysis according to an embodiment of the present invention. The method includes:
Step S101, acquiring the inconsistent data among the bank data in a banking system;
Specifically, in this embodiment, data such as customer data is extracted from each data source in the banking system (for example, the core banking system, the CRM system and the risk control system), and the extracted customer data is compared. First, the fields to be compared are selected from the customer data; these are key fields that can uniquely identify a data record. Suitable comparison fields, such as customer ID, name, ID card number, residential address and financial condition, can be chosen according to the business requirements and the characteristics of the data set. Then a comparison algorithm is selected and applied to the chosen fields, traversing each field in the customer data. If the same field differs between sources during the comparison, the data is inconsistent; for example, in this embodiment, the customer's ID card number recorded in an internal system of the banking system is inconsistent with the records in other systems.
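As an illustrative sketch only (not part of the disclosed embodiment), the field-by-field comparison described above might look as follows. The source names, field names and placeholder values are assumptions made for the example.

```python
from typing import Dict, List, Optional

# Hypothetical extracts of one customer's record, keyed by data source name.
records: Dict[str, Dict[str, str]] = {
    "core_system": {"customer_id": "C001", "name": "Zhang San", "id_card": "ID-1234"},
    "crm_system": {"customer_id": "C001", "name": "Zhang San", "id_card": "ID-1243"},
    "risk_system": {"customer_id": "C001", "name": "Zhang San", "id_card": "ID-1234"},
}


def find_inconsistent_fields(
    records: Dict[str, Dict[str, str]], fields: List[str]
) -> Dict[str, Dict[str, Optional[str]]]:
    """Return {field: {source: value}} for each field whose value differs across sources."""
    inconsistent = {}
    for field in fields:
        values = {source: record.get(field) for source, record in records.items()}
        if len(set(values.values())) > 1:  # more than one distinct value
            inconsistent[field] = values
    return inconsistent


diff = find_inconsistent_fields(records, ["name", "id_card"])
print(diff)
```

Here exact equality is used as the comparison algorithm; other algorithms could be substituted without changing the traversal.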
It should be noted that, once an inconsistency is found, the inconsistent data may be marked with the following information: the number of inconsistent records, that is, how many customer data records differ between two data sources; the inconsistent fields or attributes, that is, which specific fields or attributes differ in the inconsistent records; the type and description of the difference, which describe the inconsistent field or attribute and indicate the specific type and cause of the difference, such as a missing value, an abnormal value or an inconsistent format; the scope of influence of the difference, which evaluates the effect of the difference on the data as a whole and determines whether it affects the consistency of other related fields or attributes; and the repair priority, determined according to business requirements and data importance, so that a repair scheme can be formulated and resources allocated.
It should be noted that, in the banking system, data from different data sources is integrated to create a unified customer profile data set. When the profile data set is created, the data source and field information may be marked: an identifier or metadata is added for each data source and field to track the origin and changes of the data, and may include the data source name, table name, field name, data extraction time and similar information. Because data sources are diverse, data formats are inconsistent, and the data may contain abnormal and missing values, the extracted data is cleaned and preprocessed before field comparison (for example by deduplication, filtering and missing value filling) so that it conforms to a unified data format and specification. Deduplication removes duplicate data records according to designated fields or attributes; filtering screens the data against conditions and eliminates data that does not meet the requirements; missing value filling fills the fields that have missing values according to predefined rules or a suitable filling method. Cleaning and preprocessing may be implemented by programming or with data processing tools.
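The three preprocessing operations named above (deduplication, filtering, missing value filling) can be sketched as follows. This is a minimal illustration; the function name clean, the field names and the default value are assumptions, not part of the original disclosure.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def clean(rows: List[Row], key: str, required: str, defaults: Dict[str, str]) -> List[Row]:
    """Filter, deduplicate and fill missing values in extracted rows."""
    seen = set()
    cleaned = []
    for row in rows:
        if not row.get(required):      # filtering: drop rows lacking a required field
            continue
        if row.get(key) in seen:       # deduplication on the designated key field
            continue
        seen.add(row.get(key))
        # Missing value filling: replace None or "" with a predefined default, if any.
        filled = {
            field: (value if value not in (None, "") else defaults.get(field, value))
            for field, value in row.items()
        }
        cleaned.append(filled)
    return cleaned


raw = [
    {"customer_id": "C001", "name": "Zhang San", "address": None},
    {"customer_id": "C001", "name": "Zhang San", "address": None},  # duplicate
    {"customer_id": "C002", "name": "", "address": "Shenzhen"},     # fails the filter
    {"customer_id": "C003", "name": "Li Si", "address": "Shanghai"},
]
cleaned = clean(raw, key="customer_id", required="name", defaults={"address": "UNKNOWN"})
print(cleaned)
```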
It should be noted that the comparison algorithm is selected according to the data volume and data complexity of the comparison fields. Common comparison algorithms include: (1) Exact matching, which compares field values one by one for complete equality; it is suitable when the data volume is small and the field values follow a consistent specification. (2) Fuzzy matching, which measures the degree of similarity between field values using a fuzzy matching algorithm (such as edit distance or another similarity algorithm); it is suitable when field values differ somewhat but can still be considered a match. (3) Rule matching, which defines custom matching rules from the business rules and compares field values against them; it is suitable when specific business logic must be considered. (4) Machine learning, which performs the comparison with a machine learning model trained to learn the associations and matching rules among fields; it is suitable for complex comparison scenarios but requires sufficient training data.
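For illustration, the first three comparison strategies can be sketched in Python. The fuzzy matcher below uses the similarity ratio from the standard-library difflib module rather than an edit distance, and the 0.85 threshold and the normalization rule are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def exact_match(a: str, b: str) -> bool:
    """(1) Exact matching: the two field values must be identical."""
    return a == b


def fuzzy_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """(2) Fuzzy matching: similarity ratio in [0, 1] compared with a threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def rule_match(a: str, b: str) -> bool:
    """(3) Rule matching with a hypothetical rule: ignore case, spaces and hyphens."""
    def normalize(text: str) -> str:
        return "".join(ch for ch in text.lower() if ch not in " -")
    return normalize(a) == normalize(b)


print(exact_match("No. 8 Shennan Road", "No.8 Shennan Road"))
print(fuzzy_match("No. 8 Shennan Road", "No.8 Shennan Road"))
print(rule_match("ABC-123", "abc 123"))
```

A machine-learning comparison, strategy (4), would replace these functions with a trained model and is not sketched here.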
It should be noted that customer data can be inconsistent in many ways. Common cases are: (1) Missing values, for example a field that exists in one data source has no corresponding value, or is null, in another data source. (2) Outliers, for example a field value in a data source falls outside the expected range or violates predefined rules. (3) Format inconsistencies, for example a field is represented in one format in one data source and in a different format in another, such as date or currency formats. (4) Data type mismatches, for example a field is interpreted as one data type in one data source and as another data type in another, such as a string being interpreted as a number. (5) Precision inconsistencies, for example a field has higher precision or more decimal places in one data source and lower precision or fewer decimal places in another.
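A rough sketch of how a pair of field values might be assigned to one of these categories is given below. The category labels follow the text, but the decision order, the tolerance value and the separator-only format check are simplifying assumptions.

```python
from typing import Any, Optional


def _alnum(text: str) -> str:
    """Keep only letters and digits, so that separators are ignored."""
    return "".join(ch for ch in text if ch.isalnum())


def classify_difference(a: Any, b: Any, tolerance: float = 0.01) -> Optional[str]:
    """Return a label for the difference between two values, or None if they agree."""
    if type(a) is type(b) and a == b:
        return None
    if a is None or b is None or a == "" or b == "":
        return "missing value"
    if type(a) is not type(b):
        return "data type mismatch"
    if isinstance(a, str) and _alnum(a) == _alnum(b):
        return "format inconsistency"      # only separators differ
    if isinstance(a, float) and abs(a - b) <= tolerance:
        return "precision inconsistency"   # equal within the assumed tolerance
    return "value mismatch"


def is_outlier(value: float, low: float, high: float) -> bool:
    """Outlier check against an expected range supplied by a business rule."""
    return not (low <= value <= high)


print(classify_difference("2023-08-08", "2023/08/08"))
print(classify_difference("100", 100))
print(is_outlier(150, 0, 120))
```

The format check only detects differing separators; reordered date parts (for example day-first versus year-first) would need a real date parser.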
It should be noted that a data source may be a database, a file system, an API interface, a third-party service, and the like. To extract data, the data sources to be extracted from, such as the bank's different systems, databases and log files, are determined first. Then a data extraction mode and extraction tool suited to each data source are selected. Common data extraction modes include: (1) Database extraction: data is extracted from the database using a database connection tool or a query language (for example SQL). (2) File system extraction: data is obtained from the file system through a file transfer protocol (for example FTP or SFTP) or by reading the file directly. (3) API extraction: data is acquired by calling an API interface. (4) Log file extraction: the log file is parsed and the required data is extracted. Depending on the selected extraction mode, the extraction is implemented with the corresponding tool or programming language, such as SQL query statements, Python programs or ETL tools.
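Since the text mentions Python and SQLite3, database extraction can be illustrated with the standard-library sqlite3 module. The in-memory database, the customer table, its columns and the metadata keys prefixed with an underscore are assumptions made for this sketch.

```python
import sqlite3
from datetime import datetime, timezone

# An in-memory database stands in for a real bank data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id TEXT PRIMARY KEY, name TEXT, id_card TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [("C001", "Zhang San", "ID-1234"), ("C002", "Li Si", "ID-5678")],
)


def extract_customers(conn: sqlite3.Connection, source_name: str) -> list:
    """Extract customer rows with SQL and tag each row with source metadata."""
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    extracted_at = datetime.now(timezone.utc).isoformat()
    rows = conn.execute(
        "SELECT customer_id, name, id_card FROM customer ORDER BY customer_id"
    ).fetchall()
    return [
        {**dict(row), "_source": source_name, "_table": "customer", "_extracted_at": extracted_at}
        for row in rows
    ]


extracted = extract_customers(conn, "core_system")
print(extracted[0])
```

Tagging each row with its source, table and extraction time corresponds to the identifier or metadata marking described for the profile data set.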
The extraction process may include extraction task scheduling, batch processing of data, incremental extraction, and the like. Before data extraction is put into use, it is verified and tested to ensure that the extracted data is accurate and complete. In practical applications, the extraction can be adjusted and optimized according to specific conditions and business requirements to ensure that the data is extracted effectively.
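One common way to realize incremental, batched extraction is to keep a watermark (the latest update time already extracted) and fetch only newer rows in fixed-size batches. The sketch below assumes a hypothetical updated_at column holding ISO 8601 timestamps; neither the column nor the watermark scheme is specified in the original text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real data source
conn.execute("CREATE TABLE customer (customer_id TEXT, name TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        ("C001", "Zhang San", "2023-08-01T09:00:00"),
        ("C002", "Li Si", "2023-08-02T09:00:00"),
        ("C003", "Wang Wu", "2023-08-03T09:00:00"),
    ],
)


def extract_increment(conn: sqlite3.Connection, watermark: str, batch_size: int = 1000):
    """Yield batches of rows updated after the watermark, oldest first."""
    cursor = conn.execute(
        "SELECT customer_id, name, updated_at FROM customer "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    )
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield batch


batches = list(extract_increment(conn, "2023-08-01T09:00:00", batch_size=1))
# The new watermark is the latest update time seen in this run.
new_watermark = max(row[2] for batch in batches for row in batch)
print(len(batches), new_watermark)
```

ISO 8601 timestamps of equal length sort correctly as text, which is why a plain string comparison suffices here.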
In the banking system, customer profile data includes: personal identity information, such as the customer's name, sex, date of birth, nationality, residential address and contact information; employment information, such as the customer's occupation type, the name of the company or organization, job position and years of service; financial information, such as income level, liabilities, assets (for example real estate, vehicles and investments), bank deposits and investment portfolio; bank account information, such as bank account number, account-opening branch and account type; credit information, such as credit history, credit score and credit report; and consumption behavior and preference information, such as consumption habits, shopping preferences, consumption frequency and consumption amount.
It should be noted that data can become inconsistent for many reasons. Common ones are: (1) Human factors: the customer provides incorrect data, bank staff enter data incorrectly, the system is operated improperly, and so on. (2) Differing data sources: the banking system may collect customer information from multiple channels, such as websites and telephone. (3) Data migration errors: errors may occur while the banking system migrates data, leaving customer data inconsistent. (4) System problems: the banking system may contain vulnerabilities or errors that make customer data inconsistent. (5) Improper data processing: the bank may mistakenly delete or modify customer data while processing it, making the data inconsistent.
Step S102, inputting the data into a blood-edge relationship prediction model, wherein the blood-edge relationship prediction model outputs the blood-edge relationship between a data source of the data and the data;
Specifically, in this embodiment, the inconsistent customer data is input into the blood-edge relationship prediction model, which outputs a blood-edge relationship graph drawn among the data. The graph shows the association and dependency relationships among the data, from which the evolution of the data can be obtained or the cause of a data change can be identified, for example in terms of the data source, data table, field, and the association and dependency relationships among the data.
For example, if the inconsistent item in the customer data is the customer's ID card number, the ID card numbers extracted from all ID card number data sources (such as the ID card copy provided by the customer, and the ID card information in the banking system and in other systems) are input into the blood-edge relationship prediction model to obtain a blood-edge relationship graph among the customer ID card numbers.
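The text does not specify the form of the prediction model or of its output. Purely as an assumption for illustration, the blood-edge relationship (data lineage) graph can be represented as a mapping from each node to the upstream nodes it is derived from, and traced back to its original source with a breadth-first walk. All node names below are hypothetical.

```python
from collections import deque
from typing import Dict, List

# Hypothetical model output: node -> upstream nodes it is derived from.
lineage: Dict[str, List[str]] = {
    "report.customer_profile.id_card": ["crm.customer.id_card"],
    "crm.customer.id_card": ["core.customer.id_card"],
    "core.customer.id_card": ["counter_entry.id_card"],
    "counter_entry.id_card": [],
}


def upstream_path(lineage: Dict[str, List[str]], start: str) -> List[str]:
    """Walk breadth-first from a node back through everything it depends on."""
    order: List[str] = []
    queue = deque([start])
    seen = {start}
    while queue:
        node = queue.popleft()
        order.append(node)
        for parent in lineage.get(node, []):
            if parent not in seen:  # guard against revisiting shared ancestors
                seen.add(parent)
                queue.append(parent)
    return order


path = upstream_path(lineage, "report.customer_profile.id_card")
print(" <- ".join(path))
```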
Step S103, acquiring, according to the data source and the blood-edge relationship, a data source node at which the data is inconsistent;
Specifically, in this embodiment, the data on each data source is compared item by item with the expected data for that data source to check whether the two are consistent. The comparison may include: field comparison, which compares the value of each field or attribute and checks for differences; data comparison, which compares the data on the data source with the expected data to obtain the differences between the two; field value comparison, which uses comparison operators (such as equal to, not equal to, greater than, less than) or string comparison functions (such as similarity calculation); data type comparison, which checks whether the data types of a field or attribute are consistent, for example a field being a text type in one data source and a numeric type in another; data format comparison, which checks whether the data formats are consistent, for example a date field using the "yyyy-mm-dd" format in one data source and the "mm/dd/yyyy" format in another; data range comparison, which checks whether the value ranges are consistent, for example a field ranging over 1-100 in one data source and 1-50 in another; business rule comparison, which checks fields or attributes that should have consistent values under specific conditions, for example a customer's age should fall within a specific range under banking regulations, and a value outside that range may indicate an inconsistency; and data quality assessment, which checks for problems such as incomplete or inaccurate fields or attributes.
The comparison yields the fields or attributes of the data that have changed, including data format, data type, data range, data precision and the like. The comparison results can be visualized and compiled into reports so that the data differences are easier to understand and present. Charts, tables, or other visualizations may be used to present the differences, and the generated reports support further analysis and decision-making.
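The field, type, and format comparisons described above can be sketched as follows. This is a minimal illustration only, not the method of the embodiment: the function name `compare_record`, the sample field names, and the date pattern are assumptions introduced for the example.

```python
import re
from typing import Any

# Assumed pattern for the "yyyy-mm-dd" date format mentioned in the text.
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def compare_record(actual: dict[str, Any], expected: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (field, kind_of_difference) pairs between a record and its expected form."""
    diffs = []
    for field, want in expected.items():
        if field not in actual:
            diffs.append((field, "missing"))
            continue
        got = actual[field]
        if type(got) is not type(want):
            # Data type comparison, e.g. text in one source, number in another.
            diffs.append((field, "type"))
        elif isinstance(want, str) and ISO_DATE.match(want) and not ISO_DATE.match(got):
            # Data format comparison for date-like fields.
            diffs.append((field, "format"))
        elif got != want:
            # Plain field value comparison.
            diffs.append((field, "value"))
    return diffs


# Hypothetical sample records.
expected = {"id_number": "110101199001011234", "age": 33, "opened": "2023-08-08"}
actual = {"id_number": "110101199001011243", "age": "33", "opened": "08/08/2023"}
differences = compare_record(actual, expected)
```

Range checks and business rule checks could be added as further branches in the same loop; they are left out here to keep the sketch short.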
In the identification card example, the reason the customer identification card numbers are inconsistent lies in the data entry link: bank staff made an error when entering the customer identification card number.
Step S104, replacing all the data in the bank data and the data of the data source node with the data of the preset reference data source.
Specifically, in this embodiment, after the data source node where the data is inconsistent has been identified, the data of the reference data source may replace all the data in the bank data and in that data source node, or may replace the data on all the data sources that provide the data. When the data is replaced, the data of the bank system and of each data source can be replaced directly with the data of the reference data source based on predefined rules or conditions. Alternatively, according to the cause of the inconsistency, only the data at the point that caused the inconsistency is replaced with the corresponding data of the reference data source.
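As a rough sketch of the replacement step, assuming each data source can be modelled as a dictionary of field values, the reference value may be copied into every target store. The name `repair_field` and the store names are hypothetical and serve only as an illustration.

```python
def repair_field(stores, targets, reference, field):
    """Overwrite `field` in each target store with the reference value.

    Returns the names of the stores that were actually changed.
    """
    ref_value = stores[reference][field]
    changed = []
    for name in targets:
        if stores[name].get(field) != ref_value:
            stores[name][field] = ref_value
            changed.append(name)
    return changed


# Hypothetical stores: the ID card copy is assumed to be the reference data source.
stores = {
    "id_card_copy": {"id_number": "110101199001011234"},
    "counter_entry": {"id_number": "110101199001011243"},
    "core_banking": {"id_number": "110101199001011243"},
}
changed = repair_field(stores, ["counter_entry", "core_banking"], "id_card_copy", "id_number")
```

Returning the list of changed stores makes it easier to audit the repair afterwards.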
When the data of the reference data source is acquired, the credibility and data quality of each data source are evaluated, and the data in the data source with higher data quality and credibility is selected as the data of the reference data source. Alternatively, it may be determined according to business requirements which data source's data better matches the actual situation or the business rules, and the data in the data source that meets the business requirements is selected as the data of the reference data source; the data in the data source with higher data value may be selected according to the importance and influence of the data; or the update frequency and timeliness of the data may be considered, and the data in the data source that is updated more promptly is selected as the data of the reference data source.
It should be noted that the desired data on the data source may be the desired data on the data source.
It should be noted that the bank data repairing method based on blood edge analysis is not limited to checking the consistency of data collected through different channels of the same business. It can also be applied to repairing and integrating inconsistent data when the bank system collects customer data through a plurality of channels or systems, so as to maintain data consistency; to updating the customer data in each data source when the customer data changes, so that the data in the bank system remains consistent; and to repairing inconsistent data when customer data in different systems must be integrated and aligned during migration or merging of banking systems.
According to the bank data restoration method based on blood margin analysis, inconsistent data is acquired and input into the blood margin relation prediction model to obtain the blood margin relation between the data and its data source, and the data source node that causes the inconsistency is then located. The data in that data source node and in the bank system is then replaced with the data of the reference data source, so that the inconsistent data in the bank system is repaired and the data in the bank system is kept consistent. This reduces manual intervention during repair, improves repair efficiency, and ensures normal handling of bank business, thereby improving customer service level and satisfaction.
In some embodiments, fig. 3 shows a flow chart of acquiring a data source node with inconsistent data in an embodiment of the present invention, and step S103 includes:
step S1031, obtaining data fields and data attributes of the data on all data sources;
step S1032, obtaining the association and the dependency relationship of the data sources according to the blood relationship;
Step S1033, obtaining the data source node with inconsistent data according to the data field, the data attribute, the association and the dependency relationship.
Fig. 4 shows a flow chart of another way of acquiring a data source node with inconsistent data according to an embodiment of the present invention, where step S1033 includes:
step S10331, according to the association and the dependency relationship, acquiring a transmission process of the data;
step S10332, comparing the data fields and data attributes of the expected data output by the blood relationship prediction model with the data on all the data source nodes in the transmission process;
step S10333, if the data fields are inconsistent or the data attributes change, the data source node is a data source node with inconsistent data.
Specifically, in this embodiment, the source and the flow direction of the data on each data source are obtained according to the blood relationship between the data sources, and then the data fields and the data attributes of the data on adjacent nodes are compared one by one, so as to obtain the data source node with inconsistent data.
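The adjacent-node comparison described above can be sketched as a walk over a lineage graph. The following is an illustration under stated assumptions: the lineage is given as an adjacency mapping from an upstream node to its downstream nodes, each node holds one value for the field under inspection, and the function and node names are hypothetical.

```python
from collections import deque


def find_inconsistent_nodes(edges, values, root):
    """Walk the lineage from `root`; return nodes whose value differs from their upstream node."""
    suspects = []
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            # A difference between adjacent nodes marks where the data changed in transit.
            if values[child] != values[node] and child not in suspects:
                suspects.append(child)
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return suspects


# Hypothetical lineage: ID card copy -> counter entry -> core banking and CRM.
edges = {"id_card_copy": ["counter_entry"], "counter_entry": ["core_banking", "crm"]}
values = {
    "id_card_copy": "110101199001011234",
    "counter_entry": "110101199001011243",
    "core_banking": "110101199001011243",
    "crm": "110101199001011243",
}
suspects = find_inconsistent_nodes(edges, values, "id_card_copy")
```

In this sample only the counter entry node is reported, because the downstream systems faithfully copied the value that was already wrong.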
In some embodiments, before the step S104, the method further includes:
The method comprises the steps of obtaining a preset reference data source, specifically:
acquiring banking business corresponding to the banking data according to the banking data;
acquiring a business rule of the banking business according to the banking business;
and determining the priority of the data sources according to the business rule, and taking the data source with the highest priority as a preset reference data source.
Specifically, in this embodiment, when a reference data source is selected, the banking business corresponding to the inconsistent data is first obtained, and the business rules of that banking business are then obtained. Finally, the data provided by each data source is rated by combining the business rules, data reliability, data quality, data integrity and the like, and the data source with the best comprehensive rating is selected as the reference data source.
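One possible way to combine the rating criteria is a weighted sum, sketched below. The weights, the score values, and the names `WEIGHTS` and `pick_reference` are assumptions made for illustration; the embodiment does not specify how the criteria are combined.

```python
# Assumed weights for the four criteria named in the text; they sum to 1.
WEIGHTS = {"rule_compliance": 0.4, "reliability": 0.3, "quality": 0.2, "integrity": 0.1}


def pick_reference(scores):
    """Return the data source with the highest weighted rating."""
    def total(name):
        return sum(WEIGHTS[k] * scores[name][k] for k in WEIGHTS)
    return max(scores, key=total)


# Hypothetical per-source scores in the range 0..1.
scores = {
    "id_card_copy": {"rule_compliance": 1.0, "reliability": 0.9, "quality": 0.9, "integrity": 1.0},
    "core_banking": {"rule_compliance": 0.8, "reliability": 0.8, "quality": 0.7, "integrity": 0.9},
    "crm": {"rule_compliance": 0.6, "reliability": 0.5, "quality": 0.6, "integrity": 0.7},
}
reference = pick_reference(scores)
```

A strict priority ordering derived from the business rules would work as well; the weighted sum is only one convenient way to express a comprehensive rating.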
It should be noted that business rules differ between banks and between banking businesses. The business rules in this embodiment include: (1) data integrity, ensuring that the necessary fields and attributes in the customer data record have values, with no missing or null values; (2) data accuracy, meaning the field values in the customer data record should be consistent with the values of the trusted data source or the reference data source; (3) data consistency, meaning the field values in the customer data record should remain consistent across the whole system and should not conflict with or contradict other related data records; (4) data format specification, meaning the field values in the customer data record should meet predefined format requirements, such as date format and telephone number format; and (5) business rule compliance, meaning the field values in the customer data record should meet the business rules and regulatory requirements of the bank, such as authentication rules and risk assessment rules.
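Rules (1) and (4) lend themselves to a simple programmatic check, sketched below. The required field list, the patterns (an 18-character identification number of 17 digits plus a digit or X, and a "yyyy-mm-dd" date), and the name `check_record` are assumptions for the example, not rules taken from the embodiment.

```python
import re

# Assumed format rules for illustration only.
FORMAT_RULES = {
    "id_number": re.compile(r"\d{17}[\dX]"),
    "opened": re.compile(r"\d{4}-\d{2}-\d{2}"),
}
REQUIRED = ("name", "id_number", "opened")


def check_record(record):
    """Return a list of integrity and format problems found in one customer record."""
    problems = []
    for field in REQUIRED:
        # Rule (1): required fields must be present and non-empty.
        if record.get(field) in (None, ""):
            problems.append(f"{field}: missing")
    for field, pattern in FORMAT_RULES.items():
        value = record.get(field)
        # Rule (4): present values must match the predefined format.
        if value and not pattern.fullmatch(str(value)):
            problems.append(f"{field}: bad format")
    return problems


# Hypothetical record with an empty name and a 17-digit ID number.
problems = check_record({"name": "", "id_number": "11010119900101123", "opened": "2023-08-08"})
```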
In some embodiments, after the step S104, the method further includes:
and monitoring all the data in the replaced bank data and the data of the data source node, and if abnormality occurs, sending abnormality alarm information.
Specifically, in this embodiment, the repaired data is monitored to determine whether an abnormality occurred during the modification. A part of the repaired data is randomly sampled for auditing, and the audit can be performed by comparing the repair result with the data of the trusted data source or by verifying it against business rules and domain knowledge. Sampling and auditing make it possible to evaluate the accuracy of the repair and whether the expected repair result has been achieved. If the expected result is not achieved or the accuracy is insufficient, an abnormality has occurred in the repair process and an alarm is sent out.
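The sampling audit can be sketched as follows, assuming the repaired data and the trusted data are both keyed by a customer identifier. The accuracy threshold, the fixed random seed, and the name `audit_sample` are assumptions chosen for the illustration.

```python
import random


def audit_sample(repaired, trusted, sample_size, threshold, seed=0):
    """Audit a random sample of repaired records against trusted data.

    Returns (accuracy, alarm), where alarm is True when accuracy falls below the threshold.
    """
    rng = random.Random(seed)  # fixed seed so the audit is reproducible
    keys = rng.sample(sorted(repaired), min(sample_size, len(repaired)))
    if not keys:
        return 1.0, False
    correct = sum(1 for k in keys if repaired[k] == trusted.get(k))
    accuracy = correct / len(keys)
    return accuracy, accuracy < threshold


# Hypothetical data: ten repaired records, one of which still disagrees with the trusted source.
trusted = {f"cust{i}": f"id{i}" for i in range(10)}
repaired = dict(trusted, cust3="wrong")
accuracy, alarm = audit_sample(repaired, trusted, sample_size=10, threshold=0.95)
```

With a sample size equal to the population the result is exact; with a smaller sample the accuracy is only an estimate, and the threshold should be chosen accordingly.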
For the repair process, data quality reports can be generated regularly, covering the data quality indexes before and after repair, the trend in the number of inconsistent records, an evaluation of the repair effect, and the like. The reports may include data statistics, visual charts, and summary indexes to help evaluate the repair effect and provide feedback.
The repaired data is continuously monitored and tracked to ensure the stability and continuous improvement of the repair result.
In some embodiments, prior to step S102, the method further comprises:
collecting historical data generated when the bank handles business in the banking system;
marking the blood-margin relation between the data source of the historical data and the historical data;
and inputting the data source of the historical data and the blood relationship into a blood relationship initial model for training, optimizing and testing to obtain a blood relationship prediction model.
Specifically, in this embodiment, the data that appears in the banking system while banking business is handled is first collected and analyzed to obtain its features and structure, including the data fields, the attributes of the data, and the association relationships between the data and its data sources. These features and structures are then labeled; labeling may be performed manually or automatically based on rules and algorithms, such as matching and inference based on data field names and data structures. Next, the granularity of the data blood-edge relationship model, that is, the minimum processing unit of the data, is determined according to the business requirements and the data characteristics, and may be the field level or the record level. A suitable data blood-edge modeling algorithm is selected according to the data characteristics and the complexity of the blood-edge relationship; common algorithms include graph theory-based algorithms, timestamp-based algorithms, rule-based algorithms, and the like. According to the selected algorithm, a data blood-edge relationship model of the customer data is established to record the source and relationships of the data. The blood-edge relationship may be represented using a graph structure, a relationship table, or other suitable means. Finally, the model is trained, tested, and optimized with the collected data as input and the blood-edge relationship between the data and the data sources as output, so as to obtain the blood-edge relationship prediction model with the best effect.
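To illustrate the two representations mentioned above, the sketch below converts a field-level relationship table into a graph structure and traces the upstream sources of a node. The row layout (source, target, field), the node names, and the single-parent assumption in `upstream_of` are all hypothetical simplifications.

```python
from collections import defaultdict

# Hypothetical field-level lineage rows: (upstream source, downstream target, field).
lineage_rows = [
    ("id_card_copy", "counter_entry", "id_number"),
    ("counter_entry", "core_banking", "id_number"),
    ("counter_entry", "crm", "id_number"),
]


def build_graph(rows):
    """Turn relationship-table rows into an adjacency mapping (graph structure)."""
    graph = defaultdict(list)
    for src, dst, _field in rows:
        graph[src].append(dst)
    return dict(graph)


def upstream_of(rows, node):
    """Trace a node back to its origin, assuming each node has at most one upstream source."""
    parents = {dst: src for src, dst, _field in rows}
    path = []
    while node in parents:
        node = parents[node]
        path.append(node)
    return path


graph = build_graph(lineage_rows)
origin_path = upstream_of(lineage_rows, "crm")
```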
In some embodiments, prior to step S102, the method further comprises:
labeling expected data of the historical data;
and inputting the historical data and expected data corresponding to the historical data into a blood-edge relationship initial model for training, optimizing and testing.
Specifically, in this embodiment, the expected value corresponding to each item of collected historical data is labeled, and the blood-edge relationship prediction model is trained with the historical data as input and the expected data as output, so that, when applied, the model can also output the expected data corresponding to the input data.
Fig. 5 is a schematic structural diagram of a bank data repairing device based on blood margin analysis according to an embodiment of the present invention, where the device includes:
a first obtaining module 501, configured to obtain data in which bank data is inconsistent in a banking system;
a prediction module 502, configured to input the data into a blood-edge relationship prediction model, where the blood-edge relationship prediction model outputs a blood-edge relationship between a data source of the data and the data;
a second obtaining module 503, configured to obtain, according to the data source and the blood edge relationship, a data source node where the data of the data is inconsistent;
The repairing module 504 is configured to replace all the data in the bank data and the data of the data source node with the data of the preset reference data source.
In some embodiments, the second acquisition module is further to:
acquiring data fields and data attributes of the data sources;
acquiring the association and the dependency relationship of the data sources according to the blood relationship;
and acquiring the data source node with inconsistent data according to the data field, the data attribute, the association and the dependency relationship.
In some embodiments, the second acquisition module is further to:
acquiring a transmission process of the data according to the association and the dependency relationship;
comparing the data fields and the data attributes of the expected data output by the blood relationship prediction model with the data on all the data source nodes in the transmission process;
if the data fields are inconsistent or the data attributes change, the data source node is a data source node with inconsistent data.
In some embodiments, the apparatus further comprises: the third obtaining module is configured to obtain a preset reference data source, and specifically includes:
Acquiring banking business corresponding to the banking data according to the banking data;
acquiring a business rule of the banking business according to the banking business;
and determining the priority of the data sources according to the business rule, and taking the data source with the highest priority as a preset reference data source.
In some embodiments, the apparatus further comprises: and the monitoring module is used for monitoring all the data in the replaced bank data and the data of the data source node, and sending abnormal alarm information if abnormality occurs.
In some embodiments, the apparatus further comprises a training module, configured for: collecting historical data generated when the bank handles business in the banking system;
marking the blood-margin relation between the data source of the historical data and the historical data;
and inputting the data source of the historical data and the blood relationship into a blood relationship initial model for training, optimizing and testing to obtain a blood relationship prediction model.
In some embodiments, the training module is further to:
labeling expected data of the historical data;
and inputting the historical data and expected data corresponding to the historical data into a blood-edge relationship initial model for training, optimizing and testing.
For other details of implementing the above technical solution by each module in the bank data repairing device based on blood margin analysis, reference may be made to the description in the bank data repairing method based on blood margin analysis provided above, and the description is omitted here.
In some embodiments, as shown in fig. 6, a schematic structural diagram of a bank data repairing apparatus based on blood edge analysis according to an embodiment of the present invention includes a memory 601 and a processor 602, where the memory 601 stores a computer program, and when the computer program is executed by the processor 602, the processor 602 performs the following steps:
acquiring data of inconsistent bank data in a bank system;
inputting the data into a blood-edge relation prediction model, wherein the blood-edge relation prediction model outputs the blood-edge relation between a data source of the data and the data;
acquiring a data source node with inconsistent data according to the data source and the blood relationship;
all the data in the bank data and the data of the data source node are replaced by the data of the preset reference data source.
For further details regarding implementation of the above technical solution by the processor 602 in the bank data restoration device based on blood margin analysis, reference may be made to the description in the bank data restoration method based on blood margin analysis provided above, and details are not repeated here.
The processor 602 may also be referred to as a CPU (Central Processing Unit), and may be an integrated circuit chip with signal processing capability. The processor 602 may be a general purpose processor, such as a microprocessor or any conventional processor, or may be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like.
In some embodiments, fig. 7 shows a schematic structural diagram of a computer readable storage medium according to an embodiment of the present invention, where a computer program 701 is stored on the storage medium. The computer program 701 may be stored in the storage medium in the form of a software product, comprising instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform the following steps:
Acquiring data of inconsistent bank data in a bank system;
inputting the data into a blood-edge relation prediction model, wherein the blood-edge relation prediction model outputs the blood-edge relation between a data source of the data and the data;
acquiring a data source node with inconsistent data according to the data source and the blood relationship;
all the data in the bank data and the data of the data source node are replaced by the data of the preset reference data source.
The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a magnetic disk or optical disk, a ROM (Read-Only Memory), or a RAM (Random Access Memory), or a terminal device such as a computer, a server, a mobile phone, or a tablet.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware, where the program may be stored in a non-volatile computer readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.
The foregoing examples illustrate only a few embodiments of the application and are described in relative detail, but they are not to be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the application, all of which fall within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (10)

1. A bank data repair method based on blood margin analysis, the method comprising:
acquiring data of inconsistent bank data in a bank system;
inputting the data into a blood-edge relation prediction model, wherein the blood-edge relation prediction model outputs the blood-edge relation between a data source of the data and the data;
Acquiring a data source node with inconsistent data according to the data source and the blood relationship;
all the data in the bank data and the data of the data source node are replaced by the data of the preset reference data source.
2. The bank data repair method based on blood margin analysis as claimed in claim 1, wherein the step of acquiring a data source node with inconsistent data according to the data source and the blood relationship comprises:
acquiring data fields and data attributes of the data sources;
acquiring the association and the dependency relationship of the data sources according to the blood relationship;
and acquiring the data source node with inconsistent data according to the data field, the data attribute, the association and the dependency relationship.
3. The bank data repair method based on blood margin analysis as claimed in claim 2, wherein the step of acquiring the data source node with inconsistent data according to the data field, the data attribute, the association and the dependency relationship comprises:
acquiring a transmission process of the data according to the association and the dependency relationship;
Comparing the data fields and the data attributes of the expected data output by the blood relationship prediction model with the data on all the data source nodes in the transmission process;
if the data fields are inconsistent or the data attributes change, the data source node is a data source node with inconsistent data.
4. The method of claim 1, wherein before replacing all of the data in the bank data and the data of the data source node with data of a predetermined reference data source, the method further comprises:
the method comprises the steps of obtaining a preset reference data source, specifically:
acquiring banking business corresponding to the banking data according to the banking data;
acquiring a business rule of the banking business according to the banking business;
and determining the priority of the data sources according to the business rule, and taking the data source with the highest priority as a preset reference data source.
5. The method of claim 1, wherein after replacing all the data in the bank data and the data of the data source node with the data of the predetermined reference data source, the method further comprises:
And monitoring all the data in the replaced bank data and the data of the data source node, and if abnormality occurs, sending abnormality alarm information.
6. The bank data repair method based on blood margin analysis according to claim 1, wherein before inputting the data into the blood-edge relation prediction model, the blood-edge relation prediction model outputting the blood-edge relation between a data source of the data and the data, the method further comprises:
collecting historical data generated when the bank handles business in the banking system;
marking the blood-margin relation between the data source of the historical data and the historical data;
and inputting the data source of the historical data and the blood relationship into a blood relationship initial model for training, optimizing and testing to obtain a blood relationship prediction model.
7. The bank data repair method based on blood margin analysis according to claim 6, wherein after collecting the historical data generated when the bank handles business in the banking system, and before inputting the data source of the historical data and the blood relationship into the blood relationship initial model for training, optimizing and testing to obtain the blood relationship prediction model, the method further comprises:
Labeling expected data of the historical data;
and inputting the historical data and expected data corresponding to the historical data into a blood-edge relationship initial model for training, optimizing and testing.
8. A bank data repair device based on blood margin analysis, the device comprising:
the first acquisition module is used for acquiring data of inconsistent bank data in the bank system;
the prediction module is used for inputting the data into a blood-edge relation prediction model, and the blood-edge relation prediction model outputs the blood-edge relation between a data source of the data and the data;
the second acquisition module is used for acquiring data source nodes with inconsistent data according to the data source and the blood relationship;
and the repair module is used for replacing all the data in the bank data and the data of the data source node with the data of the preset reference data source.
9. A banking data repair device based on blood edge analysis, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 7.
10. A computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method of any one of claims 1 to 7.
CN202310993023.6A 2023-08-08 2023-08-08 Bank data restoration method, device, equipment and medium based on blood margin analysis Pending CN117033360A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310993023.6A CN117033360A (en) 2023-08-08 2023-08-08 Bank data restoration method, device, equipment and medium based on blood margin analysis


Publications (1)

Publication Number Publication Date
CN117033360A true CN117033360A (en) 2023-11-10

Family

ID=88644178

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310993023.6A Pending CN117033360A (en) 2023-08-08 2023-08-08 Bank data restoration method, device, equipment and medium based on blood margin analysis

Country Status (1)

Country Link
CN (1) CN117033360A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination