CN118093698A - Data processing method, device, electronic equipment and computer storage medium - Google Patents

Info

Publication number
CN118093698A
Authority
CN
China
Prior art keywords
data
source
data source
distributed
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410219049.XA
Other languages
Chinese (zh)
Inventor
刘旺森
赵吉昆
赵同
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202410219049.XA
Publication of CN118093698A
Legal status: Pending

Abstract

The disclosure provides a data processing method, a data processing apparatus, an electronic device, and a computer storage medium, which can be applied to the technical fields of big data and financial technology. The method includes: in response to detecting that a second data source is abnormal, sending a first data table of a first data source to a downstream application of a data lake, wherein the first data source is a centralized data source and the second data source is a distributed data source; updating first data and second data in the first data table to a distributed anti-corrosion data table, wherein the first data is historical interaction data between the first data source and the downstream application, and the second data is interaction data newly added between the first data source and the downstream application while the second data source is abnormal; and in response to detecting that the second data source has returned to normal, sending the distributed anti-corrosion data table to the downstream application of the data lake.

Description

Data processing method, device, electronic equipment and computer storage medium
Technical Field
The present disclosure relates to the technical fields of big data and financial technology, and in particular to a data processing method, an apparatus, an electronic device, and a computer storage medium.
Background
With the continuous development of computer technology, more and more data is involved in providing services to users via the internet. The data lake acts as a centralized repository that can store data from multiple data sources and provide the data from the multiple data sources stored by the data lake to downstream applications.
Currently, to optimize system architecture, enterprises are increasingly converting centralized systems to distributed systems, whereby data sources of data lakes are also converted from centralized data sources to distributed data sources. When an abnormality occurs in the optimization process, service continuity is maintained by switching the distributed data sources back to the centralized data source.
However, in the process of implementing the above inventive concept, the inventors found the following technical problem in the related art: when exception handling takes too long, service can resume only after the data has been smoothed, that is, brought back into a consistent state, which impairs service continuity.
Disclosure of Invention
In view of the foregoing, the present disclosure provides a data processing method, apparatus, electronic device, and computer storage medium.
According to a first aspect of the present disclosure, there is provided a data processing method, the method comprising: in response to detecting that a second data source is abnormal, sending a first data table of a first data source to a downstream application of a data lake, wherein the first data source is a centralized data source and the second data source is a distributed data source; updating first data and second data in the first data table to a distributed anti-corrosion data table, wherein the first data is historical interaction data between the first data source and the downstream application, and the second data is interaction data newly added between the first data source and the downstream application while the second data source is abnormal; and in response to detecting that the second data source has returned to normal, sending the distributed anti-corrosion data table to the downstream application of the data lake.
According to an embodiment of the present disclosure, the distributed anti-corrosion data table also stores third data generated while the second data source is normal, wherein the third data is obtained from a second data table of the second data source; updating the first data and the second data in the first data table to the distributed anti-corrosion data table comprises: determining, based on the first data and the third data, fourth data to be stored to the distributed anti-corrosion data table; and storing the second data and the fourth data to the distributed anti-corrosion data table.
According to an embodiment of the present disclosure, determining the fourth data to be stored to the distributed anti-corrosion data table based on the first data and the third data includes: determining, from the first data, first sub-data having a target data type, wherein the target data type includes at least one of an account identification type, a product type, and a bank card type, and the third data does not include the first sub-data; and taking the first sub-data as the fourth data.
According to an embodiment of the present disclosure, the method further includes: determining second sub-data of a target primary key from the first data and third sub-data of the target primary key from the third data, wherein the target primary key is a primary key included in both the first data and the third data; and, in the case where the second sub-data and the third sub-data differ, taking the second sub-data as the fourth data.
According to an embodiment of the present disclosure, the method further includes: comparing the first data with the third data to obtain a difference result, wherein the difference result includes fourth sub-data contained only in the first data; and taking the fourth sub-data as the fourth data.
According to an embodiment of the present disclosure, sending the distributed anti-corrosion data table to the downstream application in response to detecting that the second data source has returned to normal includes: in response to detecting that the second data source has returned to normal, changing the data source parameter of the data lake to the second data source in real time, so that the distributed anti-corrosion data table is sent to the downstream application in real time.
According to an embodiment of the present disclosure, the first data table and the distributed anti-corrosion data table have different data structures, and the second data table and the distributed anti-corrosion data table have the same data structure; storing the second data and the fourth data to the distributed anti-corrosion data table includes: converting the data structures of the second data and the fourth data into the data structure of the distributed anti-corrosion data table based on a predetermined data structure conversion rule; and storing the converted second data and fourth data to the distributed anti-corrosion data table.
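As an illustration only, the conversion step could be sketched in Python as a field-renaming rule applied row by row. The field names (`acct_no`, `account_id`, `prod`, `product_type`), the dictionary form of the rule, and the function `convert_row` are hypothetical; the disclosure does not specify the form of the predetermined data structure conversion rule.

```python
# Hypothetical conversion rule: column name in the first data table
# -> column name in the distributed anti-corrosion data table.
CONVERSION_RULE = {
    "acct_no": "account_id",
    "prod": "product_type",
}


def convert_row(row, rule=CONVERSION_RULE):
    """Rename the fields of one row according to the conversion rule.

    Fields without an entry in the rule keep their original name.
    """
    return {rule.get(name, name): value for name, value in row.items()}


def convert_rows(rows, rule=CONVERSION_RULE):
    """Convert every row of the second data or the fourth data."""
    return [convert_row(row, rule) for row in rows]
```

A real rule would likely also cover type conversion and the splitting or merging of columns; this sketch shows only renaming.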
A second aspect of the present disclosure provides a data processing apparatus, the apparatus comprising: a first sending module configured to send, in response to detecting that a second data source is abnormal, a first data table of a first data source to a downstream application of a data lake, wherein the first data source is a centralized data source and the second data source is a distributed data source; an updating module configured to update first data and second data in the first data table to a distributed anti-corrosion data table, wherein the first data is historical interaction data between the first data source and the downstream application, and the second data is interaction data newly added between the first data source and the downstream application while the second data source is abnormal; and a second sending module configured to send, in response to detecting that the second data source has returned to normal, the distributed anti-corrosion data table to the downstream application.
A third aspect of the present disclosure provides an electronic device, comprising: one or more processors; and a memory for storing one or more computer programs which, when executed by the one or more processors, cause the one or more processors to perform the steps of the data processing method described above.
A fourth aspect of the present disclosure also provides a computer-readable storage medium having stored thereon a computer program or instructions which, when executed by a processor, implement the steps of the data processing method described above.
A fifth aspect of the present disclosure also provides a computer program product comprising a computer program or instructions which, when executed by a processor, implement the steps of the data processing method described above.
In the embodiments of the present disclosure, the distributed anti-corrosion data table synchronously records the first data and the second data between the first data source and the downstream application while the second data source is abnormal. Therefore, once the second data source has returned to normal, sending the distributed anti-corrosion data table to the downstream application of the data lake allows service to be provided to the downstream application directly on the basis of the second data source, without waiting for a data smoothing operation to be performed on the second data table of the second data source. The embodiments of the present disclosure can thus switch back to the second data source in real time after exception handling is completed. This avoids the long and inefficient data source switch that overly long exception handling would otherwise cause, allows service to continue without waiting for the data to be smoothed, and improves both the continuity of business services and the efficiency of data source switching.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be more apparent from the following description of embodiments of the disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario of a data processing method according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a data processing method according to an embodiment of the disclosure;
FIG. 3 schematically illustrates a system architecture of a data processing method according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of a fourth data determination method according to a specific embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart of a distributed anti-corrosion data table generation method according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a block diagram of a data processing apparatus according to an embodiment of the present disclosure; and
FIG. 7 schematically illustrates a block diagram of an electronic device adapted for a data processing method according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a convention should be interpreted as one skilled in the art would generally understand it (e.g., "a system having at least one of A, B and C" would include, but not be limited to, systems having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, C together, etc.).
In the technical solution of the present disclosure, the user information involved (including, but not limited to, personal information, image information, and device information such as location information) and the data involved (including, but not limited to, data for analysis, stored data, and displayed data) are information and data authorized by the user or fully authorized by all parties. The collection, storage, use, processing, transmission, provision, disclosure, and application of such data comply with the relevant laws, regulations, and standards of the relevant countries and regions; necessary security measures are taken; public order and good customs are not violated; and corresponding operation entries are provided for the user to grant or refuse authorization.
In scenarios where personal information is used for automated decision-making, the method, apparatus, and system provided by the embodiments of the present disclosure provide users with corresponding operation entries so that they can accept or reject the automated decision result; if a user chooses to reject it, an expert decision process is entered. Here, "automated decision" refers to the activity of using a computer program to automatically analyze and assess an individual's behavioral habits, interests, or economic, health, or credit status, and to make a decision. "Expert decision" refers to the activity of a decision being made by a person who specializes in a particular field, has specialized experience, knowledge, and skills, and has reached a certain level of expertise.
In the related art, while a system is being upgraded and optimized, the pre-upgrade data source and the upgraded data source usually each carry part of the service, so that when the upgraded data source becomes abnormal, service to downstream applications can be continued by switching to the pre-upgrade data source in time. In this scheme, the abnormal data source may be difficult and time-consuming to repair, so the pre-upgrade data source may serve downstream applications for an extended period. After the upgraded data source has been repaired, the data source of the downstream application needs to be switched from the pre-upgrade data source back to the upgraded data source. To keep the two data sources consistent, the service data generated during the anomaly must be synchronized into the upgraded data source before switching back.
Therefore, when the abnormal data source is difficult and time-consuming to repair, synchronizing data into the upgraded data source takes longer and is less efficient, which impairs the service continuity of downstream applications.
To at least partially solve the above technical problems, embodiments of the present disclosure provide a data processing method, the method including: in response to detecting that a second data source is abnormal, sending a first data table of a first data source to a downstream application of a data lake, wherein the first data source is a centralized data source and the second data source is a distributed data source; updating first data and second data in the first data table to a distributed anti-corrosion data table, wherein the first data is historical interaction data between the first data source and the downstream application, and the second data is interaction data newly added between the first data source and the downstream application while the second data source is abnormal; and in response to detecting that the second data source has returned to normal, sending the distributed anti-corrosion data table to the downstream application of the data lake.
Fig. 1 schematically illustrates an application scenario of a data processing method according to an embodiment of the present disclosure.
As shown in fig. 1, an application scenario 100 according to this embodiment may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is a medium used to provide a communication link between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 through the network 104 using at least one of the first terminal device 101, the second terminal device 102, the third terminal device 103, to receive or send messages, etc. Various communication client applications, such as a shopping class application, a web browser application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc. (by way of example only) may be installed on the first terminal device 101, the second terminal device 102, and the third terminal device 103.
For example, the downstream application may refer to a communication client application installed on the first terminal device 101, the second terminal device 102, or the third terminal device 103, such as a shopping application, a web browser application, a search application, an instant messaging tool, a mailbox client, or social platform software.
The first terminal device 101, the second terminal device 102, the third terminal device 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by the user using the first terminal device 101, the second terminal device 102, and the third terminal device 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
For example, the server 105 is configured to send a first data table of a first data source to a downstream application of a data lake in response to detecting that a second data source is abnormal; update the first data and the second data in the first data table to a distributed anti-corrosion data table; and, in response to detecting that the second data source has returned to normal, send the distributed anti-corrosion data table to the downstream application of the data lake.
The first data source and the second data source may refer to systems, platforms, software, etc. supported by servers other than server 105.
It should be noted that the data processing method provided in the embodiments of the present disclosure may be generally performed by the server 105. Accordingly, the data processing apparatus provided by the embodiments of the present disclosure may be generally provided in the server 105. The data processing method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105. Accordingly, the data processing apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The data processing method of the disclosed embodiment will be described in detail below with reference to fig. 2 to 5 based on the scenario described in fig. 1.
Fig. 2 schematically illustrates a flow chart of a data processing method according to an embodiment of the present disclosure.
As shown in fig. 2, the method 200 includes operations S210 to S230.
In operation S210, in response to detecting that the second data source is abnormal, a first data table of the first data source is sent to a downstream application of the data lake.
According to an embodiment of the present disclosure, the first data source is a centralized data source and the second data source is a distributed data source. Architecture and system upgrades generally convert a centralized data source into a distributed data source, so the first data source may also be understood as the pre-upgrade data source and the second data source as the upgraded data source.
According to embodiments of the present disclosure, the first data source may be an upstream application, a data processing platform, or the like, which stores its data in the data lake. The downstream application may perform data processing or provide business services based on the data of the first data source in the data lake. The second data source interacts with the data lake and downstream applications in a similar way, which will not be described in detail here.
Because the second data source is a data source after the system is upgraded, when the second data source is abnormal, the data source can be switched back to the first data source, and services can be provided for downstream applications based on the first data table of the first data source. However, since the first data source is the replaced data source, after the second data source is restored to normal, it is also necessary that the second data source continue to provide services to the downstream application.
According to an embodiment of the present disclosure, the first data source stores the generated data to the data lake in the form of a first data table.
As a specific embodiment, the first data table may be a patch source table. A patch source table is a table generated directly when a data source is imported into the data lake; it has fewer list values, can be clearly classified as either a dimension table or a fact table, and usually has foreign key dependencies on other tables. The patch source table is also referred to as a low-level table.
In operation S220, the first data and the second data in the first data table are updated to the distributed anti-corrosion data table.
According to an embodiment of the disclosure, the first data is historical interaction data between the first data source and the downstream application, and the second data is newly added interaction data between the first data source and the downstream application during an anomaly of the second data source.
According to embodiments of the present disclosure, the second data source may already have experienced an earlier anomaly before the current one, in which case historical interaction data between the first data source and the downstream application was generated during that earlier anomaly. Alternatively, before the anomaly, during the migration between the first data source and the second data source, both data sources carried part of the service, so historical interaction data may also exist between the first data source and the downstream application. This historical interaction data between the first data source and the downstream application is referred to as the first data.
According to embodiments of the present disclosure, while the second data source is experiencing the current anomaly, the first data source may continue to interact with the downstream application, generating newly added interaction data, which is referred to as the second data.
According to embodiments of the present disclosure, the distributed anti-corrosion data table may be understood as a full data table in the data lake that includes the data of both the first data source and the second data source.
In operation S230, in response to detecting that the second data source has returned to normal, the distributed anti-corrosion data table is sent to the downstream application of the data lake.
According to the embodiment of the disclosure, when switching from the first data source back to the second data source, the distributed anti-corrosion data table ensures data consistency for the downstream application.
In the embodiments of the present disclosure, the distributed anti-corrosion data table synchronously records the first data and the second data between the first data source and the downstream application while the second data source is abnormal. Therefore, once the second data source has returned to normal, sending the distributed anti-corrosion data table to the downstream application of the data lake allows service to be provided to the downstream application directly on the basis of the second data source, without waiting for a data smoothing operation to be performed on the second data table of the second data source. The embodiments of the present disclosure can thus switch back to the second data source in real time after exception handling is completed. This avoids the long and inefficient data source switch that overly long exception handling would otherwise cause, allows service to continue without waiting for the data to be smoothed, and improves both the continuity of business services and the efficiency of data source switching.
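Operations S210 to S230 can be pictured with a minimal Python sketch. It is a toy model under stated assumptions, not the implementation of the disclosure: the class `DataLake`, its attribute and method names, and the use of in-memory lists as tables are all hypothetical, and the deduplication of historical first data is omitted.

```python
from dataclasses import dataclass, field

CENTRALIZED = "first_data_source"   # centralized, pre-upgrade data source
DISTRIBUTED = "second_data_source"  # distributed, upgraded data source


@dataclass
class DataLake:
    """Toy data lake holding the first data table and the anti-corrosion table."""
    first_table: list = field(default_factory=list)
    anti_corrosion_table: list = field(default_factory=list)
    source_param: str = DISTRIBUTED  # data source currently backing the service

    def on_anomaly(self):
        # S210: the second data source is abnormal, so serve the first data table.
        self.source_param = CENTRALIZED

    def record_interaction(self, row):
        # A new interaction handled by the first data source during the anomaly.
        self.first_table.append(row)
        # S220: mirror the newly added (second) data into the anti-corrosion table.
        self.anti_corrosion_table.append(row)

    def on_recovery(self):
        # S230: switch back immediately; no catch-up synchronization is needed
        # because the anti-corrosion table already holds the anomaly-period rows.
        self.source_param = DISTRIBUTED

    def table_for_downstream(self):
        if self.source_param == CENTRALIZED:
            return self.first_table
        return self.anti_corrosion_table
```

In this sketch the switch back is a single parameter change, which reflects why no data smoothing step is required at recovery time.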
According to an embodiment of the present disclosure, the distributed anti-corrosion data table also stores third data generated while the second data source is normal, wherein the third data is obtained from the second data table of the second data source. Updating the first data and the second data in the first data table to the distributed anti-corrosion data table comprises: determining, based on the first data and the third data, fourth data to be stored to the distributed anti-corrosion data table; and storing the second data and the fourth data to the distributed anti-corrosion data table.
According to an embodiment of the present disclosure, when the second data source is serving a downstream application normally, its data is first stored in the data lake in the form of a second data table. The data lake then stores the data of the second data table into the distributed anti-corrosion data table, and the downstream application is served through the distributed anti-corrosion data table.
According to embodiments of the present disclosure, since the third data is already stored in the distributed anti-corrosion data table, the first data and the third data may contain duplicate data. Therefore, when the first data in the first data table is stored into the distributed anti-corrosion data table, only the data in the first data that does not duplicate the third data, namely the fourth data, needs to be stored.
In addition, since the second data is newly added interaction data, the third data is unlikely to contain the same data, so the second data can be stored directly in the distributed anti-corrosion data table.
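The update described above can be sketched in Python as follows. This is a minimal illustration under assumptions: rows are represented as hashable tuples, the anti-corrosion table is a plain list that already contains the third data, and the function name `update_anti_corrosion_table` is hypothetical.

```python
def update_anti_corrosion_table(table, first_data, second_data):
    """Return the anti-corrosion table after storing the second and fourth data.

    table       -- rows already in the anti-corrosion table (the third data)
    first_data  -- historical interaction rows from the first data table
    second_data -- rows newly added while the second data source was abnormal
    """
    third_data = set(table)
    # Fourth data: the part of the first data that does not duplicate third data.
    fourth_data = [row for row in first_data if row not in third_data]
    # Second data is newly added, so it is stored directly without comparison.
    return table + fourth_data + list(second_data)
```

The function returns a new list rather than modifying `table` in place, which keeps the sketch easy to test.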
Fig. 3 schematically illustrates a system architecture of a data processing method according to an embodiment of the present disclosure.
As shown in fig. 3, the system architecture 300 of the data processing method includes a first data source 301, a second data source 302, a data lake, and a downstream application 307.
The first data source 301 and the second data source 302 may store target data 303 in the data lake. The target data 303 may be files or system data, such as Distribution Resource Planning (DRP) system data.
The data of the first data source 301 is stored in the data lake in the form of a first data table 304, and the data of the second data source 302 is stored in the data lake in the form of a second data table 305. The data of the first data table 304 and the second data table 305 may be stored in the distributed anti-corrosion data table 306, and may also be verified during storage.
In the case where the data lake provides data support to the downstream application 307 via the first data source only, the data in the first data table 304 is sent to the downstream application 307. In the case where the data lake provides data support to the downstream application 307 via the second data source, the data in the distributed anti-corrosion data table 306 is sent to the downstream application 307.
According to an embodiment of the present disclosure, determining fourth data to be stored to the distributed corrosion protection data table based on the first data and the third data includes: determining first sub-data having a target data type from the first data, wherein the target data type includes at least one of: the account identification type, the product type and the bank card type, and the third data does not comprise the first sub-data; the first sub data is taken as fourth data.
According to an embodiment of the present disclosure, the target data type is a data type unique to the first data source, and thus, the first sub data having the target data type is also included only in the first data, not in the third data.
For example, for the account identification type, the key may be the account identification and the value may be the account identification of the user, such as "xxxxx123". Alternatively, the account identification type may refer to the account identification of the user on a collaboration platform of the second data source. The product type may be a category of financial product offered by a bank, such as wealth management products, fund products, or lending products; within a given category of financial products, the product type may also be a specific product name. The bank card type may be the type of card for which the user has opened an account, such as a deposit card or a credit card.
In embodiments of the present disclosure, the first data source and the second data source may obtain consent or authorization of the user prior to obtaining the information of the user. For example, a request to obtain user information may be issued to a user before the user's information is obtained. In case the user agrees or authorizes that the user information can be obtained, the information is obtained and stored.
Because the first sub-data is unique to the first data, the first data need not be compared with the third data; the first sub-data can be determined from the first data directly according to the target data type. The operation is simple, and the efficiency of screening the data to be stored in the distributed anti-corrosion data table can be improved.
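The type-based screening described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: records are modeled as dictionaries with a hypothetical `data_type` field, and the names `TARGET_DATA_TYPES` and `select_first_sub_data` do not appear in the disclosure.

```python
# Hypothetical labels for the target data types named in the disclosure.
TARGET_DATA_TYPES = {"account_identification", "product", "bank_card"}


def select_first_sub_data(first_data):
    """Return the records of the first data whose type is a target data type.

    No comparison with the third data is made: the target data types are
    assumed to exist only in the first (centralized) data source.
    """
    return [row for row in first_data if row.get("data_type") in TARGET_DATA_TYPES]


# Illustrative records; the field names and values are assumptions.
first_data = [
    {"data_type": "account_identification", "value": "xxxxx123"},
    {"data_type": "balance", "value": 100},
    {"data_type": "bank_card", "value": "credit card"},
]
first_sub_data = select_first_sub_data(first_data)
```

In this sketch the screening is a single pass over the first data, which reflects why no comparison against the third data is needed for this category of data.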
According to an embodiment of the present disclosure, the data processing method further includes: determining second sub-data of a target main key from the first data and determining third sub-data of the target main key from the third data, wherein the target main key represents the main key included in both the first data and the third data; and in the case where the second sub data and the third sub data are different, taking the second sub data as fourth data.
According to an embodiment of the present disclosure, the first data and the third data may contain data for the same primary key, such as a date, an operation time, an operation status, or a balance.
In the case where the first data and the third data both include data for the same primary key, if the second sub-data for that primary key in the first data differs from the third sub-data for that primary key in the third data, the first data and the third data are inconsistent, possibly because of an abnormality in a data source. In that case, appropriate data must be selected from the first data or the third data as the fourth data and subsequently stored in the distributed corrosion protection data table.
For example, for a date field of a certain data table, the first data includes data from the 1st to the 15th and the third data includes data from the 15th to the 30th; in this case, the first data and the third data each include data for the 15th.
Since the second data source is abnormal at this time and the third data is generated by the second data source, the third sub-data in the third data may itself be abnormal. Therefore, when the second sub-data and the third sub-data are different, the second sub-data for the target primary key is treated as the standard data and is stored as the fourth data in the distributed corrosion protection data table.
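The conflict rule for shared primary keys can be illustrated with a short Python sketch. It assumes, purely for illustration, that each data set is a dictionary mapping a primary key to a record; the function name `resolve_conflicts` and the sample keys are hypothetical. Only the differing case is covered here.

```python
def resolve_conflicts(first_data, third_data):
    """Pick the record to store for primary keys present in both data sets.

    first_data and third_data map a primary key to a record (assumed layout).
    When the two records for a shared key differ, the record from the first
    data source is kept, because the second data source is abnormal.
    """
    fourth_data = {}
    for key in first_data.keys() & third_data.keys():  # target primary keys
        second_sub = first_data[key]
        third_sub = third_data[key]
        if second_sub != third_sub:
            fourth_data[key] = second_sub
    return fourth_data


# Illustrative data: both sets contain a record for the 15th, with different values.
first_data = {"day-14": {"balance": 80}, "day-15": {"balance": 100}}
third_data = {"day-15": {"balance": 90}, "day-16": {"balance": 120}}
fourth_data = resolve_conflicts(first_data, third_data)
```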
According to an embodiment of the present disclosure, the data processing method further includes: comparing the first data with the third data to obtain a difference result between the first data and the third data, wherein the difference result comprises fourth sub-data only contained in the first data; and taking the fourth sub-data in the first data as fourth data.
According to an embodiment of the present disclosure, in addition to the first sub-data unique to it and the second sub-data that shares a primary key with the third data, the first data further includes fourth sub-data.
The fourth sub-data is data that should originally exist in both the first data and the third data but, owing to issues such as the data structure and exception handling of the first data source, exists only in the first data. The fourth sub-data may be identified from the difference result obtained by comparing the first data and the third data.
According to the embodiment of the present disclosure, since the fourth sub data is included only in the first data, the fourth sub data may be directly taken as the fourth data and directly stored to the distributed corrosion prevention data table.
According to an embodiment of the present disclosure, the third data may contain fifth sub-data that should originally exist in both the first data and the third data but exists only in the third data. The fifth sub-data may be identified from the difference result obtained by comparing the first data and the third data. That is, the difference result may include the fourth sub-data and the fifth sub-data. Since the second data source is abnormal at this time, the fifth sub-data may be ignored and not stored in the distributed corrosion protection data table.
According to the embodiment of the disclosure, since a direct comparison of the first data and the third data would also pick up the first sub-data unique to the first data, a pre-written rule may exclude the target data type from the comparison, thereby ensuring that the difference result does not include the first sub-data.
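A minimal Python sketch of such a comparison is given below. It assumes keyed records with a hypothetical `data_type` field; the function name `diff_excluding_types` and the sample keys are illustrative and are not taken from the disclosure.

```python
def diff_excluding_types(first_data, third_data, excluded_types):
    """Compare two keyed record sets, skipping records of excluded types.

    Returns (fourth_sub_data, fifth_sub_data): records whose key appears only
    in the first data, and records whose key appears only in the third data.
    """
    comparable_first = {
        key: record
        for key, record in first_data.items()
        if record.get("data_type") not in excluded_types
    }
    only_first = {k: v for k, v in comparable_first.items() if k not in third_data}
    only_third = {k: v for k, v in third_data.items() if k not in first_data}
    return only_first, only_third


# Illustrative data; keys and field names are assumptions.
first_data = {
    "acct-1": {"data_type": "account_identification", "value": "xxxxx123"},
    "txn-1": {"data_type": "transaction", "value": 10},
    "txn-2": {"data_type": "transaction", "value": 20},
}
third_data = {
    "txn-2": {"data_type": "transaction", "value": 20},
    "txn-3": {"data_type": "transaction", "value": 30},
}
fourth_sub, fifth_sub = diff_excluding_types(
    first_data, third_data, {"account_identification"}
)
```

Here `fourth_sub` would be stored in the distributed anti-corrosion data table, while `fifth_sub` would be ignored during the abnormal period of the second data source.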
Fig. 4 schematically illustrates a flow chart of a fourth data determination method according to a specific embodiment of the present disclosure.
As shown in fig. 4, the fourth data determination method 400 involves a first data table 401 of a first data source and a second data table 402 of a second data source. The first data table 401 includes first data 404 and second data 403, and the second data table 402 includes third data 405.
For the second data 403 newly added during the second data source abnormality, the second data 403 may be directly used as the fourth data 411.
The first data 404 includes fourth sub-data 406, first sub-data 407 of a target data type unique to the first data source, and second sub-data 408 of the same target primary key as the second data source. The third data 405 includes third sub-data 409 having the same target primary key as the first data source and fifth sub-data 410 in the difference result.
For the first sub data 407 and the fourth sub data 406, they may be directly taken as the fourth data 411.
Regarding the second sub data 408, in the case where the second sub data 408 and the third sub data 409 are the same, one of the second sub data 408 or the third sub data 409 is taken as fourth data 411; in the case where the second sub data 408 and the third sub data 409 are different, the second sub data 408 is taken as the fourth data 411.
The fifth sub-data 410 may be ignored directly, because the second data source is abnormal.
Thus, the fourth data 411 may include at least one of: first sub-data 407, second sub-data 408, and fourth sub-data 406. The fourth data 411 may also include third sub-data 409.
According to the embodiment of the disclosure, during the traffic cutover of the actual application, before the platform anti-corrosion table is written, the accuracy of the data in the distributed anti-corrosion data table can be ensured by comparing the first data of the first data source with the third data of the second data source. In addition, because data anti-corrosion processing is added, the data of the distributed anti-corrosion data table is based on the first data source during the abnormal period of the second data source, which ensures the accuracy of the data obtained by downstream applications.
In accordance with an embodiment of the present disclosure, in response to detecting that the second data source is restored to normal, sending the distributed corrosion protection data table to the downstream application includes: and in response to detecting that the second data source is restored to be normal, modifying the data source parameters of the data lake into the second data source in real time so as to send the distributed anti-corrosion data table to a downstream application in real time.
According to an embodiment of the present disclosure, data source parameters are included in a data lake for controlling the data lake to provide data from a first data source or data from a second data source to a downstream application.
According to an embodiment of the present disclosure, modifying data source parameters of a data lake in real time to a second data source includes: and modifying the parameter value of the data source parameter from the parameter value representing the first data source to the parameter value representing the second data source. For example, 1 represents a first data source, 0 represents a second data source, and the data source parameters of the data lake are modified to the second data source in real time, that is, the parameter values of the data source parameters are modified from "1" to "0".
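The parameter-based switch can be sketched in Python as follows. The values "1" and "0" follow the example above; the class `DataLake`, its attributes, and its method names are hypothetical and serve only to illustrate the idea.

```python
FIRST_DATA_SOURCE = "1"   # parameter value representing the first data source
SECOND_DATA_SOURCE = "0"  # parameter value representing the second data source


class DataLake:
    """Toy data lake that serves one of two tables based on a parameter."""

    def __init__(self, first_table, anti_corrosion_table):
        self.first_table = first_table
        self.anti_corrosion_table = anti_corrosion_table
        self.data_source_param = SECOND_DATA_SOURCE

    def on_second_source_status(self, is_normal):
        # Flip the parameter as soon as the detected status changes.
        self.data_source_param = SECOND_DATA_SOURCE if is_normal else FIRST_DATA_SOURCE

    def table_for_downstream(self):
        # Return the table that the current parameter value selects.
        if self.data_source_param == FIRST_DATA_SOURCE:
            return self.first_table
        return self.anti_corrosion_table


lake = DataLake(first_table=["first-table rows"], anti_corrosion_table=["anti-corrosion rows"])
lake.on_second_source_status(is_normal=False)   # abnormality detected
during_anomaly = lake.table_for_downstream()
lake.on_second_source_status(is_normal=True)    # recovery detected
after_recovery = lake.table_for_downstream()
```

Because the switch is a single parameter change, no data needs to be copied at switch time in this sketch; the anti-corrosion table is assumed to have been kept up to date during the abnormal period.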
In the embodiment of the disclosure, data of the second data source is provided to the downstream application through the distributed anti-corrosion data table, and the distributed anti-corrosion data table synchronizes the first data and the second data of the first data source in real time. Therefore, when the data source of the data lake is switched back to the second data source, the switch can be completed in real time while data consistency is ensured, without waiting for a data backfill operation, which improves data source switching efficiency.
According to an embodiment of the present disclosure, the first data table and the distributed corrosion protection data table are different in data structure, and the second data table and the distributed corrosion protection data table are the same in data structure. Storing the second data and the fourth data to a distributed corrosion protection data table includes: converting the data structures of the second data and the fourth data into the data structures of the distributed anti-corrosion data table based on a predetermined data structure conversion rule; and storing the second data and the fourth data after the data structure is converted into a distributed anti-corrosion data table.
According to an embodiment of the present disclosure, the second data table of the second data source is used as an intermediate table, no data is provided to the downstream application, and after the third data in the second data table is stored in the distributed anti-corrosion data table, the distributed anti-corrosion data table provides service to the downstream application. Thus, the data structures of the second data table and the distributed corrosion protection data table are identical, and the table structure does not need to be modified during the storage of the third data.
However, the first data source and the second data source are data sources before and after the upgrade, and the table structures of the first data source and the second data source are different. For example, the second data table may have fields added, modified, combined, or deleted as compared to the first data table. Therefore, when the second data and the fourth data in the first data table are stored in the distributed corrosion protection data table, the data structure needs to be converted.
According to an embodiment of the present disclosure, the data structure conversion rule is code text written in advance based on characteristics of the first data source and the second data source. The fields of the second data and the fourth data are traversed based on the code text to convert the data structures of the second data and the fourth data to the data structures of the distributed corrosion protection data table.
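One possible shape for such a conversion rule is sketched below in Python. All field names (`acct_no`, `first_name`, `last_name`, `branch`, `account_id`, `full_name`, `channel`) and the rule table `CONVERSION_RULE` are hypothetical; the sketch only illustrates renamed, combined, added, and deleted fields, and it iterates over the target fields rather than the source fields.

```python
# Hypothetical conversion rule: target field -> function of the source row.
CONVERSION_RULE = {
    # Renamed field.
    "account_id": lambda row: row["acct_no"],
    # Two source fields combined into one target field.
    "full_name": lambda row: f"{row['first_name']} {row['last_name']}",
    # Field added in the target structure with a constant value.
    "channel": lambda row: "centralized",
}


def convert_row(row, rule=CONVERSION_RULE):
    """Build a row in the target structure by applying each field rule.

    Source fields not referenced by any rule are dropped (deleted fields).
    """
    return {target: build(row) for target, build in rule.items()}


legacy_row = {"acct_no": "xxxxx123", "first_name": "A", "last_name": "B", "branch": "001"}
converted = convert_row(legacy_row)
```

Keeping the rule as a table of per-field functions means that a change in either table structure only requires editing the rule, not the code that applies it.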
According to the embodiment of the disclosure, in the normal data source switching stage before the second data source becomes abnormal, that is, while the system upgrade is not yet complete, the first data source and the second data source may each serve as the data source for some of the downstream applications. In this case, determining which downstream applications use the first data source and which use the second data source not only increases the workload but also makes service processing more difficult. Therefore, the data in the first data table and the second data table can be checked and, after checking, merged to generate the distributed anti-corrosion data table, which then provides data to all downstream applications uniformly, so that it is unnecessary to determine which data source each downstream application uses.
Fig. 5 schematically illustrates a flowchart of a distributed corrosion protection data table generation method according to an embodiment of the present disclosure.
As shown in FIG. 5, a distributed corrosion protection data table generation method 500 illustrates a process for generating a distributed corrosion protection data table 504 from a first data source and a second data source.
The first data table 501 may be understood as a table directly generated after the first data source imports the data lake, and the temporary data table 502 may be understood as a table directly generated after the second data source imports the data lake. For the first data table 501, the data format of the centralized data source is generally uniform, so that the operations of data sorting, screening and the like in the data lake can be omitted. For temporary data table 502, since the distributed data source generally involves multiple servers and multiple data tables, the data of the directly imported temporary data table 502 may be more scattered, and thus the data of the second data source is consolidated by writing the data in temporary data table 502 into second data table 503.
The process of merging the first data table 501 with the second data table 503 refers to a process of determining fourth data to be stored to the distributed corrosion protection data table based on the first data and the third data. At this time, the third data of the distributed corrosion protection data table may be regarded as the data in the second data table 503.
The embodiment of the disclosure can switch back to the second data source in real time after the exception handling is completed. This avoids the long switching time and low efficiency caused by lengthy exception handling, allows service to continue without waiting for the data to be brought back into sync, and improves both the continuity of business services and the efficiency of data source switching.
Fig. 6 schematically shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 6, the data processing apparatus 600 of this embodiment includes a first transmitting module 610, an updating module 620, and a second transmitting module 630.
The first transmitting module 610 is configured to send, in response to detecting that the second data source has an anomaly, a first data table of the first data source to a downstream application of the data lake, where the first data source is a centralized data source and the second data source is a distributed data source. In an embodiment, the first transmitting module 610 may be configured to perform the operation S210 described above, which is not repeated here.
The updating module 620 is configured to update first data and second data in the first data table to the distributed corrosion protection data table, where the first data is historical interaction data between the first data source and the downstream application, and the second data is newly added interaction data between the first data source and the downstream application during an abnormal period of the second data source. In an embodiment, the updating module 620 may be configured to perform the operation S220 described above, which is not repeated here.
The second transmitting module 630 is configured to send the distributed corrosion protection data table to a downstream application of the data lake in response to detecting that the second data source is restored to normal. In an embodiment, the second transmitting module 630 may be configured to perform the operation S230 described above, which is not repeated here.
According to an embodiment of the present disclosure, third data during normal periods of the second data source is also stored in the distributed corrosion protection data table, wherein the third data is obtained from the second data table of the second data source. The update module 620 includes a determination sub-module and a storage sub-module.
The determining sub-module is configured to determine fourth data to be stored in the distributed corrosion protection data table based on the first data and the third data.
The storage sub-module is configured to store the second data and the fourth data into the distributed anti-corrosion data table.
According to an embodiment of the present disclosure, the determination submodule includes a first determination unit and a second determination unit.
The first determining unit is configured to determine first sub-data having a target data type from the first data, where the target data type includes at least one of: the account identification type, the product type, and the bank card type, and the third data does not include the first sub-data.
The second determining unit is configured to take the first sub-data as the fourth data.
According to an embodiment of the present disclosure, the determination sub-module further comprises a third determination unit and a fourth determination unit.
The third determining unit is configured to determine second sub-data of the target primary key from the first data and determine third sub-data of the target primary key from the third data, where the target primary key represents a primary key included in both the first data and the third data.
The fourth determining unit is configured to take the second sub-data as the fourth data in a case where the second sub-data and the third sub-data are different.
According to an embodiment of the present disclosure, the determination submodule further includes a fifth determination unit and a sixth determination unit.
The fifth determining unit is configured to compare the first data and the third data to obtain a difference result between the first data and the third data, where the difference result includes fourth sub-data included only in the first data.
The sixth determining unit is configured to take the fourth sub-data in the first data as the fourth data.
According to an embodiment of the present disclosure, the second transmitting module 630 includes a transmitting sub-module for modifying the data source parameters of the data lake to the second data source in real time in response to detecting that the second data source is restored to be normal, so as to transmit the distributed corrosion protection data table to the downstream application in real time.
According to an embodiment of the present disclosure, the first data table and the distributed corrosion protection data table are different in data structure, and the second data table and the distributed corrosion protection data table are the same in data structure. The storage submodule comprises a conversion unit and a storage unit.
The conversion unit is configured to convert the data structures of the second data and the fourth data into the data structure of the distributed anti-corrosion data table based on a predetermined data structure conversion rule.
The storage unit is configured to store the second data and the fourth data, after the data structure conversion, into the distributed anti-corrosion data table.
According to an embodiment of the present disclosure, any number of the first transmitting module 610, the updating module 620, and the second transmitting module 630 may be combined and implemented in one module, or any one of them may be split into a plurality of modules. Alternatively, at least some of the functionality of one or more of these modules may be combined with at least some of the functionality of other modules and implemented in one module.
According to embodiments of the present disclosure, at least one of the first transmitting module 610, the updating module 620, and the second transmitting module 630 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable way of integrating or packaging circuitry, or in any one of software, hardware, and firmware, or in a suitable combination of any of the three. Alternatively, at least one of the first transmitting module 610, the updating module 620, and the second transmitting module 630 may be at least partially implemented as a computer program module which, when executed, may perform the corresponding functions.
Fig. 7 schematically illustrates a block diagram of an electronic device adapted for a data processing method according to an embodiment of the disclosure.
As shown in fig. 7, an electronic device 700 according to an embodiment of the present disclosure includes a processor 701 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The processor 701 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 701 may also include on-board memory for caching purposes. The processor 701 may comprise a single processing unit or a plurality of processing units for performing different actions of the method flows according to embodiments of the disclosure.
In the RAM703, various programs and data necessary for the operation of the electronic apparatus 700 are stored. The processor 701, the ROM702, and the RAM703 are connected to each other through a bus 704. The processor 701 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM702 and/or the RAM 703. Note that the program may be stored in one or more memories other than the ROM702 and the RAM 703. The processor 701 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in one or more memories.
According to an embodiment of the present disclosure, the electronic device 700 may further include an input/output (I/O) interface 705, which is also connected to the bus 704. The electronic device 700 may also include one or more of the following components connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 708 including a hard disk or the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read therefrom is installed into the storage section 708 as necessary.
The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 702 and/or RAM 703 and/or one or more memories other than ROM 702 and RAM 703 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowcharts. The program code, when executed in a computer system, causes the computer system to perform the methods provided by embodiments of the present disclosure.
The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 701. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
In one embodiment, the computer program may be based on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted, distributed over a network medium in the form of signals, downloaded and installed via the communication section 709, and/or installed from the removable medium 711. The computer program may include program code that may be transmitted using any appropriate network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 709, and/or installed from the removable medium 711. The above-described functions defined in the system of the embodiments of the present disclosure are performed when the computer program is executed by the processor 701. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
According to embodiments of the present disclosure, program code for executing the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, C, and similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined and/or integrated in a variety of ways, even if such combinations or integrations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or the claims may be combined and/or integrated in various ways without departing from the spirit and teachings of the present disclosure. All such combinations and/or integrations fall within the scope of the present disclosure.
The foregoing describes merely embodiments of the present disclosure and is not intended to limit the scope of the disclosure. Any modification, equivalent replacement, or improvement made without departing from the spirit and principles of the present disclosure shall fall within the scope of the present disclosure.

Claims (11)

1. A method of data processing, the method comprising:
in response to detecting that the second data source is abnormal, sending a first data table of a first data source to a downstream application of the data lake, wherein the first data source is a centralized data source and the second data source is a distributed data source;
updating first data and second data in the first data table to a distributed corrosion protection data table, wherein the first data is historical interaction data between the first data source and the downstream application, and the second data is newly added interaction data between the first data source and the downstream application during the abnormal period of the second data source; and
in response to detecting that the second data source is restored to normal, sending the distributed corrosion protection data table to a downstream application of the data lake.
2. The method of claim 1, wherein the distributed anti-corrosion data table further stores third data from a period in which the second data source is normal, the third data being obtained from a second data table of the second data source; and
wherein the updating of the first data and the second data in the first data table to the distributed anti-corrosion data table comprises:
determining, based on the first data and the third data, fourth data to be stored to the distributed anti-corrosion data table; and
storing the second data and the fourth data to the distributed anti-corrosion data table.
3. The method of claim 2, wherein the determining, based on the first data and the third data, of the fourth data to be stored to the distributed anti-corrosion data table comprises:
determining, from the first data, first sub-data having a target data type, wherein the target data type comprises at least one of an account identification type, a product type, and a bank card type, and the third data does not comprise the first sub-data; and
taking the first sub-data as the fourth data.
4. The method of claim 2, further comprising:
determining second sub-data of a target primary key from the first data, and determining third sub-data of the target primary key from the third data, wherein the target primary key represents a primary key included in both the first data and the third data; and
taking the second sub-data as the fourth data in a case where the second sub-data and the third sub-data are different.
5. The method according to any one of claims 2 to 4, further comprising:
comparing the first data with the third data to obtain a difference result between the first data and the third data, wherein the difference result comprises fourth sub-data contained only in the first data; and
taking the fourth sub-data in the first data as the fourth data.
6. The method of claim 1, wherein the sending of the distributed anti-corrosion data table to the downstream application in response to detecting that the second data source has returned to normal comprises:
in response to detecting that the second data source has returned to normal, modifying a data source parameter of the data lake to the second data source in real time, so as to send the distributed anti-corrosion data table to the downstream application in real time.
7. The method of claim 2, wherein the data structure of the first data table differs from that of the distributed anti-corrosion data table, and the data structure of the second data table is the same as that of the distributed anti-corrosion data table; and
wherein the storing of the second data and the fourth data to the distributed anti-corrosion data table comprises:
converting the data structures of the second data and the fourth data into the data structure of the distributed anti-corrosion data table based on a predetermined data structure conversion rule; and
storing the converted second data and fourth data to the distributed anti-corrosion data table.
8. A data processing apparatus, the apparatus comprising:
a first sending module configured to, in response to detecting that a second data source is abnormal, send a first data table of a first data source to a downstream application of a data lake, wherein the first data source is a centralized data source and the second data source is a distributed data source;
an updating module configured to update first data and second data in the first data table to a distributed anti-corrosion data table, wherein the first data is historical interaction data between the first data source and the downstream application, and the second data is interaction data newly added between the first data source and the downstream application while the second data source is abnormal; and
a second sending module configured to, in response to detecting that the second data source has returned to normal, send the distributed anti-corrosion data table to the downstream application.
9. An electronic device, comprising:
one or more processors; and
a memory for storing one or more computer programs,
wherein the one or more processors execute the one or more computer programs to implement the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program or instructions are stored, wherein the computer program or instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 7.
11. A computer program product comprising a computer program or instructions which, when executed by a processor, implement the steps of the method according to any one of claims 1 to 7.
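For illustration only, the switching and merging steps recited in claims 1, 2, 4, 5 and 6 can be sketched as follows. This is a minimal in-memory sketch under stated assumptions, not the implementation of the disclosure: the names `select_fourth_data`, `DataLakeRouter`, `active_source` and the `id` primary-key column are hypothetical, rows are plain dictionaries, and the structure conversion of claim 7 is assumed to have been applied already so that rows from both sources are directly comparable.

```python
from dataclasses import dataclass, field

# All names in this sketch are hypothetical; the disclosure does not define them.


def select_fourth_data(first_data, third_data, key="id"):
    """Pick the rows of first_data that must be written to the anti-corrosion table.

    A row is selected when its primary key is absent from third_data
    (the difference case of claim 5) or when the key is present in both
    but the row contents differ (the case of claim 4).
    """
    third_by_key = {row[key]: row for row in third_data}
    fourth = []
    for row in first_data:
        existing = third_by_key.get(row[key])
        if existing is None or existing != row:
            fourth.append(row)
    return fourth


@dataclass
class DataLakeRouter:
    first_table: list                                    # table of the centralized source
    anti_corrosion: dict = field(default_factory=dict)   # primary key -> row
    active_source: str = "second"                        # data source parameter of the lake

    def on_second_source_abnormal(self):
        # Claim 1, first step: fall back to the centralized data source.
        self.active_source = "first"

    def sync(self, first_data, second_data, key="id"):
        # Claim 2: store the second data together with the fourth data.
        third_data = list(self.anti_corrosion.values())
        fourth_data = select_fourth_data(first_data, third_data, key)
        for row in fourth_data + second_data:
            self.anti_corrosion[row[key]] = row

    def on_second_source_recovered(self):
        # Claim 6: switch the data source parameter back to the second source.
        self.active_source = "second"

    def serve(self):
        # Rows sent to the downstream application of the data lake.
        if self.active_source == "first":
            return list(self.first_table)
        return list(self.anti_corrosion.values())
```

In this sketch a caller would invoke `on_second_source_abnormal()` when the abnormality is detected, call `sync(history_rows, new_rows)` while the distributed source is unavailable, and call `on_second_source_recovered()` once it is healthy again, after which `serve()` returns the merged anti-corrosion table. How the abnormality itself is detected is left outside the sketch.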

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410219049.XA CN118093698A (en) 2024-02-28 2024-02-28 Data processing method, device, electronic equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410219049.XA CN118093698A (en) 2024-02-28 2024-02-28 Data processing method, device, electronic equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN118093698A (en) 2024-05-28

Family

ID=91143485

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410219049.XA Pending CN118093698A (en) 2024-02-28 2024-02-28 Data processing method, device, electronic equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN118093698A (en)

Similar Documents

Publication Publication Date Title
CN113344523A (en) Data processing method and device, electronic equipment and computer readable storage medium
CN112947919A (en) Method and device for constructing service model and processing service request
CN114358147A (en) Training method, identification method, device and equipment of abnormal account identification model
CN113191889A (en) Wind control configuration method, configuration system, electronic device and readable storage medium
CN115760013A (en) Operation and maintenance model construction method and device, electronic equipment and storage medium
CN112887355A (en) Service processing method and device for abnormal server
LU501006B1 (en) Atmospheric environment quality management method, electronic equipment and storage system
CN111967806B (en) User risk updating method and device based on periodic retrace and electronic equipment
CN118093698A (en) Data processing method, device, electronic equipment and computer storage medium
CN113918525A (en) Data exchange scheduling method, system, electronic device, medium, and program product
CN111178823B (en) Method and device for canceling residence related transaction
CN114386951A (en) Process approval method and device, electronic equipment and storage medium
CN113487224A (en) Content processing method, apparatus, device, medium, and program product
CN112231115A (en) Method and device for executing dynamic insert operator
CN118227439A (en) Method, device, equipment, medium and program product for processing log data
CN118170730A (en) File processing method, apparatus, device, storage medium, and program product
CN118093525A (en) File processing method, apparatus, device, medium and program product
CN114461527A (en) Test item management method, test item management apparatus, test item management device, storage medium, and program product
CN117176576A (en) Network resource changing method, device, equipment and storage medium
CN116560763A (en) Service processing method, device, equipment and storage medium
CN112596777A (en) Method, device, equipment and computer readable medium for processing data
CN118071524A (en) Product processing method, apparatus, device, storage medium, and program product
CN116070891A (en) Flow processing method, device, equipment and storage medium
CN116880926A (en) Information processing method, device, equipment and storage medium
CN115729567A (en) Automatic deployment method and device of operation and maintenance product, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination