CN112749167A - Method and device for determining broken link data and nonvolatile storage medium - Google Patents

Method and device for determining broken link data and nonvolatile storage medium

Publication number
CN112749167A
Authority
CN
China
Prior art keywords
data
service
key
platform
business
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110064688.XA
Other languages
Chinese (zh)
Inventor
李冉
陈震宇
刘国华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Postal Savings Bank of China Ltd
Original Assignee
Postal Savings Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Postal Savings Bank of China Ltd
Priority to CN202110064688.XA
Publication of CN112749167A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2477 Temporal data queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

The invention discloses a method and a device for determining broken link data, and a nonvolatile storage medium. The method comprises: obtaining a data zipper table in a big data platform, wherein the data zipper table comprises a table primary key and a closed-chain date; using the table primary key and the closed-chain date as the row key and the value, respectively, of a distributed open source database; and, when it is detected that the big data platform receives service data, comparing the service data with the data stored in the distributed open source database based on the row key and the value to determine whether the service data is broken link data. The invention solves the technical problem that prior-art schemes for querying broken link data group records by primary key and sort them by open-chain date, which consumes a large amount of computer memory, takes a long time to run, and reduces computational efficiency.

Description

Method and device for determining broken link data and nonvolatile storage medium
Technical Field
The invention relates to the field of data processing, in particular to a method and a device for determining broken link data and a nonvolatile storage medium.
Background
In the prior art, a big data platform that stores source system data must detect broken links in service data while it runs. The current approach is as follows: using the Hive data warehouse tool, write HQL statements that group the zipper table by primary key and sort each group by open-chain date; store the grouped and sorted result in a temporary table to reduce memory overhead; then self-join the temporary table and query for records that share a primary key but where the closed-chain date of the previous record is not equal to the open-chain date of the current record. If such records exist, the service data has a broken link.
However, this scheme has the following disadvantage: because many tables hold a large volume of data, grouping by primary key and sorting by open-chain date consume a large amount of computer memory and take a long time to run, which reduces computational efficiency.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the invention provide a method and a device for determining broken link data, and a nonvolatile storage medium, to at least solve the technical problem that prior-art schemes for querying broken link data group records by primary key and sort them by open-chain date, which consumes a large amount of computer memory, takes a long time to run, and reduces computational efficiency.
According to one aspect of the embodiments of the present invention, a method for determining broken link data is provided, including: obtaining a data zipper table in a big data platform, wherein the data zipper table includes a table primary key and a closed-chain date; using the table primary key and the closed-chain date as the row key and the value, respectively, of a distributed open source database; and, when it is detected that the big data platform receives service data, comparing the service data with the data stored in the distributed open source database based on the row key and the value to determine whether the service data is broken link data.
Optionally, the service data is sent to the big data platform at least in the following manner: all service data generated in the daily service process is acquired through a service source system and sent to the big data platform through a service bus.
Optionally, the method further includes: after the big data platform receives the service data, analyzing the service data to obtain an analysis result, and verifying, based on the analysis result, whether the service data was transmitted completely, wherein at least the field names, the number of fields and the field types of the service data are verified.
Optionally, comparing the service data with the data stored in the distributed open source database based on the row key and the value to determine whether the service data is broken link data includes: if the service data does not exist in the big data platform, comparing the service data with the stored data based on the row key and the value to determine whether the stored data contains the service data; if the stored data does not contain the service data, performing an open-chain operation on the big data platform to add the service data, and recording the latest valid state of the service data; and if the stored data contains the service data, determining that the service data is broken link data.
Optionally, comparing the service data with the data stored in the distributed open source database based on the row key and the value to determine whether the service data is broken link data includes: if the service data exists in the big data platform, comparing the service data with the stored data based on the row key and the value to determine whether the stored data contains the service data; if the stored data contains the service data, performing field analysis on the service data, and, if the fields of the service data differ from the fields of the existing service data, performing a closed-chain operation on the big data platform, then performing an open-chain operation again to add the service data, and recording the latest valid state of the service data.
Optionally, the method further includes: if the service data exists in the big data platform but does not exist in the service source system, performing a closed-chain operation on the big data platform and marking the service data as invalid.
According to another aspect of the embodiments of the present invention, an apparatus for determining broken link data is also provided, including: an acquisition module, configured to obtain a data zipper table in a big data platform, wherein the data zipper table includes a table primary key and a closed-chain date; a processing module, configured to use the table primary key and the closed-chain date as the row key and the value, respectively, of a distributed open source database; and a determining module, configured to compare, when it is detected that the big data platform receives service data, the service data with the data stored in the distributed open source database based on the row key and the value to determine whether the service data is broken link data.
According to another aspect of the embodiments of the present invention, there is also provided a non-volatile storage medium storing a plurality of instructions, the instructions being adapted to be loaded by a processor and to perform any one of the above methods of determining broken link data.
According to another aspect of the embodiments of the present invention, there is also provided a processor, configured to execute a program, where the program is configured to execute any one of the above methods for determining broken link data when running.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform any one of the above methods for determining broken link data.
In the embodiment of the present invention, a data zipper table in a big data platform is obtained, wherein the data zipper table includes a table primary key and a closed-chain date; the table primary key and the closed-chain date are used as the row key and the value, respectively, of a distributed open source database; and, when it is detected that the big data platform receives service data, the service data is compared with the data stored in the distributed open source database based on the row key and the value to determine whether the service data is broken link data. This saves computer memory and shortens the running time, which improves the efficiency of determining broken link data, and thereby solves the technical problem that prior-art schemes for querying broken link data group records by primary key and sort them by open-chain date, which consumes a large amount of computer memory, takes a long time to run, and reduces computational efficiency.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow diagram of a method of determining broken link data according to an embodiment of the invention;
FIG. 2 is a flow diagram of an alternative method of determining broken link data according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an apparatus for determining broken link data according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, in order to facilitate understanding of the embodiments of the present invention, some terms or nouns referred to in the present invention will be explained as follows:
Hadoop: a distributed system infrastructure. Users can develop distributed programs without knowing the underlying distributed details, making full use of the cluster for high-speed computation and storage.
Hive: a Hadoop-based data warehouse tool used for data extraction, transformation and loading; it provides a mechanism for storing, querying and analyzing large-scale data stored in Hadoop.
HQL: SQL-like query statements that run on the Hive data warehouse.
HBase: a distributed, column-oriented open source database based on Hadoop. Unlike a general relational database, it is suited to storing unstructured data, offers high performance, and can answer queries in milliseconds.
Zipper table (pull chain table): a data model in a data warehouse; a table that maintains both historical-state and latest-state data. Depending on the zipper granularity, a zipper table is effectively equivalent to a series of snapshots, optimized by removing the records that have not changed. The customer record as of any point in time can be conveniently restored from the zipper table, so it both reflects the historical state of the data and saves as much storage as possible. Two fields, start_date and end_date, are added to the original table, and the required data is selected through these date fields; the open-chain date of the next record is the same as the closed-chain date of the previous record.
Zipper table broken link: a phenomenon in a zipper table in which the historical state of the data is missing and discontinuous.
Cause of a broken link: while sending the full data set, the source system fails to send a certain record to the big data platform because of a business change or a technical problem, so the big data platform treats the record as invalid and performs a closed-chain operation on it. The source system later resends the record and the platform opens the chain again, which leaves the record missing for a certain period of time, that is, a broken link.
Before the scheme of the present application can be understood, it is necessary to understand the definition of a zipper table and the meaning of a broken link, as well as how the big data platform currently processes data from the service source system. The following embodiments take Hadoop as the big data platform architecture and Hive as the data warehouse by way of example:
Zipper open-chain data: by convention, data whose closed-chain date is '21001231', i.e., the currently valid data.
Zipper closed-chain data: data whose closed-chain date is some earlier historical date.
Zipper closed-chain operation: the closed-chain date of the data is changed from '21001231' to the current date, indicating that this state of the record ended on that day.
Zipper open-chain operation: a new record is added whose open-chain date is the current date and whose closed-chain date is '21001231', representing the latest state of the data.
Taking a customer account as the unique identifier and recording changes in the account balance, the change track in the zipper table is illustrated by the following example:
The customer opens an account on February 10, 2020 and deposits 3600 yuan. On February 27, 2020 a transaction occurs and the account balance becomes 3800 yuan, with no change thereafter; on March 3, 2020 a transaction occurs and the balance becomes 4000 yuan, with no change thereafter; on March 18, 2020 a transaction occurs and the balance becomes 4800 yuan, with no change thereafter.
Correct data in the zipper table:
Figure BDA0002903694100000051
As can be seen from the above data, the table records in detail the balance of the customer account in each period and how the balance changes over the whole time range:
1. February 10, 2020 is the date the customer opened the account and also the date the record with serial number 1 was generated, so its open-chain date is 20200210. The closed-chain date of serial number 1 is the same as the open-chain date of serial number 2: because a transaction on February 27, 2020 changed the account balance to 3800 yuan, a closed-chain operation was performed on serial number 1 with closed-chain date 20200227, and the record with serial number 2 was generated with open-chain date 20200227.
2. As the account balance continues to change, it becomes 4000 yuan on March 3, 2020; a closed-chain operation is performed on serial number 2 and an open-chain operation creates serial number 3, so the closed-chain date of serial number 2 and the open-chain date of serial number 3 are both 20200303.
3. Similarly, after the account balance becomes 4800 yuan, a closed-chain operation is performed on serial number 3 and an open-chain operation creates serial number 4; the closed-chain date of serial number 3 is the same as the open-chain date of serial number 4.
4. Serial number 4 is the latest state of the customer account balance, 4800 yuan. Its closed-chain date is 21001231, representing the future date December 31, 2100; by convention this date means the record is the currently valid data, while serial numbers 1, 2 and 3 are its historical states.
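The open-chain and closed-chain operations behind this example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the application: the `ZipperRow` class, the helper names and the account identifier `ACC-1` are hypothetical, and only the agreed date '21001231' and the dates and balances are taken from the example.

```python
from dataclasses import dataclass
from typing import List

OPEN_END_DATE = "21001231"  # agreed closed-chain date of currently valid data


@dataclass
class ZipperRow:
    key: str         # table primary key (hypothetical account identifier)
    balance: int     # account balance in yuan
    start_date: str  # open-chain date, YYYYMMDD
    end_date: str    # closed-chain date, YYYYMMDD


def close_chain(rows: List[ZipperRow], key: str, today: str) -> None:
    """Closed-chain operation: end the currently valid row for `key` today."""
    for row in rows:
        if row.key == key and row.end_date == OPEN_END_DATE:
            row.end_date = today


def open_chain(rows: List[ZipperRow], key: str, balance: int, today: str) -> None:
    """Open-chain operation: add a new currently valid row for `key`."""
    rows.append(ZipperRow(key, balance, today, OPEN_END_DATE))


def record_change(rows: List[ZipperRow], key: str, balance: int, today: str) -> None:
    """A balance change closes the old state and opens the new one on the same day."""
    close_chain(rows, key, today)
    open_chain(rows, key, balance, today)


rows: List[ZipperRow] = []
open_chain(rows, "ACC-1", 3600, "20200210")     # account opened
record_change(rows, "ACC-1", 3800, "20200227")
record_change(rows, "ACC-1", 4000, "20200303")
record_change(rows, "ACC-1", 4800, "20200318")

for row in rows:
    print(row.start_date, row.end_date, row.balance)
```

Because every change closes one row and opens the next on the same date, the closed-chain date of each row equals the open-chain date of its successor, which is the continuity property described in items 1 to 4.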
The following takes the bank card customer account and the balance of the account as an example:
Broken link data in the zipper table:
Figure BDA0002903694100000061
As can be seen from the above table, the closed-chain date of serial number 3 and the open-chain date of serial number 4 do not match. This is a broken link, and it indicates that the state information of the service data is missing for the period between March 18, 2020 and April 23, 2020.
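For illustration, the mismatch can be found by grouping rows by primary key, sorting each group by open-chain date, and comparing adjacent rows. The sketch below is a plain Python stand-in for that check; the function name, the tuple layout and the account identifier `ACC-1` are assumptions, and the rows for serial numbers 1 and 2 are assumed to match the correct table shown earlier.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str, str]  # (primary key, open-chain date, closed-chain date)


def find_broken_links(rows: Iterable[Row]) -> Dict[str, List[Tuple[str, str]]]:
    """Return, per primary key, the (closed-chain date, next open-chain date) gaps."""
    by_key: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for key, start, end in rows:
        by_key[key].append((start, end))

    gaps: Dict[str, List[Tuple[str, str]]] = {}
    for key, periods in by_key.items():
        periods.sort()  # YYYYMMDD strings sort chronologically
        breaks = [
            (prev_end, next_start)
            for (_, prev_end), (next_start, _) in zip(periods, periods[1:])
            if prev_end != next_start
        ]
        if breaks:
            gaps[key] = breaks
    return gaps


broken_table = [
    ("ACC-1", "20200210", "20200227"),
    ("ACC-1", "20200227", "20200303"),
    ("ACC-1", "20200303", "20200318"),
    ("ACC-1", "20200423", "21001231"),  # reopened after a gap
]
print(find_broken_links(broken_table))
```

This check has to group and sort the whole table, which is the memory and running-time cost that the Background attributes to the existing scheme.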
Example 1
In accordance with an embodiment of the present invention, there is provided an embodiment of a method of determining broken link data, it being noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than here.
FIG. 1 is a flowchart of a method for determining broken link data according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
Step S102, obtaining a data zipper table in a big data platform, wherein the data zipper table includes a table primary key and a closed-chain date;
Step S104, using the table primary key and the closed-chain date as the row key and the value, respectively, of a distributed open source database;
Step S106, when it is detected that the big data platform receives service data, comparing the service data with the data stored in the distributed open source database based on the row key and the value to determine whether the service data is broken link data.
In the embodiment of the present invention, a data zipper table in a big data platform is obtained, wherein the data zipper table includes a table primary key and a closed-chain date; the table primary key and the closed-chain date are used as the row key and the value, respectively, of a distributed open source database; and, when it is detected that the big data platform receives service data, the service data is compared with the data stored in the distributed open source database based on the row key and the value to determine whether the service data is broken link data. This saves computer memory and shortens the running time, which improves the efficiency of determining broken link data, and thereby solves the technical problem that prior-art schemes for querying broken link data group records by primary key and sort them by open-chain date, which consumes a large amount of computer memory, takes a long time to run, and reduces computational efficiency.
The method is based on a big data platform and is applicable to different technical architectures such as Storm, Hadoop and Spark. It is a technical scheme that optimizes one data quality check item (zipper table broken links) in the processing flow for service source system data.
It should be noted that the embodiment of the present invention can be applied to financial institutions such as banks or financial service institutions. The invention can determine broken link data in a zipper table, save computer memory, improve query efficiency, and quickly locate service data that has a broken link.
The embodiment of the present application is based on a big data distributed cache, for example HBase (a distributed, column-oriented open source NoSQL database in which the rowkey, i.e., the row key, serves as the primary key for insert, delete, update and query operations) or Redis (a high-performance key-value database).
HBase is taken as the example in the embodiments of the application. Because HBase supports fast lookups by row key (rowkey), the table data whose chain has been closed is stored in HBase with the table primary key as the HBase row key. When the big data platform inserts service data, the primary key of the newly added data is used as the rowkey to search HBase. If nothing is found, the data has not been stored before, there is no problem, and the service data is stored in the big data platform normally. If a record is found, the data has a broken link; the data must be analyzed to find the cause of the broken link, and the missing information must be supplemented.
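The lookup described above amounts to a point query by key. The following sketch uses an ordinary Python dictionary as a stand-in for the HBase table, so it does not show any real HBase client API; the function names and the identifiers `ACC-2` and `ACC-9` are hypothetical.

```python
from typing import Dict, Optional

# Stand-in for the HBase table: rowkey (table primary key) -> last closed-chain date.
closed_index: Dict[str, str] = {"ACC-2": "20200318"}


def lookup_closed_date(index: Dict[str, str], primary_key: str) -> Optional[str]:
    """Point lookup by rowkey; None means the key was never closed before."""
    return index.get(primary_key)


def classify_insert(index: Dict[str, str], primary_key: str) -> str:
    """Decide how newly inserted service data with this primary key is handled."""
    closed_date = lookup_closed_date(index, primary_key)
    if closed_date is None:
        return "store normally"
    return f"broken link: chain was closed on {closed_date}"


print(classify_insert(closed_index, "ACC-9"))
print(classify_insert(closed_index, "ACC-2"))
```

A single keyed lookup per incoming record replaces the grouping and sorting of the whole table, which is where the claimed saving in memory and running time comes from.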
In an optional embodiment, fig. 2 is a flowchart of an optional method for determining broken link data according to an embodiment of the present invention, and as shown in fig. 2, the service data is sent to the big data platform at least in the following manner:
step S202, all the business data generated in the daily business process are obtained through the business source system, and the business data are sent to the big data platform through the service bus.
In the embodiment of the present application, for example, all business data generated in a daily business process of a certain financial institution may be acquired by a business source system, and the business data in the business source system is sent to the big data platform through a service bus.
In an alternative embodiment, as also shown in fig. 2, the method further comprises:
step S204, after the big data platform receives the service data, analyzing the service data to obtain an analysis result, and verifying whether the service data is completely transmitted or not based on the analysis result, wherein at least the field name, the field number and the field type of the service data are verified.
In the embodiment of the application, after the big data platform receives the service data, it analyzes the service data and verifies the integrity of the data and the consistency of the metadata on both sides of the transmission (such as field names, number of fields and field types). After the verification succeeds, the data of the service source system and the valid data of the big data platform are joined by a full outer join on the primary key, and the data is compared.
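A metadata check of this kind might look like the sketch below. The schema representation (a mapping from field name to Python type), the function name and the sample fields `account` and `balance` are assumptions made for illustration; the application does not specify how the verification is implemented.

```python
from typing import Dict, List


def validate_metadata(expected: Dict[str, type], record: Dict[str, object]) -> List[str]:
    """Compare a received record with the expected field names, count and types."""
    problems: List[str] = []
    if len(record) != len(expected):
        problems.append(f"field count {len(record)} != {len(expected)}")
    for name in sorted(expected.keys() - record.keys()):
        problems.append(f"missing field {name}")
    for name in sorted(record.keys() - expected.keys()):
        problems.append(f"unexpected field {name}")
    for name, expected_type in expected.items():
        if name in record and not isinstance(record[name], expected_type):
            problems.append(f"field {name} has type {type(record[name]).__name__}")
    return problems


schema = {"account": str, "balance": int}
print(validate_metadata(schema, {"account": "ACC-1", "balance": 3600}))
print(validate_metadata(schema, {"account": "ACC-1", "balance": "3600"}))
```

An empty list means the record passed all three checks; only then would the comparison against the platform data proceed.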
In an optional embodiment, comparing the service data with the data stored in the distributed open source database based on the row key and the value to determine whether the service data is broken link data includes:
Step S302, if the service data does not exist in the big data platform, comparing the service data with the stored data based on the row key and the value to determine whether the stored data contains the service data;
Step S304, if the stored data does not contain the service data, performing an open-chain operation on the big data platform to add the service data, and recording the latest valid state of the service data;
Step S306, if the stored data contains the service data, determining that the service data is broken link data.
In the embodiment of the application, if the service data exists in the source system but not in the big data platform, HBase is first checked for the data by querying with the primary key as the rowkey. If nothing is found, the data is newly added by the source system; the big data platform performs an open-chain operation, adds a new record, and records the latest valid state of the data. If a record is found, the data for that primary key previously became invalid, and re-entering it into the big data platform may produce a broken link.
In an optional embodiment, comparing the service data with the data stored in the distributed open source database based on the row key and the value to determine whether the service data is broken link data includes:
Step S402, if the service data exists in the big data platform, comparing the service data with the stored data based on the row key and the value to determine whether the stored data contains the service data;
Step S404, if the stored data contains the service data, performing field analysis on the service data, and, if the fields of the service data differ from the fields of the existing service data, performing a closed-chain operation on the big data platform, then performing an open-chain operation again to add the service data, and recording the latest valid state of the service data.
In the embodiment of the present application, if the service data exists in both the source system and the big data platform, the fields of the two copies are compared. If the fields differ, the source system has updated the data; the big data platform performs a closed-chain operation to end the previous data state and, at the same time, an open-chain operation to add a new record and record the latest valid state of the data. If the fields do not differ, the source system data has not changed and the big data platform performs no operation.
In an optional embodiment, the method further includes: if the service data exists in the big data platform but does not exist in the service source system, performing a closed-chain operation on the big data platform and marking the service data as invalid.
In the embodiment of the present application, if the service data does not exist in the source system but exists in the big data platform, the source system has deleted the data, and the big data platform performs the closed-chain operation directly. At the same time, the primary key is stored in HBase as the rowkey with the current date as the value, indicating that the data has become invalid.
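Taken together, the three situations described above (data only in the source system, data in both systems, data only in the big data platform) can be sketched as a full outer join on the primary key. The code below is an illustrative sketch under stated assumptions: dictionaries stand in for the source data, the platform's valid data and the HBase table, and the function name, action labels and sample keys are hypothetical.

```python
from typing import Dict, List, Tuple

Fields = Dict[str, object]


def plan_actions(
    source: Dict[str, Fields],     # primary key -> fields sent by the source system
    platform: Dict[str, Fields],   # primary key -> currently valid platform fields
    closed_index: Dict[str, str],  # stand-in for HBase: primary key -> closed-chain date
    today: str,
) -> List[Tuple[str, str]]:
    """Full outer join on the primary key, returning one action per key."""
    actions: List[Tuple[str, str]] = []
    for key in sorted(source.keys() | platform.keys()):
        in_source, in_platform = key in source, key in platform
        if in_source and not in_platform:
            # Key was closed earlier and is now coming back: broken link.
            label = "broken_link" if key in closed_index else "open_chain"
        elif in_source and in_platform:
            changed = source[key] != platform[key]
            label = "close_then_open_chain" if changed else "no_op"
        else:
            # Deleted by the source system: close the chain and remember the date.
            closed_index[key] = today
            label = "close_chain"
        actions.append((key, label))
    return actions


source = {"A": {"balance": 1}, "B": {"balance": 2}, "C": {"balance": 3}, "E": {"balance": 5}}
platform = {"B": {"balance": 2}, "C": {"balance": 30}, "D": {"balance": 4}}
closed_index = {"E": "20200318"}
print(plan_actions(source, platform, closed_index, "20200423"))
```

The function only plans actions; applying the open-chain and closed-chain operations to the zipper table itself is left out to keep the sketch short.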
In an optional embodiment, the method further includes:
Step S502, writing a query statement using a data warehouse tool;
Step S504, based on the query statement, grouping a plurality of data zipper tables by the table primary key and sorting them by the open-chain date of the data zipper tables to obtain a grouped and sorted result;
Step S506, storing the grouped and sorted result in a temporary storage area.
In the embodiment of the application, a query statement can be written using a data warehouse tool; based on the query statement, a plurality of data zipper tables are grouped by the table primary key and sorted by their open-chain date to obtain a grouped and sorted result; and the grouped and sorted result is stored in a temporary storage area.
In an optional embodiment, after the grouped and sorted result is stored in the temporary storage area, the method further includes:
Step S602, determining whether a target data zipper table exists in the temporary storage area, wherein the target data zipper table contains two adjacent records with the same table primary key and different closed-chain dates;
Step S604, querying whether broken link data exists in the target data zipper table.
In the above optional embodiment, after the grouped and sorted result is stored in the temporary storage area, it may further be determined whether a target data zipper table exists in the temporary storage area, and whether broken link data exists in the target data zipper table may be queried.
As an alternative embodiment, the method provided in the embodiment of the present application is illustrated by the following specific process steps:
First, the HBase data is initialized: invalid data in the table is found, that is, the data is grouped by the table primary key, and the groups whose data linked list contains no open-chain data (data with a closed-chain date of '21001231') are identified. For each such group, the primary key is stored in HBase as the rowkey, with the last closed-chain date of that primary key as the value.
Second, the service source system sends service data to the big data platform. After receiving the service data, the big data platform parses it to obtain a parsing result and verifies, based on the parsing result, whether the service data was transmitted completely. The service data is then compared with the stored data in the distributed open source database based on the row primary key and the key value to determine whether the service data is broken link data. For example, the data of the service source system and the valid data of the big data platform are fully joined on the primary key, and the comparison result may include, but is not limited to, the following three cases:
Case 1: the data exists in the source system but not among the valid data of the big data platform. Whether the data exists in HBase is determined by querying with the primary key as the rowkey. If the query returns nothing, the data is newly added by the source system, and the big data platform performs an open-chain operation, adds a new piece of data, and records the latest valid state of the data. If the query returns a result, the data for that primary key previously became invalid, and re-entering it into the big data platform may produce a broken link.
Case 2: the data exists in both the source system and the big data platform, and the fields need to be compared. If the fields differ, the source system has updated the data; the big data platform performs a closed-chain operation to end the previous data state and, at the same time, an open-chain operation to add a new piece of data and record the latest valid state of the data. If the fields do not differ, the source system data has not changed, and the big data platform performs no operation.
Case 3: the data does not exist in the source system but exists in the big data platform; the data has been deleted by the source system, and the big data platform performs closed-chain processing. At the same time, the primary key is stored in HBase as the rowkey and the current date as the value, indicating that the data has become invalid.
In the embodiments of the present application, big data distributed caching is combined with a zipper table to quickly query broken link data in the zipper table. Broken link data in the zipper table can thus be identified, computer memory resources are saved, query efficiency is improved, and broken link data can be located quickly.
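The initialization step and the three comparison cases above can be sketched as follows. This is a hedged illustration rather than the implementation of the embodiment: a plain Python dict stands in for the HBase table (rowkey to value), and the function names `init_invalid_keys` and `compare`, the action labels, and the dict-based row representation are all assumptions introduced for the example.

```python
OPEN_END = "21001231"  # closed-chain date of a record that is still open


def init_invalid_keys(chains):
    """Build the rowkey -> last closed-chain date mapping for invalid keys.

    chains maps each primary key to the closed-chain dates of its records.
    A key counts as invalid when none of its records is still open.
    The returned dict is a stand-in for the HBase table.
    """
    return {
        key: max(dates)
        for key, dates in chains.items()
        if dates and OPEN_END not in dates
    }


def compare(key, source_row, platform_row, invalid_keys, today):
    """Decide the action for one primary key after the full join.

    source_row / platform_row are dicts of field values, or None when that
    side has no row for the key.
    """
    if source_row is not None and platform_row is None:
        # Case 1: look the key up in the cache of invalid keys.
        if key in invalid_keys:
            return "broken_link"   # key was closed before and is re-entering
        return "open_chain"        # genuinely new data
    if source_row is not None and platform_row is not None:
        # Case 2: compare fields; update only when something changed.
        return "close_then_open" if source_row != platform_row else "no_op"
    if source_row is None and platform_row is not None:
        # Case 3: deleted at the source; close the chain and remember the key.
        invalid_keys[key] = today
        return "close_chain"
    return "no_op"  # key absent on both sides; not a case discussed in the text


invalid = init_invalid_keys({
    "k1": ["20210103", "20210110"],   # fully closed chain -> invalid
    "k2": ["20210105", OPEN_END],     # still open -> valid
})
```

Keeping only invalid keys in the cache means the lookup in Case 1 is a single get by rowkey, which is the property the text relies on for fast detection.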
Example 2
According to an embodiment of the present invention, an embodiment of an apparatus for implementing the above method for determining broken link data is further provided. Fig. 3 is a schematic structural diagram of an apparatus for determining broken link data according to an embodiment of the present invention. As shown in Fig. 3, the apparatus for determining broken link data includes an obtaining module 300, a processing module 302, and a determining module 304, wherein:
an obtaining module 300, configured to obtain a data linked list in a big data platform, where the data linked list includes a table primary key and a closed-chain date; a processing module 302, configured to use the table primary key and the closed-chain date as a row primary key and a key value of a distributed open source database, respectively; and a determining module 304, configured to, when it is detected that the big data platform has received service data, compare the service data with the stored data in the distributed open source database based on the row primary key and the key value, so as to determine whether the service data is broken link data.
It should be noted that the above modules may be implemented by software or hardware; for the latter, this may be done as follows: the modules are all located in the same processor, or the modules are located in different processors in any combination.
It should be noted here that the obtaining module 300, the processing module 302, and the determining module 304 correspond to steps S102 to S106 in embodiment 1, and that these modules share the examples and application scenarios of the corresponding steps but are not limited to what is disclosed in embodiment 1. It should also be noted that the modules described above may run in a computer terminal as part of an apparatus.
It should be noted that, reference may be made to the relevant description in embodiment 1 for alternative or preferred embodiments of this embodiment, and details are not described here again.
The apparatus for determining broken link data may further include a processor and a memory, where the obtaining module 300, the processing module 302, the determining module 304, and the like are all stored in the memory as program units, and the processor executes the program units stored in the memory to implement the corresponding functions.
The processor includes one or more cores, and a core calls the corresponding program unit from the memory. The memory may include, among computer-readable media, volatile memory such as random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
According to an embodiment of the present application, there is also provided an embodiment of a non-volatile storage medium. Optionally, in this embodiment, the nonvolatile storage medium includes a stored program, and the apparatus in which the nonvolatile storage medium is located is controlled to execute any one of the methods for determining the broken link data when the program runs.
Optionally, in this embodiment, the nonvolatile storage medium may be located in any one of a group of computer terminals in a computer network, or in any one of a group of mobile terminals, and the nonvolatile storage medium includes a stored program.
Optionally, when the program runs, the apparatus in which the non-volatile storage medium is located is controlled to perform the following functions: obtaining a data linked list in a big data platform, where the data linked list includes a table primary key and a closed-chain date; using the table primary key and the closed-chain date as a row primary key and a key value of a distributed open source database, respectively; and when it is detected that the big data platform has received service data, comparing the service data with the stored data in the distributed open source database based on the row primary key and the key value to determine whether the service data is broken link data.
Optionally, when the program runs, the apparatus in which the non-volatile storage medium is located is controlled to perform the following functions: acquiring, through a service source system, all service data generated in the daily service process, and sending the service data to the big data platform through a service bus.
Optionally, when the program runs, the apparatus in which the non-volatile storage medium is located is controlled to perform the following functions: after the big data platform receives the service data, parsing the service data to obtain a parsing result, and verifying, based on the parsing result, whether the service data was transmitted completely, where at least the field names, the number of fields, and the field types of the service data are verified.
Optionally, when the program runs, the apparatus in which the non-volatile storage medium is located is controlled to perform the following functions: if the service data does not exist in the big data platform, comparing the service data with the stored data based on the row primary key and the key value to determine whether the stored data contains the service data; if the stored data does not contain the service data, performing an open-chain operation on the big data platform to add the service data, and recording the latest valid state of the service data; and if the stored data contains the service data, determining that the service data is the broken link data.
Optionally, when the program runs, the apparatus in which the non-volatile storage medium is located is controlled to perform the following functions: if the service data exists in the big data platform, comparing the service data with the stored data based on the row primary key and the key value to determine whether the stored data contains the service data; if the stored data contains the service data, performing field analysis on the service data; and if the fields of the service data differ from the fields of the existing service data, performing a closed-chain operation on the big data platform, then performing an open-chain operation on the big data platform again to add the service data, and recording the latest valid state of the service data.
Optionally, when the program runs, the apparatus in which the non-volatile storage medium is located is controlled to perform the following functions: if the service data exists in the big data platform and does not exist in the service source system, performing a closed-chain operation on the big data platform and marking the service data as invalid.
Optionally, when the program runs, the apparatus in which the non-volatile storage medium is located is controlled to perform the following functions: writing a query statement using a data warehouse tool; grouping a plurality of data linked lists by the table primary key based on the query statement, and sorting the plurality of data linked lists by their open-chain dates to obtain a grouping and sorting result; and storing the grouping and sorting result in a temporary storage area.
Optionally, when the program runs, the apparatus in which the non-volatile storage medium is located is controlled to perform the following functions: determining whether a target data linked list exists in the temporary storage area, where the target data linked list contains two adjacent pieces of data with the same table primary key and different closed-chain dates; and querying whether broken link data exists in the target data linked list.
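Among the functions listed above, the completeness check on received service data (field names, number of fields, field types) lends itself to a short illustration. The following Python sketch makes several assumptions: a parsed record is represented as a dict, the expected layout is a dict mapping field names to Python types, and the function name `verify_transmission` is hypothetical. The text does not specify how the verification is actually implemented.

```python
def verify_transmission(record, schema):
    """Check field count, field names, and field types of a parsed record.

    record maps field names to values; schema maps field names to the
    expected Python type. Returns a list of problems; an empty list means
    the record passed all three checks.
    """
    problems = []
    # Number of fields.
    if len(record) != len(schema):
        problems.append(
            f"field count {len(record)} != expected {len(schema)}"
        )
    # Field names.
    for name in sorted(schema.keys() - record.keys()):
        problems.append(f"missing field: {name}")
    for name in sorted(record.keys() - schema.keys()):
        problems.append(f"unexpected field: {name}")
    # Field types, for the fields that are present.
    for name, expected in schema.items():
        if name in record and not isinstance(record[name], expected):
            actual = type(record[name]).__name__
            problems.append(
                f"field {name} has type {actual}, expected {expected.__name__}"
            )
    return problems


schema = {"id": str, "amount": int}
ok = verify_transmission({"id": "a1", "amount": 5}, schema)
bad_type = verify_transmission({"id": "a1", "amount": "5"}, schema)
truncated = verify_transmission({"id": "a1"}, schema)
```

Returning a list of problems rather than a single boolean is a design choice of this sketch; it makes it easier to log why a record was rejected before the comparison step runs.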
According to an embodiment of the present application, there is also provided an embodiment of a processor. Optionally, in this embodiment, the processor is configured to execute a program, where the program executes any one of the methods for determining broken link data.
According to an embodiment of the present application, there is further provided an embodiment of an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform any one of the above methods for determining broken link data.
According to an embodiment of the present application, there is further provided an embodiment of a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the steps of any of the above methods for determining broken link data.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the above-described division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a separate product, it may be stored in a computer-readable non-volatile storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a non-volatile storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned non-volatile storage medium includes a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and various other media capable of storing program code.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A method of determining broken link data, comprising:
acquiring a data linked list in a big data platform, wherein the data linked list comprises a table primary key and a closed-chain date;
using the table primary key and the closed-chain date as a row primary key and a key value of a distributed open source database, respectively;
when it is detected that the big data platform has received service data, comparing the service data with the stored data in the distributed open source database based on the row primary key and the key value, so as to determine whether the service data is broken link data.
2. The method of claim 1, wherein the service data is sent to the big data platform at least by: all service data generated in the daily service process are acquired through a service source system, and the service data are sent to the big data platform through a service bus.
3. The method of claim 1, further comprising:
after the big data platform receives the service data, analyzing the service data to obtain an analysis result, and verifying whether the service data is completely transmitted or not based on the analysis result, wherein at least the field name, the field number and the field type of the service data are verified.
4. The method of claim 1, wherein comparing the service data with the stored data in the distributed open source database based on the row primary key and the key value to determine whether the service data is broken link data comprises:
if the service data does not exist in the big data platform, comparing the service data with the stored data based on the row primary key and the key value to determine whether the stored data contains the service data;
if the stored data does not contain the service data, performing an open-chain operation on the big data platform to add the service data, and recording the latest valid state of the service data;
and if the stored data contains the service data, determining that the service data is the broken link data.
5. The method of claim 1, wherein comparing the service data with the stored data in the distributed open source database based on the row primary key and the key value to determine whether the service data is broken link data comprises:
if the service data exists in the big data platform, comparing the service data with the stored data based on the row primary key and the key value to determine whether the stored data contains the service data;
if the stored data contains the service data, performing field analysis on the service data, and if the fields of the service data differ from the fields of the existing service data, performing a closed-chain operation on the big data platform, performing an open-chain operation on the big data platform again to add the service data, and recording the latest valid state of the service data.
6. The method of claim 1, further comprising:
if the service data exists in the big data platform and does not exist in a service source system, performing a closed-chain operation on the big data platform and marking the service data as invalid.
7. An apparatus for determining broken link data, comprising:
an obtaining module, configured to obtain a data linked list in a big data platform, wherein the data linked list comprises a table primary key and a closed-chain date;
a processing module, configured to use the table primary key and the closed-chain date as a row primary key and a key value of a distributed open source database, respectively;
and a determining module, configured to, when it is detected that the big data platform has received service data, compare the service data with the stored data in the distributed open source database based on the row primary key and the key value, so as to determine whether the service data is broken link data.
8. A non-volatile storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor and to perform the method of determining broken link data according to any one of claims 1 to 6.
9. A processor for running a program, wherein the program is arranged to perform the method of determining broken link data of any one of claims 1 to 6 when run.
10. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and the processor is configured to execute the computer program to perform the method of determining broken link data according to any one of claims 1 to 6.
CN202110064688.XA 2021-01-18 2021-01-18 Method and device for determining broken link data and nonvolatile storage medium Pending CN112749167A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110064688.XA CN112749167A (en) 2021-01-18 2021-01-18 Method and device for determining broken link data and nonvolatile storage medium

Publications (1)

Publication Number Publication Date
CN112749167A true CN112749167A (en) 2021-05-04

Family

ID=75652372

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110064688.XA Pending CN112749167A (en) 2021-01-18 2021-01-18 Method and device for determining broken link data and nonvolatile storage medium

Country Status (1)

Country Link
CN (1) CN112749167A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110209891A (en) * 2019-06-19 2019-09-06 河南中原消费金融股份有限公司 A kind of zipper table generating method, device, equipment and medium
CN111143463A (en) * 2020-01-06 2020-05-12 中国工商银行股份有限公司 Method and device for constructing bank data warehouse based on topic model
CN111813651A (en) * 2020-05-28 2020-10-23 杭州览众数据科技有限公司 Data abnormity testing method related to whole table structure and automatic testing tool

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination