CN117573775A - Service data processing method and device, electronic equipment and storage medium - Google Patents

Service data processing method and device, electronic equipment and storage medium

Info

Publication number
CN117573775A
CN117573775A
Authority
CN
China
Prior art keywords
service data
node
data set
data
target service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311597532.3A
Other languages
Chinese (zh)
Inventor
魏梦晴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China filed Critical Agricultural Bank of China
Priority to CN202311597532.3A
Publication of CN117573775A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a service data processing method, a device, electronic equipment and a storage medium, wherein the method comprises the following steps: acquiring child node configuration parameters of the node; determining a target service data set matched with the node according to the child node configuration parameters; the target service data set is formed by a plurality of data table fragments; determining a service data screening range of the node according to the child node configuration parameters and the data unique identification information, and screening a service data set to be processed of the node from the target service data set according to the service data screening range; and processing the service data set to be processed. The technical scheme of the embodiment of the invention can shorten the service processing running time and improve the service parallel processing efficiency.

Description

Service data processing method and device, electronic equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of data processing and databases, in particular to a business data processing method, a business data processing device, electronic equipment and a storage medium.
Background
In today's move toward comprehensive digitization, traditional mainframe architectures suffer from inherent disadvantages with respect to the digitization and intelligence of a system. A distributed architecture can more easily integrate with other associated systems by means of cloud computing, the Internet, big data and other technologies, so that the whole business flow is connected, an integrated data view and business decision engine are formed, and business operation capability is comprehensively improved. Meanwhile, with the explosive growth of service subscribers, large service systems such as credit card distributed core systems (hereinafter referred to as credit card systems), online payment systems and student status management systems must be able to support a huge customer base and a huge transaction volume, which places higher demands on the capacity and performance scalability of such systems. It is therefore imperative to build a new generation of business core systems based on a distributed architecture that is flexibly extensible and runs with high performance, high concurrency and high efficiency. For large business systems, batch business is one of the main types of business. A batch service processes a large amount of data each time, involves a large amount of input/output (I/O), has a long execution time, and different batch services must run in a certain order. Therefore, for large business systems it is necessary to shorten the batch running time, guarantee the node running order and reduce the system running pressure.
In order to improve the response speed and query performance of the system, a large service system can adopt a horizontally partitioned distributed database table design, in which all rows of one table are divided into different tables, each of which is called a data table fragment. Data table fragments are an integral part of a horizontally partitioned distributed database. For example, a large service system can horizontally divide service data into 470 data table fragments by province: each data table fragment contains data of only one province, several data table fragments together make up all the data of one province, and all data table fragments together form the complete data. At present, load balancing among the child nodes of a batch program in a large service system is mainly achieved in two modes. Mode one: concurrency through the database sharding key, that is, each child node runs over the data of a specified database fragment, so that hundreds of child nodes process the batch business at the same time. The credit card system contains 470 sharding key values in total, so 470 child nodes execute concurrently when the batch program runs. Mode two: logically splitting out data groups (a data group is the basic unit in which a child node acquires data from a data table fragment). A child node first reads the full amount of data to be processed in the data table fragments into the system memory, then takes the last three digits of the data identifier of the service system (such as the account number) modulo the total number of child nodes N, which yields remainders in the range 0 to N-1. Each child node, according to its own number numTask, screens out and processes the service data whose remainder equals numTask-1. In this way, the total data to be processed by the batch program is divided evenly into N parts, the long-tail problem of program running is avoided, and each child node processes different data. All child nodes execute concurrently, thereby achieving load balancing of data distribution and batch processing capacity.
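For illustration only, a rough Java sketch of the mode-two remainder grouping described above might look as follows (the class and field names are assumptions, not taken from this application); note that the full data set must already have been loaded into memory before the filter runs, which is exactly the drawback discussed below.

    import java.util.List;
    import java.util.stream.Collectors;

    // Illustrative sketch of "mode two" described above (names are assumptions).
    public class RemainderGrouping {

        /** A single business record; only the unique identifier matters here. */
        public record BusinessRecord(String accountNo) {}

        /**
         * Keep the records this child node is responsible for: the last three
         * digits of the identifier, modulo the total node count N, must equal
         * numTask - 1.
         */
        public static List<BusinessRecord> filterForNode(List<BusinessRecord> allRecords,
                                                         int numTask, int totalNodes) {
            return allRecords.stream()
                    .filter(r -> {
                        String id = r.accountNo();
                        int suffix = Integer.parseInt(id.substring(id.length() - 3));
                        return suffix % totalNodes == numTask - 1;
                    })
                    .collect(Collectors.toList());
        }
    }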
The inventors have found, in the process of implementing the present invention, that the prior art has the following drawbacks. For the batch program concurrency logic of mode one, because the number of users in each province and city is unbalanced, the data distribution over the sharding keys of the database differs considerably; the amount of data to be processed in one fragment may even be several times that of another. As a result, the running times of the child nodes of the batch program differ greatly, a long-tail problem appears, the total batch running time is prolonged, performance is poor, and horizontal scalability is weak. For the batch program concurrency logic of mode two, taking the remainder of the last three digits of the data identifier divides the data to be processed into several roughly equal parts handled by different child nodes, so the running time of each child node is basically consistent; this solves the problem of mode one, in which the running times of the child nodes differ greatly, improves batch running efficiency and yields better horizontal scalability. However, since in mode two each child node must read the full amount of data to be processed from the database fragments in order to compute the remainder, and only then screen out the data with the specific remainder in memory, the child node reads a large amount of invalid data. The effective data hit rate of the database is low, unnecessary database disk I/O and network overhead are generated, network traffic is high, and long-lasting lock contention and long-transaction problems arise. Therefore, while the batch program is running, the response speed of other online transactions is easily affected and system stability is poor.
Disclosure of Invention
The embodiment of the invention provides a service data processing method, a device, electronic equipment and a storage medium, which can shorten service processing operation time and improve service parallel processing efficiency.
According to an aspect of the present invention, there is provided a service data processing method, applied to a service child node, including:
acquiring child node configuration parameters of the node;
determining a target service data set matched with the node according to the child node configuration parameters; the target service data set is formed by a plurality of data table fragments;
determining a service data screening range of the node according to the child node configuration parameters and the data unique identification information, and screening a service data set to be processed of the node from the target service data set according to the service data screening range;
and processing the service data set to be processed.
According to another aspect of the present invention, there is provided a service data processing apparatus configured in a service sub-node, including:
the child node configuration parameter acquisition module is used for acquiring child node configuration parameters of the node;
the target service data set determining module is used for determining a target service data set matched with the node according to the child node configuration parameters; the target service data set is formed by a plurality of data table fragments;
The to-be-processed service data set screening module is used for determining a service data screening range of the node according to the child node configuration parameters and the data unique identification information, and screening the to-be-processed service data set of the node from the target service data set according to the service data screening range;
and the processing module of the service data set to be processed is used for processing the service data set to be processed.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the traffic data processing method according to any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to execute a service data processing method according to any one of the embodiments of the present invention.
According to the embodiment of the invention, the service child node obtains the child node configuration parameters of the node and, according to these parameters, determines the target service data set matched with the node, which is formed by a plurality of data table fragments. It then determines the service data screening range of the node according to the child node configuration parameters and the data unique identification information, and directly screens the to-be-processed service data set of the node out of the target service data set according to that screening range. The service data therefore does not need to be fully loaded into memory before being calculated and screened, so the to-be-processed service data set can be processed quickly. This solves the problem of poor service processing performance caused by in-memory invalid-data screening in existing batch service processing methods, shortens the service processing running time and improves service parallel processing efficiency.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a service data processing method according to a first embodiment of the present invention;
fig. 2 is a flowchart of a service data processing method according to a second embodiment of the present invention;
FIG. 3 is a schematic diagram of a business data processing flow in a credit card system according to an embodiment of the invention;
FIG. 4 is a schematic diagram of traffic monitoring during concurrent running of a prior-art batch program that takes the remainder of the last three digits of the data identifier;
FIG. 5 is a schematic diagram of monitoring the SQL execution of a single child node while a prior-art batch program runs in the mode of taking the remainder of the last three digits of the data identifier;
FIG. 6 is a schematic diagram of traffic monitoring during concurrent running according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of monitoring the SQL execution of a single child node during concurrent running according to an embodiment of the present invention;
fig. 8 is a schematic diagram of a service data processing device according to a third embodiment of the present invention;
fig. 9 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a flowchart of a service data processing method provided in an embodiment of the present invention, where the embodiment is applicable to a case of directly screening a to-be-processed service data set of a node from a target service data set according to a child node configuration parameter and data unique identification information, and the method may be performed by a service data processing apparatus, where the apparatus may be implemented by software and/or hardware, and may generally be integrated in an electronic device, and the electronic device may be a terminal device or a server device, so long as the electronic device can process service data as the service child node.
Accordingly, as shown in fig. 1, the method includes the following operations:
s110, acquiring child node configuration parameters of the node.
The child node configuration parameters may be configured for the service child node, and are related parameters for identifying the service child node. Alternatively, the child node configuration parameters may be determined according to the number of service child nodes.
In the embodiment of the invention, the plurality of service child nodes may be nodes in a large service system. In general, a large business system handles both online and batch business types. An online service is a service with high timeliness requirements, such as settlement and repayment processed in real time by a credit card system or an online payment system, or a college entrance examination score service that the college entrance examination subsystem must publish in real time; the result has to be returned to the user in real time once processing is completed. A batch service is a service with low timeliness requirements, for example a credit card system or online payment system service with loose requirements on posting time, or a post-graduation employment survey service of the student status management system. In a large service system, one service node may correspond to one batch program implementing a certain service function. Further, service child nodes are configured for each service node through different child node configuration parameters, and all service child nodes run the same batch program at the same time but process different data, so as to achieve program concurrency.
When configuring the child node configuration parameters for the service child nodes, optionally, the child node configuration parameters can be determined according to the number of service child nodes. The configuration parameter of each service child node is unique, so that each service child node is uniquely identified by its child node configuration parameter. For example, when the number of service child nodes is 250, the child node configuration parameter of each service child node may be a value from 1 to 250. That is, the first service child node has a child node configuration parameter of 1, the second service child node has a child node configuration parameter of 2, and so on.
S120, determining a target service data set matched with the node according to the child node configuration parameters; the target service data set is formed by a plurality of data table fragments.
The target service data set is a data set formed by a plurality of data table fragments. The number of target service data sets may be one, that is, the data table fragments are not split and all of them together serve as one complete target service data set. Alternatively, the number of target service data sets may be plural: the data table fragments are partitioned to obtain a plurality of target service data sets, where the difference between the data volumes of any two target service data sets is smaller than a set threshold. The set threshold can be chosen according to actual requirements, and the embodiment of the present invention does not limit its specific value; the point is to ensure that the data volume of each target service data set is approximately the same.
Accordingly, when there are multiple target service data sets, each service child node processes part of the data in its corresponding target service data set. For example, when there are two target service data sets, namely target service data set A and target service data set B, and assuming there are 500 service child nodes in total, 250 of the service child nodes may process target service data set A and the other 250 may process target service data set B. Therefore, before each service child node processes service data, it needs to determine which target service data set the service data to be processed by the node belongs to.
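The application does not prescribe how data table fragments are assigned to target service data sets; as one hedged sketch, a simple greedy assignment by row count (all names below are assumptions) would keep the per-set data volumes approximately equal:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    // Hedged sketch: split data table fragments into two sets of roughly equal size.
    public class FragmentSplitter {

        /**
         * Greedy split: sort fragments by row count (descending) and always add the
         * next fragment to the currently smaller set.
         *
         * @param rowCountByFragment fragment name -> number of rows it holds
         * @return two lists of fragment names with approximately equal total rows
         */
        public static List<List<String>> splitIntoTwoSets(Map<String, Long> rowCountByFragment) {
            List<String> setA = new ArrayList<>();
            List<String> setB = new ArrayList<>();
            long sizeA = 0, sizeB = 0;

            List<Map.Entry<String, Long>> sorted = new ArrayList<>(rowCountByFragment.entrySet());
            sorted.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));

            for (Map.Entry<String, Long> e : sorted) {
                if (sizeA <= sizeB) {
                    setA.add(e.getKey());
                    sizeA += e.getValue();
                } else {
                    setB.add(e.getKey());
                    sizeB += e.getValue();
                }
            }
            return List.of(setA, setB);
        }
    }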
S130, determining a service data screening range of the node according to the child node configuration parameters and the data unique identification information, and screening a to-be-processed service data set of the node from the target service data set according to the service data screening range.
The data unique identification information is the unique identification information of a piece of service data and can uniquely identify each piece of service data. For example, for a credit card system or an online payment system, the data unique identification information may be a customer number, an account number, a card number, or the like. For a student status management system, the data unique identification information may be the student ID number or the like. The service data screening range is a data screening range, determined from the data unique identification information, that applies to the node.
In the embodiment of the invention, after the service child node obtains the child node configuration parameter of the node, it can determine the service data screening range of the node according to the child node configuration parameter and the data unique identification information. After the screening range is determined, the service child node can directly screen the to-be-processed service data set of the node out of the target service data set according to that range. That is, the service child node obtains the valid data directly through the service data screening range, which removes the step of loading all data of the target service data set into the node memory and then calculating and screening the node's valid data set. This increases the hit rate of database reads, reduces the database I/O and load per unit time, lowers the memory consumption of the node, and effectively avoids the network traffic spikes caused by loading a large amount of invalid data. At the same time, the garbage collection time of the JVM (Java Virtual Machine) is reduced, so the running time of the service child node is greatly shortened and the parallel processing efficiency of service data is improved.
For example, when the child node configuration parameter of a service child node is 1, the range of matching data unique identification information can be calculated from the configuration parameter "1" according to a certain rule and used as the service data screening range of the node. For instance, if it is determined from the configuration parameter "1" that the last three digits of the data unique identification information lie in 000-002, the service child node can directly screen out the data in the target service data set whose identifier ends in 000-002, and build the to-be-processed service data set from the screened data.
And S140, processing the service data set to be processed.
Accordingly, when the service child node determines that screening of the to-be-processed service data set is complete, it can process that data set. Each service child node may process its matched to-be-processed service data set in parallel.
According to the embodiment of the invention, the service child node obtains the child node configuration parameters of the node and, according to these parameters, determines the target service data set matched with the node, which is formed by a plurality of data table fragments. It then determines the service data screening range of the node according to the child node configuration parameters and the data unique identification information, and directly screens the to-be-processed service data set of the node out of the target service data set according to that screening range. The service data therefore does not need to be fully loaded into memory before being calculated and screened, so the to-be-processed service data set can be processed quickly. This solves the problem of poor service processing performance caused by in-memory invalid-data screening in existing batch service processing methods, shortens the service processing running time and improves service parallel processing efficiency.
Example two
Fig. 2 is a flowchart of a service data processing method according to a second embodiment of the present invention, where the present embodiment is implemented based on the foregoing embodiment, and in the present embodiment, various specific alternative implementations of determining a target service data set matched with a node and screening a service data set to be processed of the node are provided. Accordingly, as shown in fig. 2, the method of this embodiment may include:
s210, acquiring child node configuration parameters of the node.
S220, determining a target service data set matched with the node according to the child node configuration parameters.
The target service data set is formed by a plurality of data table fragments.
In an optional embodiment of the present invention, the determining, according to the child node configuration parameter, the target service data set matched with the node may include: obtaining a mapping relation between the child node configuration parameters and the target service data set; inquiring a mapping relation between the child node configuration parameters and a target service data set according to the child node configuration parameters of the node so as to determine the target service data set matched with the node according to an inquiry result; or, acquiring a node configuration label configured by the child node configuration parameters; and determining a target service data set matched with the node according to the node configuration label of the node.
The acquiring node configuration tag may be a tag configured for a child node configuration parameter of a service child node, and is used for identifying a binding relationship between the service child node and a corresponding target service data set.
Alternatively, the target service data set matched with a service child node can be determined in several ways. If a mapping relation table between child node configuration parameters and target service data sets is configured in advance for each service child node, the service child node can query the mapping relation table to obtain the mapping between its own configuration parameter and a target service data set. For example, suppose the mapping relation table records a mapping between child node configuration parameter "1" and target service data set A. After the service child node determines that its configuration parameter is 1, it determines, by querying the mapping relation table, that the matched target service data set is target service data set A.
Correspondingly, if a node configuration label is configured directly for the child node configuration parameter, the service child node can obtain the node configuration label and determine the matched target service data set according to it. Illustratively, assume the node configuration labels include "A" and "B", where "A" represents target service data set A and "B" represents target service data set B. When the node configuration label configured for the node's child node configuration parameter is "A", the service child node determines that the target service data set matched with the node is target service data set A.
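A minimal sketch of the two lookup approaches just described, with assumed label values and an assumed "first half to A, second half to B" assignment used purely for illustration:

    import java.util.HashMap;
    import java.util.Map;

    // Hedged sketch of resolving a child node's target service data set (names are assumptions).
    public class TargetDataSetResolver {

        // Approach 1: a pre-configured mapping table from child node configuration
        // parameter (numTask) to target service data set, e.g. nodes 1..250 -> "A",
        // nodes 251..500 -> "B".
        private final Map<Integer, String> paramToDataSet = new HashMap<>();

        public TargetDataSetResolver(int totalNodes) {
            for (int numTask = 1; numTask <= totalNodes; numTask++) {
                paramToDataSet.put(numTask, numTask <= totalNodes / 2 ? "A" : "B");
            }
        }

        public String resolveByMapping(int numTask) {
            return paramToDataSet.get(numTask);
        }

        // Approach 2: the label ("A" or "B") is bound directly to the node's
        // configuration, so the node simply reads it.
        public String resolveByLabel(String nodeConfigLabel) {
            return nodeConfigLabel;
        }
    }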
S230, determining a reference data identification range interval of the service data to be processed, as the service data screening range, according to the child node configuration parameter and the arithmetic division interval of the data unique identification information.
And S240, matching the unique data identification information of each target service data in the target service data set with the reference data identification range interval so as to screen the service data set to be processed of the node from the target service data set according to a matching result.
Here, the reference data identification range interval is a range interval determined from part of the data unique identification information (for example, its last few digits).
In the embodiment of the invention, the reference data identification range interval of the service data to be processed can be calculated from the child node configuration parameter and the arithmetic division interval of the data unique identification information, which fixes the range of data unique identification information to be processed. The data unique identification information of each piece of target service data in the target service data set is then matched against the calculated reference data identification range interval, and the data whose unique identification information falls inside that interval is screened out of the target service data set as the to-be-processed service data set of the node.
In an optional embodiment of the present invention, determining the reference data identification range interval of the service data to be processed as the service data screening range, according to the child node configuration parameter and the arithmetic division interval of the data unique identification information, may include:
determining a first critical value of the reference data identification range interval based on the following formula:
M1 = A × (numTask - 1)
determining a second critical value of the reference data identification range interval based on the following formula:
M2 = A × numTask - 1
wherein M1 represents the first critical value of the reference data identification range interval, M2 represents the second critical value of the reference data identification range interval, numTask represents the child node configuration parameter, and A represents the arithmetic division interval of the data unique identification information; the number of service child nodes is determined according to the arithmetic division interval of the data unique identification information and the interval range of the data unique identification information.
The value of M1 is smaller than that of M2. The arithmetic division interval is the reference interval by which the data unique identification information is divided into equal-width ranges.
In the embodiment of the invention, the number of service child nodes can be determined from the arithmetic division interval of the data unique identification information and the interval range of the data unique identification information. In a specific example, taking the credit card system as an example, the data in one target service data set is divided, using the last three digits 000-999 of the customer number, account number or card number as the interval range of the data unique identification information, into 000-003, 004-007, 008-011, ..., 996-999, i.e. 250 interval ranges in total, with an arithmetic division interval of 4. Correspondingly, 250 service child nodes can be configured for that target service data set to process its service data in parallel. Similarly, if the last three digits 000-999 of the customer number, account number or card number are used as the interval range and the arithmetic division interval is 5, the range is divided into 000-004, 005-009, 010-014, ..., 995-999, i.e. 200 interval ranges in total, and correspondingly 200 service child nodes can be configured for the target service data set to process its service data in parallel.
Correspondingly, after the arithmetic division interval of the data unique identification information is determined, the interval from A × (numTask - 1) to A × numTask - 1 can be taken as the reference data identification range interval of the service data to be processed by the service child node. For example, when A is 4 and numTask is 2, the reference data identification range of service child node "2", expressed in the last three digits of the account number, is 004-007, which means that service child node "2" processes the service data in the target service data set whose account number ends in 004, 005, 006 or 007.
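A minimal sketch of this range computation, following the formulas M1 = A × (numTask - 1) and M2 = A × numTask - 1 above (the class and method names are assumptions):

    // Hedged sketch: compute the inclusive suffix range [M1, M2] a child node handles.
    public class ScreeningRange {

        public final int lower;  // M1 = A * (numTask - 1)
        public final int upper;  // M2 = A * numTask - 1

        public ScreeningRange(int arithmeticInterval, int numTask) {
            this.lower = arithmeticInterval * (numTask - 1);
            this.upper = arithmeticInterval * numTask - 1;
        }

        public static void main(String[] args) {
            // A = 4, numTask = 2  ->  range 004..007, matching the example in the text.
            ScreeningRange r = new ScreeningRange(4, 2);
            System.out.printf("%03d-%03d%n", r.lower, r.upper);  // prints 004-007
        }
    }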
In other words, when the service child node queries a data group, the embodiment of the invention determines the screening interval range on the key field used for balanced data distribution, and the to-be-processed service data set of the node is filtered out accordingly.
In an optional embodiment of the present invention, the matching the unique data identifier information of each target service data in the target service data set with the reference data identifier range interval to screen the service data set to be processed of the node from the target service data set according to the matching result may include: constructing a virtual derivative column according to the unique data identification information of the target service data; and searching the virtual derivative column according to the reference data identification range interval to search and screen the service data set to be processed of the node.
When the above service data processing method is used in the actual production environment of a large service system, observation of the database execution plan and database characteristics shows that, because a database function is used to obtain the data in the reference data identification range interval, the filter cannot use an index. As a result, every time a child node pulls a data group from a data table fragment, the database traverses all data on the fragment and only then filters out and returns the data belonging to the current data group. If the number of service child nodes is 250, then 249/250 of the rows traversed by the database are discarded on every child-node query. This not only wastes a great deal of database resources, but can also lead to long-transaction and lock problems that affect the stability of other business (online transactions or other batch tasks).
To solve the above problems, the embodiment of the present invention introduces a virtual derivative column to further optimize the scheme. A virtual derivative column is computed from the value of an ordinary column and changes whenever the value of that column changes; the ordinary column is called the source column of the virtual derivative column. A secondary index can be created on one or more virtual derivative columns, or on a combination of virtual derivative columns and ordinary columns.
Accordingly, when data is screened, a virtual derivative column can be generated from the data unique identification information. For example, in the credit card system example above, a virtual derivative column C' holding the last three digits of the customer number can be generated from the customer number field C, and C' can be included in a secondary index. When the service child nodes query service data in batch, an IN filter condition on the virtual derivative column C' replaces the range filter condition on the customer number, thereby implementing data group retrieval. Because the retrieval of service data now goes through the index, the filtering efficiency on a data table fragment can be improved by a factor of N (the number of service child nodes), the performance of batch tasks is greatly optimized, and the impact on the stability of other services is further reduced.
The advantages of introducing a virtual derivative column index, compared with adding an ordinary column and indexing it, are: first, the virtual derivative column is updated automatically by the database itself whenever the source column value changes, so other business code does not need to care about updating C', and the optimization is non-intrusive to other business. Second, a virtual column does not occupy disk, so disk space and table storage space are saved.
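As a hedged sketch of what this could look like on a MySQL-compatible database (the table name, column names and generated-column syntax below are assumptions; other databases use different syntax for virtual or generated columns):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    // Hedged sketch: virtual derivative column + secondary index + IN filter (assumed schema).
    public class VirtualColumnExample {

        // One-time DDL (MySQL-style generated-column syntax; assumed table/column names):
        // the virtual column cust_no_suffix always holds the last three digits of cust_no.
        static final String ADD_VIRTUAL_COLUMN =
                "ALTER TABLE card_account " +
                "ADD COLUMN cust_no_suffix CHAR(3) AS (RIGHT(cust_no, 3)) VIRTUAL";
        static final String ADD_SECONDARY_INDEX =
                "CREATE INDEX idx_cust_no_suffix ON card_account (cust_no_suffix)";

        // Per-node query: an IN condition on the virtual column replaces the range
        // filter on the customer number, so the secondary index can be used.
        static final String SELECT_FOR_NODE =
                "SELECT * FROM card_account WHERE cust_no_suffix IN (?, ?, ?, ?)";

        /** Fetch the data group for a node whose range is 004..007 (A = 4, numTask = 2). */
        public static void fetchGroup(Connection conn) throws Exception {
            try (PreparedStatement ps = conn.prepareStatement(SELECT_FOR_NODE)) {
                ps.setString(1, "004");
                ps.setString(2, "005");
                ps.setString(3, "006");
                ps.setString(4, "007");
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // process one record of the node's data group
                    }
                }
            }
        }
    }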
S250, processing the service data set to be processed.
In order to more clearly describe the technical solution provided by the embodiment of the present invention, the embodiment of the present invention specifically describes the above-mentioned business data processing method by taking a credit card system as an example. Fig. 3 is a schematic diagram of a business data processing flow in a credit card system according to an embodiment of the invention. In a specific example, as shown in fig. 3, the business data processing flow in the credit card system may include the following operations:
First, all data table fragments of the credit card system are divided into two sets A and B, where set A contains the data of some of the data table fragments and set B contains the data of the remaining fragments, and the data volumes of set A and set B are kept approximately the same.
Then, for the data of set A or set B, the last three digits 000-999 of the customer number, account number or card number to be processed are divided arithmetically into 000-003, 004-007, 008-011, ..., 996-999. Assuming the configuration parameter of each child node (short for service child node) is numTask, with values ranging from 1 to 250, each child node processes the data whose customer number, account number or card number ends in the interval 4 × (numTask - 1) to 4 × numTask - 1. In this way all the data to be processed is divided into roughly 250 equal shares handled by the 250 child nodes, and each child node screens its own data set using the interval 4 × (numTask - 1) to 4 × numTask - 1.
Finally, the batch program is configured with 500 child nodes running concurrently: the first 250 child nodes process the data of the data table fragments contained in set A, and the last 250 child nodes process the data of the fragments contained in set B. Each child node screens out its data to be processed according to the last three digits of the customer number, account number or card number. When screening, the child node adds a range query condition to the query statement so that the query covers only the interval in which the last three digits of its data lie, and the valid data to be processed by the node is screened out directly without filtering in memory. This increases the hit rate of database reads, reduces the database I/O and load per unit time, removes the in-node data filtering step, lowers the memory consumption of the child node, reduces the JVM garbage collection time, and greatly shortens the running time of the child node.
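As a hedged end-to-end sketch of the flow just described (the thread-pool driver, names and query text are illustrative assumptions; in a real deployment the child nodes would typically be separate batch processes):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Hedged sketch: 500 concurrent child nodes, first 250 on set A, last 250 on set B.
    public class BatchDriver {

        public static void main(String[] args) {
            int totalNodes = 500;
            ExecutorService pool = Executors.newFixedThreadPool(totalNodes);

            for (int node = 1; node <= totalNodes; node++) {
                final int globalNode = node;
                pool.submit(() -> {
                    // First 250 nodes -> set A, last 250 -> set B.
                    String dataSet = globalNode <= 250 ? "A" : "B";
                    // numTask within the set is 1..250.
                    int numTask = globalNode <= 250 ? globalNode : globalNode - 250;
                    int lower = 4 * (numTask - 1);   // M1
                    int upper = 4 * numTask - 1;     // M2
                    // Range filter pushed into the query instead of in-memory screening,
                    // e.g. "... WHERE cust_no_suffix BETWEEN '000' AND '003'".
                    String condition = String.format("cust_no_suffix BETWEEN '%03d' AND '%03d'",
                                                     lower, upper);
                    processDataGroup(dataSet, condition);
                });
            }
            pool.shutdown();
        }

        static void processDataGroup(String dataSet, String condition) {
            // Placeholder: query the fragments of the given set with the range condition
            // and process each returned record.
        }
    }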
FIG. 4 is a schematic diagram of traffic monitoring during concurrent running of a prior-art batch program that takes the remainder of the last three digits of the data identifier, where the abscissa indicates the monitoring time and the ordinate indicates the monitored traffic value. As shown in FIG. 4, the batch program runs from 16:29 to 16:37; a traffic spike occurs while the batch program runs in the remainder-of-last-three-digits mode, and the bandwidth is about 18G.
FIG. 5 is a schematic diagram of monitoring the SQL execution of a single child node while a prior-art batch program runs in the remainder-of-last-three-digits mode. As can be seen from FIG. 5, during such concurrent running the average number of rows returned per query SQL is 217820.4, and the database I/O is excessive.
Fig. 6 is a schematic diagram of traffic monitoring during concurrent running according to an embodiment of the present invention. As shown in FIG. 6, the batch program runs from 16:46 to 16:48. Compared with the prior art, for the same data volume the service data processing method provided by the embodiment of the invention reduces the batch running time from 8 minutes to 2 minutes; the network bandwidth during running is about 2G, and the traffic is significantly reduced.
Fig. 7 is a schematic diagram of the SQL execution monitoring of a single child node during concurrent running according to an embodiment of the present invention. As can be seen from FIG. 7, while the batch program runs with the service data processing method provided by the embodiment of the invention, the average number of rows returned per query SQL is 911.4, and both the total and the average number of rows returned by the query SQL are greatly reduced.
Therefore, the service data processing method provided by the embodiment of the invention can shorten the batch running time of a large service system, reduce the traffic while the program runs, and improve the response speed of online real-time transactions running concurrently.
The embodiment of the invention provides an optimization method for data load balancing during concurrent running of batch programs in a large service system: by ensuring that each service child node processes approximately the same data volume, the batch system gains the ability to improve performance nearly linearly through horizontal scaling. In addition, the embodiment of the invention adds virtual derivative columns with indexes, which optimizes database retrieval efficiency, improves database query performance, reduces load at the database level, improves the overall stability of the system while batch programs run, and solves the problem of reduced response speed of other transactions caused by high traffic. Meanwhile, because database performance is improved and the screening and filtering of invalid data in the batch program's memory is reduced, the batch running time is greatly shortened and the performance and stability of the system are improved, so the system supports higher concurrency.
It should be noted that any permutation and combination of the technical features in the above embodiments also belong to the protection scope of the present invention.
Example III
Fig. 8 is a schematic diagram of a service data processing apparatus according to a third embodiment of the present invention, where the apparatus is configured in a service sub-node, as shown in fig. 8, and the apparatus includes: a child node configuration parameter obtaining module 310, a target service data set determining module 320, a to-be-processed service data set screening module 330, and a to-be-processed service data set processing module 340, wherein:
a child node configuration parameter obtaining module 310, configured to obtain child node configuration parameters of the node;
a target service data set determining module 320, configured to determine a target service data set matched with the node according to the child node configuration parameter; the target service data set is formed by a plurality of data table fragments;
a to-be-processed service data set screening module 330, configured to determine a service data screening range of the node according to the child node configuration parameter and the data unique identifier information, and screen a to-be-processed service data set of the node from the target service data set according to the service data screening range;
and the to-be-processed service data set processing module 340 is configured to process the to-be-processed service data set.
According to the embodiment of the invention, the service child node obtains the child node configuration parameters of the node and, according to these parameters, determines the target service data set matched with the node, which is formed by a plurality of data table fragments. It then determines the service data screening range of the node according to the child node configuration parameters and the data unique identification information, and directly screens the to-be-processed service data set of the node out of the target service data set according to that screening range. The service data therefore does not need to be fully loaded into memory before being calculated and screened, so the to-be-processed service data set can be processed quickly. This solves the problem of poor service processing performance caused by in-memory invalid-data screening in existing batch service processing methods, shortens the service processing running time and improves service parallel processing efficiency.
Optionally, the target service data set determining module 320 is specifically configured to: obtaining a mapping relation between the child node configuration parameters and the target service data set; inquiring a mapping relation between the child node configuration parameters and a target service data set according to the child node configuration parameters of the node so as to determine the target service data set matched with the node according to an inquiry result; or, acquiring a node configuration label configured by the child node configuration parameters; and determining a target service data set matched with the node according to the node configuration label of the node.
Optionally, the pending service data set screening module 330 is specifically configured to: determining a reference data identification range interval of the service data to be processed as the service data screening range according to the child node configuration parameters and the arithmetic division interval of the data unique identification information; and matching the data unique identification information of each target service data in the target service data set with the reference data identification range interval so as to screen the service data set to be processed of the node from the target service data set according to a matching result.
Optionally, the pending service data set screening module 330 is specifically configured to: determine a first critical value of the reference data identification range interval based on the following formula:
M1 = A × (numTask - 1)
and determine a second critical value of the reference data identification range interval based on the following formula:
M2 = A × numTask - 1
wherein M1 represents the first critical value of the reference data identification range interval, M2 represents the second critical value of the reference data identification range interval, numTask represents the child node configuration parameter, and A represents the arithmetic division interval of the data unique identification information; the number of service child nodes is determined according to the arithmetic division interval of the data unique identification information and the interval range of the data unique identification information.
Optionally, the pending service data set screening module 330 is specifically configured to: constructing a virtual derivative column according to the unique data identification information of the target service data; and searching the virtual derivative column according to the reference data identification range interval to search and screen the service data set to be processed of the node.
Optionally, the number of the target service data sets is multiple, and a difference value of data amounts included in each target service data set is smaller than a set threshold value.
Optionally, the service sub-node is a sub-node in a credit card distributed core system.
The service data processing device can execute the service data processing method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method. Technical details which are not described in detail in this embodiment can be referred to the service data processing method provided in any embodiment of the present invention.
Since the service data processing apparatus described above is an apparatus capable of executing the service data processing method in the embodiment of the present invention, those skilled in the art will be able to understand the specific implementation of the service data processing apparatus in the embodiment of the present invention and various modifications thereof based on the service data processing method described in the embodiment of the present invention, so how the service data processing apparatus implements the service data processing method in the embodiment of the present invention will not be described in detail herein. The device adopted by the technical personnel in the art to implement the service data processing method in the embodiment of the invention is within the scope of protection required by the application.
Example IV
Fig. 9 shows a schematic diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic equipment may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 9, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the traffic data processing method.
Optionally, the service data processing method includes: acquiring child node configuration parameters of the node; determining a target service data set matched with the node according to the child node configuration parameters; the target service data set is formed by a plurality of data table fragments; determining a service data screening range of the node according to the child node configuration parameters and the data unique identification information, and screening a service data set to be processed of the node from the target service data set according to the service data screening range; and processing the service data set to be processed.
In some embodiments, the business data processing method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. One or more steps of the business data processing method described above may be performed when the computer program is loaded into the RAM 13 and executed by the processor 11. Alternatively, in other embodiments, the processor 11 may be configured to perform the traffic data processing method in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may be implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer-readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer-readable storage medium may be a machine-readable signal medium. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the defects of difficult management and weak service scalability found in traditional physical hosts and VPS services.

Claims (10)

1. A service data processing method, which is applied to a service sub-node, comprising:
acquiring child node configuration parameters of the node;
determining a target service data set matched with the node according to the child node configuration parameters; the target service data set is formed by a plurality of data table fragments;
determining a service data screening range of the node according to the child node configuration parameters and the data unique identification information, and screening a service data set to be processed of the node from the target service data set according to the service data screening range;
and processing the service data set to be processed.
2. The method according to claim 1, wherein the determining the target service data set matched with the node according to the child node configuration parameters comprises:
obtaining a mapping relation between the child node configuration parameters and the target service data set;
querying the mapping relation between the child node configuration parameters and the target service data set according to the child node configuration parameters of the node, so as to determine the target service data set matched with the node according to a query result; or
acquiring a node configuration label configured by the child node configuration parameters;
and determining a target service data set matched with the node according to the node configuration label of the node.
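For illustration only, the two alternatives of claim 2 (looking the data set up through a mapping relation, or through a node configuration label) might be sketched as follows; the map contents and all names are assumptions.

```java
import java.util.Map;

/** Hypothetical resolver for the target service data set matched with a sub-node. */
public class TargetDataSetResolver {

    // Alternative 1: mapping relation from child node configuration parameter to data set.
    private final Map<Integer, String> paramToDataSet =
            Map.of(1, "shard_group_a", 2, "shard_group_b");

    // Alternative 2: each parameter carries a configured label, and labels map to data sets.
    private final Map<Integer, String> paramToLabel =
            Map.of(1, "batch", 2, "online");
    private final Map<String, String> labelToDataSet =
            Map.of("batch", "shard_group_a", "online", "shard_group_b");

    /** First alternative: query the mapping relation with the node's parameter. */
    public String byMappingRelation(int numTask) {
        return paramToDataSet.get(numTask);
    }

    /** Second alternative: resolve the node configuration label, then the data set. */
    public String byConfigLabel(int numTask) {
        return labelToDataSet.get(paramToLabel.get(numTask));
    }
}
```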
3. The method according to claim 1, wherein the determining the service data screening range of the node according to the child node configuration parameters and the data unique identification information, and screening the to-be-processed service data set of the node from the target service data set according to the service data screening range, comprises:
determining a reference data identification range interval of the service data to be processed as the service data screening range according to the child node configuration parameters and the arithmetic division interval of the data unique identification information;
and matching the data unique identification information of each target service data in the target service data set with the reference data identification range interval so as to screen the service data set to be processed of the node from the target service data set according to a matching result.
4. The method according to claim 3, wherein the determining a reference data identification range interval of the service data to be processed as the service data screening range according to the child node configuration parameters and the arithmetic division interval of the data unique identification information comprises:
determining a first critical value of the reference data identification range interval based on the following formula:
M1=A×(numTask-1)
determining a second critical value of the reference data identification range interval based on the following formula:
M2=A×numTask-1
wherein M1 represents the first critical value of the reference data identification range interval, M2 represents the second critical value of the reference data identification range interval, numTask represents the child node configuration parameter, and A represents the arithmetic division interval of the data unique identification information; the number of service child nodes is determined according to the arithmetic division interval of the data unique identification information and the interval range of the data unique identification information.
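As a worked example of these formulas (the numbers are illustrative only): with an arithmetic division interval A = 1,000,000 and numTask = 3, the third sub-node gets M1 = 1,000,000 × (3 − 1) = 2,000,000 and M2 = 1,000,000 × 3 − 1 = 2,999,999, i.e. a block of exactly one million consecutive identifiers. A minimal Java sketch with hypothetical names:

```java
/** Hypothetical computation of the reference identifier range interval of claim 4. */
public final class ScreeningRange {

    /** M1 = A * (numTask - 1): lower critical value of the identifiers owned by the node. */
    public static long lowerBound(long a, int numTask) {
        return a * (numTask - 1);
    }

    /** M2 = A * numTask - 1: upper critical value of the identifiers owned by the node. */
    public static long upperBound(long a, int numTask) {
        return a * numTask - 1;
    }

    public static void main(String[] args) {
        long a = 1_000_000L;   // arithmetic division interval of the unique IDs
        int numTask = 3;       // this sub-node's configuration parameter
        System.out.println(lowerBound(a, numTask)); // 2000000
        System.out.println(upperBound(a, numTask)); // 2999999
    }
}
```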
5. The method according to claim 3, wherein the matching the data unique identification information of each target service data in the target service data set with the reference data identification range interval, so as to screen the service data set to be processed of the node from the target service data set according to the matching result, comprises:
constructing a virtual derived column according to the data unique identification information of the target service data;
and searching the virtual derived column according to the reference data identification range interval, so as to screen out the service data set to be processed of the node.
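One way to read the "virtual derived column" of claim 5 is as a column computed on the fly in the screening query, rather than a column stored in the shard tables. A hedged JDBC sketch of that reading follows; the table name service_shard and column name unique_id are assumptions made only for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical JDBC sketch: screen a shard through a virtual derived column. */
public class VirtualColumnScreening {

    /**
     * Wraps the shard table in a subquery that exposes the unique identifier as a
     * derived column (virt_id), then filters it by the reference range [m1, m2].
     */
    public void screen(Connection conn, long m1, long m2) throws SQLException {
        String sql =
                "SELECT t.* FROM ("
              + "  SELECT s.*, s.unique_id AS virt_id FROM service_shard s"
              + ") t WHERE t.virt_id BETWEEN ? AND ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, m1);
            ps.setLong(2, m2);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // Each row returned here belongs to this sub-node's
                    // service data set to be processed.
                }
            }
        }
    }
}
```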
6. The method of claim 1, wherein there are a plurality of target service data sets, and a difference in data amount between the target service data sets is less than a set threshold.
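Claim 6 only requires that the data amounts of the target service data sets stay within a set threshold of one another; one possible (purely hypothetical) way to obtain such balanced sets is a greedy grouping of the table shards by row count, sketched below.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical greedy grouping of table shards into roughly balanced data sets. */
public class BalancedShardGrouping {

    record Shard(String name, long rowCount) {}

    /** Assigns each shard to the currently smallest group (classic greedy balancing). */
    public static List<List<Shard>> group(List<Shard> shards, int groupCount) {
        List<List<Shard>> groups = new ArrayList<>();
        long[] sizes = new long[groupCount];
        for (int i = 0; i < groupCount; i++) {
            groups.add(new ArrayList<>());
        }
        // Placing the largest shards first keeps the final size spread small.
        shards.stream()
              .sorted(Comparator.comparingLong(Shard::rowCount).reversed())
              .forEach(s -> {
                  int smallest = 0;
                  for (int i = 1; i < sizes.length; i++) {
                      if (sizes[i] < sizes[smallest]) {
                          smallest = i;
                      }
                  }
                  groups.get(smallest).add(s);
                  sizes[smallest] += s.rowCount();
              });
        return groups;
    }
}
```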
7. The method according to any of claims 1-6, wherein the service sub-node is a sub-node in a credit card distributed core system.
8. A service data processing apparatus, configured in a service sub-node, comprising:
the child node configuration parameter acquisition module is used for acquiring child node configuration parameters of the node;
the target service data set determining module is used for determining a target service data set matched with the node according to the child node configuration parameters; the target service data set is formed by a plurality of data table fragments;
the to-be-processed service data set screening module is used for determining a service data screening range of the node according to the child node configuration parameters and the data unique identification information, and screening the to-be-processed service data set of the node from the target service data set according to the service data screening range;
and the processing module of the service data set to be processed is used for processing the service data set to be processed.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the service data processing method according to any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions for causing a processor to perform the service data processing method according to any one of claims 1-7.
CN202311597532.3A 2023-11-27 2023-11-27 Service data processing method and device, electronic equipment and storage medium Pending CN117573775A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311597532.3A CN117573775A (en) 2023-11-27 2023-11-27 Service data processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311597532.3A CN117573775A (en) 2023-11-27 2023-11-27 Service data processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117573775A true CN117573775A (en) 2024-02-20

Family

ID=89893458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311597532.3A Pending CN117573775A (en) 2023-11-27 2023-11-27 Service data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117573775A (en)

Similar Documents

Publication Publication Date Title
CN113342564B (en) Log auditing method and device, electronic equipment and medium
CN103620601A (en) Joining tables in a mapreduce procedure
CN110727663A (en) Data cleaning method, device, equipment and medium
CN116225769B (en) Method, device, equipment and medium for determining root cause of system fault
CN112528067A (en) Graph database storage method, graph database reading method, graph database storage device, graph database reading device and graph database reading equipment
CN116611411A (en) Business system report generation method, device, equipment and storage medium
CN115794744A (en) Log display method, device, equipment and storage medium
CN117573775A (en) Service data processing method and device, electronic equipment and storage medium
CN115328917A (en) Query method, device, equipment and storage medium
CN114896418A (en) Knowledge graph construction method and device, electronic equipment and storage medium
CN114968950A (en) Task processing method and device, electronic equipment and medium
CN112907009B (en) Standardized model construction method and device, storage medium and equipment
CN117742900B (en) Method, device, equipment and storage medium for constructing service call graph
CN115525659A (en) Data query method and device, electronic equipment and storage medium
CN117609311A (en) Service degradation method, device, equipment and storage medium
CN118152587A (en) Business behavior map construction method and device, electronic equipment and storage medium
CN115757928A (en) Data query method and device, electronic equipment and storage medium
CN117520601A (en) Graph database query method and device, storage medium, equipment and product
CN117851390A (en) Blank certificate processing method, device, equipment and storage medium
CN115577055A (en) Data processing method, device and equipment based on HBase data table and storage medium
CN115203246A (en) Linked list query method and device, electronic equipment and storage medium
CN115834179A (en) Policy aggregation method and device and electronic equipment
CN116342280A (en) Data determination method and device, electronic equipment and storage medium
CN114416881A (en) Real-time synchronization method, device, equipment and medium for multi-source data
CN117112601A (en) Database data compression method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination