CN114490537A

CN114490537A - Multi-server data processing method and device, computer equipment and storage medium

Info

Publication number: CN114490537A
Application number: CN202210137996.5A
Authority: CN
Inventors: 孙佳正; 顾佳骏; 吴炜; 黄荣清
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2022-02-15
Filing date: 2022-02-15
Publication date: 2022-05-13

Abstract

The application relates to a multi-server data processing method, a multi-server data processing device, a computer device, a storage medium and a computer program product. The method comprises the following steps: a first server obtains a correspondence between servers and data fragments, the correspondence being determined based on a data file and a number of fragments; the first server acquires the data file from a second server and, according to the correspondence, determines from the data file the target data fragment corresponding to the first server; and the first server processes the target data fragment. With this method, a plurality of first servers can be invoked: each first server obtains the data file from the second server, determines the target data fragment it is responsible for according to the correspondence, traverses the data file, and processes only the data within its own target data fragment, which can improve the efficiency and fault tolerance of data processing.

Description

Multi-server data processing method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of data processing technologies, and in particular, to a method and an apparatus for processing multi-server data, a computer device, a storage medium, and a computer program product.

Background

With economic development and the large-scale adoption of mobile internet technology, customers transact more and more frequently and transaction volumes keep growing. Because the computing power of a single server is limited, the load on the server increases, the bank takes longer to generate bills, bill generation completes later, and a single-node architecture can no longer meet customers' expectations for when bills are received. The traditional way of generating bank bills in the industry is to receive a transaction detail file on a single server, where each line in the file is one transaction detail of a customer and the transaction details of the same customer are arranged together in sequence. The program reads the file line by line and generates bills. For some bill types, one bill is generated for each line of transaction details, for example an electronic receipt. For other bill types, multiple lines of transaction details of the same customer are aggregated into one bill, for example a merchant bill, which displays multiple lines of transaction details of the same merchant together with summary information such as the merchant's transaction total.

The current way of handling transaction detail files has the following problems: a traditional batch job can run on only one server; the job cannot run once that server goes down; and if the job must be rerun because of a data problem, all of the data has to be reprocessed. Processing of the transaction detail file therefore lacks sufficient fault tolerance and processing efficiency.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a multi-server data processing method, apparatus, computer device, computer readable storage medium, and computer program product capable of improving data processing efficiency.

In a first aspect, the present application provides a method for multi-server data processing. The method comprises the following steps:

a first server obtains a correspondence between servers and data fragments, the correspondence being determined based on a data file and the number of fragments;

the first server acquires the data file from the second server, and determines the target data fragment corresponding to the first server from the data file according to the corresponding relation between the server and the data fragment;

and the first server processes the target data fragment.

In one embodiment, the determining of the correspondence between the server and the data shards based on the data files and the shard number includes:

if the data type of the data file is a context-irrelevant type, the second server divides the data file into a plurality of data fragments, determines the corresponding relation between each first server and each data fragment according to the load information of each first server to obtain the corresponding relation between the server and the data fragment, or determines the corresponding relation between each first server and each data fragment according to the load information of each first server, and determines the corresponding relation between each second server and each data fragment according to the load information of each second server to obtain the corresponding relation between the server and the data fragment.

In one embodiment, dividing the data file into a number of data fragments comprises:

the second server acquires the data file and calculates the total data line number of the data file;

the second server calculates the number of fragments according to the number of the first servers and the attribute of each first server, and calculates the number of starting lines and the number of ending lines of each data fragment according to the number of fragments and the number of total data lines;

and the second server obtains a plurality of data fragments of the data file according to the starting line number and the ending line number of each data fragment.

In one embodiment, if the data type of the data file is a context-dependent type, the second server obtains a plurality of fragment sequence numbers according to the number of fragments, and determines the correspondence between each first server and each fragment sequence number according to the load information of each first server to obtain the correspondence between servers and data fragments; or the second server determines the correspondence between each first server and each fragment sequence number according to the load information of each first server, and the correspondence between each second server and each fragment sequence number according to the load information of each second server, to obtain the correspondence between servers and data fragments.

In one embodiment, the method for acquiring a data file from a second server by a first server and determining a target data fragment corresponding to the first server from the data file according to a correspondence between the server and the data fragment includes:

the first server acquires a primary key value corresponding to each piece of data to be processed in the data file, and determines a fragment sequence number corresponding to each piece of data to be processed according to the primary key value corresponding to each piece of data to be processed and the number of fragments;

the first server obtains a fragment serial number corresponding to the first server according to the corresponding relation between the server and the data fragments, determines each piece of data to be processed corresponding to the first server according to the fragment serial number corresponding to the first server and the fragment serial number corresponding to each piece of data to be processed, and obtains a target data fragment corresponding to the first server according to each piece of data to be processed corresponding to the first server.

In one embodiment, the processing of the target data fragment by the first server includes:

the first server processes each piece of data to be processed in the target data fragment to generate corresponding first information;

determining a primary key value corresponding to each piece of first information according to the primary key value corresponding to each piece of data to be processed;

and integrating the first information with the same primary key value to generate second information.

In a second aspect, the application also provides a multi-server data processing device. The device comprises:

the relation acquisition module is used for acquiring the corresponding relation between the server and the data fragments by the first server, and the corresponding relation between the server and the data fragments is determined based on the data files and the number of the fragments;

the data confirmation module is used for acquiring the data file from the second server by the first server and determining the target data fragment corresponding to the first server from the data file according to the corresponding relation between the server and the data fragment;

and the data processing module is used for processing the target data fragment by the first server.

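In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the following steps when executing the computer program:

a first server obtains a correspondence between servers and data fragments, the correspondence being determined based on a data file and the number of fragments;

the first server acquires the data file from a second server, and determines the target data fragment corresponding to the first server from the data file according to the correspondence between servers and data fragments;

and the first server processes the target data fragment.

In a fourth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the following steps:

a first server obtains a correspondence between servers and data fragments, the correspondence being determined based on a data file and the number of fragments;

the first server acquires the data file from a second server, and determines the target data fragment corresponding to the first server from the data file according to the correspondence between servers and data fragments;

and the first server processes the target data fragment.

In a fifth aspect, the present application further provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, performs the following steps:

a first server obtains a correspondence between servers and data fragments, the correspondence being determined based on a data file and the number of fragments;

the first server acquires the data file from a second server, and determines the target data fragment corresponding to the first server from the data file according to the correspondence between servers and data fragments;

and the first server processes the target data fragment.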

According to the above multi-server data processing method, device, computer equipment, storage medium and computer program product, a first server obtains a correspondence between servers and data fragments, the correspondence being determined based on a data file and the number of fragments; the first server acquires the data file from a second server and, according to the correspondence, determines from the data file the target data fragment corresponding to the first server; and the first server processes the target data fragment. The second server obtains the data file to be processed in advance, divides it into the given number of data fragments, and establishes the correspondence between servers and data fragments for that file in advance. When the data file needs to be processed, a plurality of first servers are invoked; each first server obtains the data file from the second server, determines the target data fragments it is responsible for according to the correspondence, traverses the data file, and processes only the data within its own target data fragments. This can improve the efficiency and fault tolerance of data processing.

Drawings

FIG. 1 is a flow diagram that illustrates a method for multi-server data processing in one embodiment;

FIG. 2 is a flow diagram illustrating selection of a primary server in one embodiment;

FIG. 3 is a flow diagram that illustrates the division of a data file into data fragments, according to one embodiment;

FIG. 4 is a flow diagram illustrating a process for determining a target data fragment corresponding to a first server in one embodiment;

FIG. 5 is a block diagram of a multi-server data processing apparatus in one embodiment;

FIG. 6 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

In an embodiment, as shown in FIG. 1, a multi-server data processing method is provided. This embodiment is illustrated by applying the method to a terminal; it is to be understood that the method may also be applied to a server, or to a system including a terminal and a server, where it is implemented through interaction between the terminal and the server. In this embodiment, the method includes the following steps:

Step 102, the first server obtains a correspondence between servers and data fragments, the correspondence being determined based on the data file and the number of fragments.

Data fragmentation refers to dividing a large data set into a plurality of small data sets, each of which is a data fragment; the data ranges of the fragments do not overlap, and the union of all data fragments equals the complete data set. The correspondence between servers and data fragments indicates which data fragments each first server processes.

Specifically, for a data file to be processed, the number of fragments is determined in advance according to the number of available first servers and the attribute state of each first server, then the data file is divided into data fragments with the number of fragments, and a corresponding relation between the server and the data fragments is established. When the data file needs to be processed, each first server acquires a pre-established corresponding relation between the server and the data fragments.

In one possible implementation manner, the second server determines the number of fragments according to the number of available first servers and the attribute state of each first server in advance, then divides the data file into data fragments of the number of fragments, establishes a corresponding relationship between the server and the data fragments, and stores the corresponding relationship between the server and the data fragments in a remote database, wherein the remote database is accessible by all servers. When the data file needs to be processed, each first server accesses the remote database to acquire the pre-established corresponding relation between the server and the data fragments.

Step 104, the first server acquires the data file from the second server, and determines the target data fragment corresponding to the first server from the data file according to the correspondence between servers and data fragments.

The second server may be a master server selected from the plurality of first servers, or may be a master server selected in addition to all the first servers.

Specifically, the second server stores the data file to be processed in advance, determines the number of fragments according to the number of available first servers and the attribute state of each first server, and then divides the data file into the data fragments of the number of fragments. When the data file needs to be processed, each first server acquires the data file from the second server, then the target data fragment corresponding to each first server is determined from the data file according to the corresponding relation between the servers and the data fragments, and one first server can correspond to a plurality of data fragments.

Step 106, the first server processes the target data fragment.

Specifically, each first server traverses the entire data file, but only processes data within the target data fragmentation range corresponding to the first server, and each server generates at least one data processing result.

In the multi-server data processing method above, a first server obtains a correspondence between servers and data fragments, the correspondence being determined based on a data file and the number of fragments; the first server acquires the data file from a second server and, according to the correspondence, determines from the data file the target data fragment corresponding to the first server; and the first server processes the target data fragment. The second server obtains the data file to be processed in advance, divides it into the given number of data fragments, and establishes the correspondence between servers and data fragments for that file in advance. When the data file needs to be processed, a plurality of first servers are invoked; each first server obtains the data file from the second server, determines the target data fragments it is responsible for according to the correspondence, traverses the data file, and processes only the data within its own target data fragments. This can improve the efficiency and fault tolerance of data processing.
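By way of illustration only, steps 102 to 106 might be sketched as follows. This is a minimal sketch and not the claimed implementation: the names `process_target_fragments`, `fragment_of` and `handle`, the in-memory `mapping` dictionary, and the sample server names are all assumptions introduced here, and the rule that maps a line to a fragment is left as a pluggable function.

```python
from typing import Callable, Dict, Iterable, List, Set


def process_target_fragments(
    server_id: str,
    mapping: Dict[str, Set[int]],          # correspondence: server -> fragment ids
    lines: Iterable[str],                  # the full data file, as obtained from the second server
    fragment_of: Callable[[int, str], int],  # hypothetical rule: (line number, line) -> fragment id
    handle: Callable[[str], str],          # hypothetical per-record processing
) -> List[str]:
    """Traverse every line, but process only lines in this server's fragments."""
    my_fragments = mapping.get(server_id, set())
    results = []
    for line_no, line in enumerate(lines, start=1):
        if fragment_of(line_no, line) in my_fragments:
            results.append(handle(line))
    return results


# Hypothetical correspondence: one first server may own several fragments.
mapping = {"server-a": {0}, "server-b": {1, 2}}
lines = ["r1", "r2", "r3", "r4", "r5", "r6"]
fragment_of = lambda line_no, _line: (line_no - 1) % 3

out_a = process_target_fragments("server-a", mapping, lines, fragment_of, str.upper)
out_b = process_target_fragments("server-b", mapping, lines, fragment_of, str.upper)
```

In this sketch every server reads the same file and the fragments are disjoint, so the union of the per-server results covers each line exactly once.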

In one embodiment, as shown in fig. 2, a server is selected from a plurality of first servers as a primary server, the primary server is a second server, and the method for selecting the primary server includes:

Step 202, obtaining the thread pool capacity pre-configured for each server.

Specifically, a fixed thread pool capacity is set for each server in advance, and the thread pool capacity refers to the maximum number of threads that can be carried by the corresponding server. The thread pool capacity may be the same or different for each server.

Step 204, checking the remaining capacity of the thread pool queue of each server.

Specifically, for each server, determining the remaining capacity of the server thread pool queue according to the thread pool capacity of the server and the number of tasks in the corresponding server thread pool queue; that is, for each server, the thread number that can be added to the corresponding server thread pool queue is obtained according to the maximum thread number of the server and the thread number in the corresponding server thread pool queue, so as to determine the remaining capacity of the server thread pool queue. And determining the idle degree of each server according to the residual capacity of the server thread pool queue.

Step 206, selecting the most idle server as the primary server, that is, selecting the server whose thread pool queue has the largest remaining capacity.

In this embodiment, the thread pool capacity pre-configured for each server is obtained; the remaining capacity of each server's thread pool queue is checked; and the most idle server is selected as the primary server. Selecting the most idle server helps ensure the data processing efficiency of the primary server.
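Steps 202 to 206 might be sketched as follows. This is an illustrative sketch under stated assumptions: the `ServerStatus` record, its field names, and the sample values are hypothetical, and the tie-breaking rule (the first server in the list wins) is not specified by the embodiment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServerStatus:
    name: str
    pool_capacity: int  # pre-configured thread pool capacity (step 202)
    queued_tasks: int   # tasks currently in the thread pool queue

    @property
    def remaining_capacity(self) -> int:
        # Remaining capacity of the thread pool queue (step 204).
        return max(self.pool_capacity - self.queued_tasks, 0)


def select_primary(servers: List[ServerStatus]) -> ServerStatus:
    """Pick the most idle server, i.e. the largest remaining capacity (step 206)."""
    if not servers:
        raise ValueError("no servers available")
    # max() returns the first maximal element, so ties go to the earlier server.
    return max(servers, key=lambda s: s.remaining_capacity)


servers = [
    ServerStatus("s1", pool_capacity=16, queued_tasks=10),  # 6 remaining
    ServerStatus("s2", pool_capacity=8, queued_tasks=1),    # 7 remaining
    ServerStatus("s3", pool_capacity=16, queued_tasks=12),  # 4 remaining
]
primary = select_primary(servers)
```

Here `s2` is selected even though its configured capacity is the smallest, because only the remaining queue capacity is compared.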

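In one embodiment, determining the correspondence between servers and data fragments based on the data file and the number of fragments includes:

if the data type of the data file is a context-irrelevant type, the second server divides the data file into the given number of data fragments and determines the correspondence between each first server and each data fragment according to the load information of each first server, thereby obtaining the correspondence between servers and data fragments.

The context-irrelevant type means that the pieces of data in the data file are unrelated to one another; each piece of data is independent information.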

Specifically, the types of the data files are different, and the information contained in the corresponding relationship between the server and the data fragments is different. If the data type of the data file is a context-irrelevant type, the second server calculates the number of fragments according to the number of available first servers and the attribute of each first server, and then divides the data file into the number of data fragments with the number of fragments. And finally, acquiring load information of each first server, and distributing at least one data fragment for each first server by adopting a server load balancing algorithm, so as to determine the corresponding relation between each first server and each data fragment and obtain the corresponding relation between the servers and the data fragments.

In one possible embodiment, if the data type of the data file is a context-irrelevant type, the second server first calculates the number of fragments according to the number of available first servers, the number of second servers, the attribute of each first server, and the attribute of each second server, and then divides the data file into that number of data fragments. Finally, load information of each first server and each second server is acquired, and a server load balancing algorithm allocates at least one data fragment to each first server and each second server, thereby determining the correspondence between each first server and each data fragment and between each second server and each data fragment, and obtaining the correspondence between servers and data fragments.

In one embodiment, if the data type of the data file is a context-irrelevant type, the server load balancing algorithm includes: a fixed thread pool is set for each server in advance, and each data fragment processing task is one task in a thread pool queue. To allocate the number of fragments each server is to process, the remaining capacity of that server's thread pool queue is checked. Unavailable servers are detected at the same time; if any server is down and unavailable, its data fragments are distributed to the available servers for processing. The smaller the remaining capacity of a server's thread pool queue, the fewer data fragments it is allocated; the larger the remaining capacity, the more data fragments it is allocated. For example, if there are m data fragments and n servers, the remaining capacity of the thread pool queue of the i-th server is n_i, and the sum of the remaining capacities of the thread pool queues of all servers is s, then the number of fragments allocated to the i-th server is m × (n_i / s).
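The proportional rule m × (n_i / s) might be sketched as follows. Two points are assumptions made here, not stated in the embodiment: m × (n_i / s) is generally not an integer, so this sketch rounds down and hands the leftover fragments to the servers with the largest fractional remainders; and a server that is down is represented simply by a remaining capacity of 0. The function name `allocate_fragment_counts` is hypothetical.

```python
from typing import Dict


def allocate_fragment_counts(m: int, remaining: Dict[str, int]) -> Dict[str, int]:
    """Split m fragments across servers in proportion to n_i / s."""
    # Servers with no remaining capacity (or marked down as 0) receive nothing.
    available = {name: cap for name, cap in remaining.items() if cap > 0}
    s = sum(available.values())
    if s == 0:
        raise ValueError("no server has remaining capacity")

    # Integer part of m * n_i / s for each available server.
    counts = {name: (m * cap) // s for name, cap in available.items()}

    # Assumption: distribute the rounding leftover by largest fractional remainder.
    leftover = m - sum(counts.values())
    by_remainder = sorted(
        available, key=lambda name: (m * available[name]) % s, reverse=True
    )
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


counts = allocate_fragment_counts(10, {"s1": 6, "s2": 3, "s3": 1, "s4": 0})
uneven = allocate_fragment_counts(8, {"s1": 6, "s2": 3, "s3": 1})
```

With m = 10 and s = 10 the split is exact; with m = 8 the rounding step decides where the two leftover fragments go, while the total always remains m.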

In one embodiment, as shown in fig. 3, dividing the data file into a number of data fragments includes:

Step 302, the second server obtains the data file and calculates the total number of data lines in the data file.

Specifically, in general, each line in a data file records one piece of data.

Step 304, the second server calculates the number of fragments according to the number of first servers and the attribute of each first server, and calculates the starting line number and the ending line number of each data fragment according to the number of fragments and the total number of data lines.

The attribute of each first server refers to the core number of each first server CPU, and in general, the core number of a server CPU determines the number of data fragments that can be processed by the server.

Specifically, the second server calculates the number of fragments according to the number of the first servers and the core number of the CPU of each first server, for example, there are k first servers, and if the CPU of each first server is 4 cores, the number of fragments is set to 4 × k, and ideally, each server concurrently processes 4 fragments, and each core of the CPU can process one fragment.

Further, the second server calculates the starting line number and the ending line number of each data fragment according to the number of fragments and the total number of data lines. For example, if the total number of data lines is h and the number of fragments is n, the starting line number of the i-th fragment (i counted from 0, 0 ≤ i ≤ n-1) is (h/n) × i + 1; when i < n-1, the ending line number of the i-th fragment is (h/n) × (i + 1), and when i = n-1, the ending line number of the i-th fragment is h.

Step 306, the second server obtains a plurality of data fragments of the data file according to the starting line number and the ending line number of each data fragment.

Specifically, the second server divides the data file into a plurality of data fragments according to the starting line number and the ending line number of each data fragment.

In this embodiment, the second server obtains the data file and calculates the total number of data lines in the data file; the second server calculates the number of fragments according to the number of first servers and the attribute of each first server, and calculates the starting line number and the ending line number of each data fragment according to the number of fragments and the total number of data lines; and the second server obtains a plurality of data fragments of the data file according to the starting line number and the ending line number of each data fragment. The whole data file can thus be divided into a plurality of data fragments that are processed by the plurality of servers, which improves data processing efficiency. When an error occurs in data processing, only the erroneous data fragment and its corresponding server need to be identified and only the data of that fragment needs to be reprocessed, which improves the fault tolerance of data processing.
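Steps 302 to 306 might be sketched as follows. This is a sketch under stated assumptions: h/n is taken to be integer division, so the last fragment absorbs any remainder; the fragment count is taken as the sum of CPU cores across the first servers, which generalizes the 4 × k example and is not stated by the embodiment; and the function names are hypothetical.

```python
from typing import List, Tuple


def fragment_count(cpu_cores: List[int]) -> int:
    """One fragment per CPU core; k servers with 4 cores each gives 4 * k."""
    return sum(cpu_cores)


def line_ranges(total_lines: int, n: int) -> List[Tuple[int, int]]:
    """1-based inclusive (start, end) line numbers for fragments 0..n-1."""
    if n <= 0:
        raise ValueError("number of fragments must be positive")
    size = total_lines // n  # assumption: h/n is integer division
    ranges = []
    for i in range(n):
        start = size * i + 1
        # The last fragment ends at h so that leftover lines are not dropped.
        end = total_lines if i == n - 1 else size * (i + 1)
        ranges.append((start, end))
    return ranges


n = fragment_count([4, 4])    # two first servers with 4 cores each
ranges = line_ranges(100, n)  # h = 100 lines, n = 8 fragments
```

Each fragment starts one line after the previous fragment ends, so the ranges neither overlap nor leave gaps.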

The context-dependent type means that some pieces of data in the data file are related to one another. For example, the data file contains business data of multiple enterprises, and business data belonging to the same enterprise are related; each piece of data has a corresponding primary key in the data file, and the business data of the same enterprise share the same primary key value.

Specifically, the types of the data files are different, and the information contained in the corresponding relationship between the server and the data fragments is different. If the data type of the data file is a context-dependent type, the second server calculates the number of fragments according to the number of available first servers and the attribute of each first server, and then configures a plurality of fragment sequence numbers for the data file according to the number of fragments, wherein each fragment sequence number represents a data fragment to be determined. And finally, acquiring load information of each first server, and distributing at least one fragment serial number for each first server by adopting a server load balancing algorithm, so as to determine the corresponding relation between each first server and each fragment serial number and obtain the corresponding relation between the server and the data fragments.

In one possible embodiment, if the data type of the data file is a context-dependent type, the second server first calculates the number of fragments according to the number of available first servers, the number of second servers, the attribute of each first server, and the attribute of each second server, and then configures a plurality of fragment sequence numbers for the data file according to the number of fragments, where each fragment sequence number represents one data fragment to be determined. And finally, acquiring load information of each first server and load information of each second server, and distributing at least one fragment serial number for each first server and each second server by adopting a server load balancing algorithm, so as to determine the corresponding relation between each first server and each fragment serial number and the corresponding relation between each second server and each fragment serial number, and obtain the corresponding relation between the servers and the data fragments.

In one embodiment, if the data type of the data file is a context-dependent type, the server load balancing algorithm includes: a fixed thread pool is set for each server in advance, and each data fragment processing task is one task in a thread pool queue. To allocate the fragment sequence numbers, and the number of them, that each server is to process, the remaining capacity of that server's thread pool queue is checked. Unavailable servers are detected at the same time; if any server is down and unavailable, its fragment sequence numbers are distributed to the available servers for processing. The smaller the remaining capacity of a server's thread pool queue, the fewer fragment sequence numbers it is allocated; the larger the remaining capacity, the more fragment sequence numbers it is allocated.
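One way the fragment sequence numbers might be distributed is sketched below. The embodiment does not fix an exact assignment rule; the greedy rule used here (each sequence number goes to the available server with the most free queue slots at that moment) is an assumption, as are the function name `assign_sequence_numbers` and the `down` set used to mark unavailable servers.

```python
from typing import Dict, List, Set


def assign_sequence_numbers(
    n_fragments: int,
    remaining: Dict[str, int],  # remaining thread pool queue capacity per server
    down: Set[str],             # servers detected as unavailable
) -> Dict[str, List[int]]:
    """Assign fragment sequence numbers 0..n-1 to available servers only."""
    free = {name: cap for name, cap in remaining.items() if name not in down}
    if not free:
        raise ValueError("no available server")

    assignment: Dict[str, List[int]] = {name: [] for name in free}
    for seq in range(n_fragments):
        # Assumption: greedy choice of the server with the most free slots now.
        target = max(free, key=lambda name: free[name])
        assignment[target].append(seq)
        free[target] -= 1  # one more task queued on that server
    return assignment


assignment = assign_sequence_numbers(
    5, {"s1": 3, "s2": 2, "s3": 4}, down={"s3"}
)
```

In this example `s3` is down, so its share is spread over `s1` and `s2`, and `s1` receives more sequence numbers because it has more remaining capacity.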

In an embodiment, as shown in fig. 4, the acquiring, by a first server, a data file from a second server, and determining, from the data file, a target data fragment corresponding to the first server according to a correspondence between the server and the data fragment, includes:

Step 402, the first server obtains a primary key value corresponding to each piece of data to be processed in the data file, and determines a fragment sequence number corresponding to each piece of data to be processed according to the primary key value corresponding to each piece of data to be processed and the number of fragments.

Specifically, after acquiring the data file, the first server traverses the whole file and identifies the primary key value corresponding to each row of data; normally, each row is one piece of data to be processed. The maximum primary key value is not less than the number of fragments. At least one group of data to be processed sharing the same primary key value is allocated to each fragment sequence number, so that every piece of data to be processed is assigned a fragment sequence number. The number of fragment sequence numbers each server is responsible for may differ.

In a possible implementation, the primary key value of each piece of data to be processed is taken modulo the number of fragments, and the fragment sequence number of that piece of data is determined from the remainder. For example, let the number of fragments be n, so the fragment sequence number i ranges from 0 to n-1. The data file is traversed line by line starting from the first row, and whether each piece of data to be processed belongs to the i-th fragment is judged from its primary key value m: when m % n = i (the remainder of m divided by n), the record belongs to the i-th fragment. Suppose the number of fragments n is 5, so i takes the values 0, 1, 2, 3, 4, and at least one piece of data to be processed has primary key value m = 8; then the fragment sequence number for that data is 8 % 5 = 3, and every piece of data to be processed whose primary key value is 8 belongs to the data fragment with fragment sequence number 3.

Step 404, the first server obtains a fragment serial number corresponding to the first server according to a correspondence between the server and the data fragment, determines each piece of to-be-processed data corresponding to the first server according to the fragment serial number corresponding to the first server and the fragment serial number corresponding to each piece of to-be-processed data, and obtains a target data fragment corresponding to the first server according to each piece of to-be-processed data corresponding to the first server.

Specifically, each first server determines a fragment serial number in charge of the first server according to a corresponding relationship between the server and the data fragments, identifies a fragment serial number corresponding to each piece of data to be processed in the data file, and if the fragment serial number corresponding to one piece of data to be processed is the same as the fragment serial number corresponding to the current first server, the piece of data to be processed is the piece of data to be processed corresponding to the current first server, and obtains a target data fragment corresponding to the current first server according to all pieces of data to be processed corresponding to the current first server.
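As a hedged sketch of this selection step, the following assumes each piece of data to be processed is represented as a `(primary key value, payload)` tuple and that the fragment sequence numbers assigned to the current first server have already been read from the correspondence. The names `Record` and `select_target_fragment` are illustrative assumptions.

```python
from typing import Iterable, Iterator, Set, Tuple

# Hypothetical record shape: (primary key value, row payload).
Record = Tuple[int, str]


def select_target_fragment(
    records: Iterable[Record],
    my_fragments: Set[int],
    fragment_count: int,
) -> Iterator[Record]:
    """Yield only the records whose fragment sequence number is assigned
    to the current first server; all other records are skipped."""
    for key, payload in records:
        if key % fragment_count in my_fragments:
            yield key, payload


records = [(8, "a"), (10, "b"), (13, "c"), (4, "d")]
# This server is responsible for fragment 3 out of 5 fragments.
mine = list(select_target_fragment(records, {3}, 5))
```

Every server traverses the same file but keeps a disjoint subset, so no coordination is needed between servers while reading.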

In this embodiment, the first server obtains the primary key value corresponding to each piece of data to be processed in the data file, and determines the fragment sequence number corresponding to each piece of data to be processed according to that primary key value and the number of fragments; the first server obtains the fragment sequence number corresponding to the first server according to the correspondence between the server and the data fragments, determines each piece of data to be processed corresponding to the first server according to the fragment sequence number corresponding to the first server and the fragment sequence number corresponding to each piece of data to be processed, and obtains the target data fragment corresponding to the first server from those pieces of data. Each server processes only the data within the target data fragment it is responsible for, which improves data processing efficiency. When an error occurs in data processing, only the fragment sequence number in error needs to be identified in order to determine the corresponding server and reprocess the data of that fragment, which also improves the fault tolerance of data processing.

In one embodiment, the processing of the target data fragment by the first server includes: the first server processes each piece of data to be processed in the target data fragment to generate corresponding first information; determining a primary key value corresponding to each piece of first information according to the primary key value corresponding to each piece of data to be processed; and integrating the first information with the same primary key value to generate second information.

Specifically, if the data type of the data file is a context-irrelevant type, each first server processes each piece of to-be-processed data in the corresponding target data fragment, and generates a piece of corresponding first information for each piece of to-be-processed data.

Further, if the data type of the data file is a context-dependent type, each first server processes each piece of to-be-processed data in the corresponding target data fragment, and generates a piece of corresponding first information for each piece of to-be-processed data; each first server identifies a primary key value corresponding to each piece of data to be processed in the target data fragment, thereby determining the primary key value corresponding to each piece of first information, and integrates the first information with the same primary key value to generate second information, wherein the number of the second information is the same as the number of primary keys (the primary keys with the same key value are one type).

In this embodiment, processing the target data fragment by the first server includes: the first server processes each piece of data to be processed in the target data fragment to generate corresponding first information; determining a primary key value corresponding to each piece of first information according to the primary key value corresponding to each piece of data to be processed; and integrating the first information with the same primary key value to generate second information. The first information can be generated according to each piece of data to be processed, and the second information can be obtained by integrating all the associated first information according to the associated information of the data to be processed, so that the efficiency of subsequent data processing is improved.

In one possible embodiment, a multi-server data processing method includes:

if the data type of the data file is the context-irrelevant type, the second server acquires the data file and calculates the total data line number of the data file; the second server calculates the number of fragments according to the number of the first servers and the attribute of each first server, and calculates the number of starting lines and the number of ending lines of each data fragment according to the number of fragments and the number of total data lines; and the second server obtains a plurality of data fragments of the data file according to the starting line number and the ending line number of each data fragment.

The second server divides the data file into data fragments with the number of fragments, determines the corresponding relation between each first server and each data fragment according to the load information of each first server to obtain the corresponding relation between the server and the data fragments, or determines the corresponding relation between each first server and each data fragment according to the load information of each first server, and determines the corresponding relation between each second server and each data fragment according to the load information of each second server to obtain the corresponding relation between the server and the data fragments.

The first server acquires the corresponding relation between the server and the data fragments. And the first server acquires the data file from the second server and determines the target data fragment corresponding to the first server from the data file according to the corresponding relation between the server and the data fragment. The first server processes each piece of data to be processed in the target data fragment to generate corresponding first information.

In one possible embodiment, a multi-server data processing method includes:

if the data type of the data file is a context-dependent type, the second server obtains a plurality of fragment sequence numbers according to the number of the fragments, determines the corresponding relation between each first server and each fragment sequence number according to the load information of each first server to obtain the corresponding relation between the server and the data fragments, or determines the corresponding relation between each first server and each fragment sequence number according to the load information of each first server, and determines the corresponding relation between each second server and each fragment sequence number according to the load information of each second server to obtain the corresponding relation between the server and the data fragments.

The first server acquires the corresponding relation between the server and the data fragments. The first server acquires a data file from the second server, acquires a primary key value corresponding to each piece of data to be processed in the data file, and determines a fragment sequence number corresponding to each piece of data to be processed according to the primary key value corresponding to each piece of data to be processed and the number of fragments; the first server obtains a fragment serial number corresponding to the first server according to the corresponding relation between the server and the data fragments, determines each piece of data to be processed corresponding to the first server according to the fragment serial number corresponding to the first server and the fragment serial number corresponding to each piece of data to be processed, and obtains a target data fragment corresponding to the first server according to each piece of data to be processed corresponding to the first server. The first server processes each piece of data to be processed in the target data fragment to generate corresponding first information; determining a primary key value corresponding to each piece of first information according to the primary key value corresponding to each piece of data to be processed; and integrating the first information with the same primary key value to generate second information.

In one embodiment, a multi-server data processing method, for example, applied to a server to generate a bill according to a transaction detail file, includes:

The transaction detail file is located in a shared directory, and the files in the shared directory can be accessed by programs on multiple servers at the same time. A main server is selected from the plurality of servers. The main server selects a fragmentation method according to the type of the transaction detail file, divides the transaction detail file into a plurality of fragments, determines the correspondence between each server and the fragments according to a server load balancing algorithm, and records the correspondence in a remote database that is accessible to all servers. The program on each server first accesses the remote database to obtain the correspondence between each server and the fragments, then reads the transaction detail file and processes only the data belonging to its own fragments to generate bills. After the bills are generated, each server sends the bills generated on that server to the client through various channels, such as e-mail and file transmission. Here, the transaction detail file corresponds to the data file, the main server corresponds to the second server, each server corresponds to a first server, and the fragments correspond to the data fragments.

The transaction detail file type can be divided into two types of context-independent type and context-dependent type. The main server selects different fragmentation methods for different transaction detail file types.

The main server is responsible for allocating the fragments to be processed by each server. The main server is elected as follows: the remaining capacity of the thread pool queue of each server is checked, and the most idle server, that is, the server whose thread pool queue has the largest remaining capacity, is selected as the main server.
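The election rule can be illustrated as follows. This is a sketch under the assumption that the remaining thread pool queue capacity of each server is already known and held in a dictionary; the function name and the tie-breaking behaviour (first server encountered wins) are assumptions, since the text does not specify how ties are resolved.

```python
from typing import Dict


def elect_main_server(remaining_capacity: Dict[str, int]) -> str:
    """Pick the server whose thread pool queue has the most remaining capacity.

    remaining_capacity maps a (hypothetical) server name to the remaining
    capacity of its thread pool queue. Ties go to the first server seen.
    """
    if not remaining_capacity:
        raise ValueError("no servers available")
    return max(remaining_capacity, key=remaining_capacity.get)


main_server = elect_main_server({"server-a": 3, "server-b": 10, "server-c": 7})
```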

When the main server allocates fragments to the servers, the server load balancing algorithm is as follows. Each server is provided with a fixed thread pool, and one fragment processing task is one task in the thread pool queue. To allocate the number of fragments to be processed by each server, the remaining capacity of the thread pool queue of the server is checked. Unavailable servers are also detected: if a server is down and unavailable, its fragments are allocated to the available servers for processing. The smaller the remaining capacity of a server's thread pool queue, the fewer fragments it is allocated; the larger the remaining capacity, the more fragments it is allocated. Suppose there are m fragments and n servers in total, the remaining capacity of the thread pool queue of the ith server is n_i, and the sum of the remaining capacities of the thread pool queues of all servers is s. The number of fragment tasks allocated to the ith server is then m × (n_i / s).
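A minimal sketch of the proportional allocation follows. The text gives only the share m × (n_i / s) and does not say how fractional shares are rounded, so the largest-remainder rounding used here is an assumption, as are the function and server names. Servers that are down are assumed to have been removed from the input beforehand.

```python
from typing import Dict


def allocate_fragments(
    fragment_count: int, remaining_capacity: Dict[str, int]
) -> Dict[str, int]:
    """Split fragment_count (m) among servers in proportion to n_i / s.

    Uses integer arithmetic: each server first gets floor(m * n_i / s),
    then leftover fragments go to the largest remainders (an assumption).
    """
    total = sum(remaining_capacity.values())  # s in the text
    if total <= 0:
        raise ValueError("no remaining capacity on any available server")
    shares = {
        name: divmod(fragment_count * cap, total)
        for name, cap in remaining_capacity.items()
    }
    counts = {name: quotient for name, (quotient, _) in shares.items()}
    leftover = fragment_count - sum(counts.values())
    by_remainder = sorted(shares, key=lambda name: shares[name][1], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


plan = allocate_fragments(8, {"server-a": 10, "server-b": 30, "server-c": 40})
```

With capacities 10, 30 and 40 (s = 80) and m = 8, the shares are exact: 1, 3 and 4 fragments respectively.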

For context-independent transaction detail files, a bill is generated for each row of transaction detail in the transaction detail file. According to the number of CPU cores of the servers, the number of fragments is preset; for example, with k servers each having a 4-core CPU, the total number of fragments is set to 4 × k, so that ideally each server processes 4 fragments concurrently, one per CPU core. The transaction detail file is traversed line by line to count its lines, and the line count is divided by the number of fragments to obtain the starting line number and the ending line number of each fragment. For example, let the number of file lines be h and the number of fragments be n. The starting line number of the ith fragment (i counted from 0, 0 ≤ i ≤ n-1) is (h/n) × i + 1; when i < n-1, the ending line number of the ith fragment is (h/n) × (i + 1), and when i = n-1, the ending line number of the ith fragment is h. Each server, together with the starting line number and the ending line number of each fragment it is to process, is recorded in a database table according to the current load balance of the servers. A heavily loaded server may process fewer fragments, and a lightly loaded server may process more. When the program on each server runs, it first reads the data in the table to obtain the starting line numbers and ending line numbers of the fragments it is to process. It then traverses the transaction detail file from the first line to the last line, processes only the data within the range of its fragments, and skips the data outside the range without any processing. Each server sends the bills generated on that server to the client. The generated bill corresponds to the first information.
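The line-range calculation can be sketched as below. It assumes that h/n denotes integer division and that the ending line of fragment i (for i < n-1) is (h/n) × (i + 1), which is the reading that makes consecutive fragments contiguous; the function name is hypothetical.

```python
from typing import List, Tuple


def fragment_line_ranges(total_lines: int, fragment_count: int) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) line numbers for each fragment.

    total_lines is h and fragment_count is n. The last fragment absorbs
    any lines left over by the integer division.
    """
    if fragment_count <= 0:
        raise ValueError("fragment_count must be positive")
    size = total_lines // fragment_count
    ranges = []
    for i in range(fragment_count):
        start = size * i + 1
        end = total_lines if i == fragment_count - 1 else size * (i + 1)
        ranges.append((start, end))
    return ranges


# 10 lines split into 3 fragments.
ranges = fragment_line_ranges(10, 3)
```

Under this reading, when h is smaller than n the earlier fragments come out empty (start greater than end) and the last fragment covers the whole file.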

For a context-dependent transaction detail file, one bill is generated from multiple consecutive rows of transaction details. Taking the merchant statement as an example, the statement displays the multiple transaction detail records of the same merchant together with transaction statistics, such as the total transaction amount and the total number of transactions. Each merchant statement contains all transaction details of one merchant, each row of transaction details includes a merchant number, and the primary key m of the transaction details corresponds to the merchant number. Transaction details with the same merchant number should appear in the same merchant statement. According to the number of CPU cores of the servers, the number of fragments is preset; for example, with k servers each having a 4-core CPU, the total number of fragments is set to 4 × k, so that ideally each server processes 4 fragments concurrently, one per CPU core. Each server and the fragment sequence numbers it is to process are recorded in a database according to the current load balance of the servers. A heavily loaded server may process fewer fragments, and a lightly loaded server may process more. When the program on each server runs, it first obtains from the database the fragment sequence number i that it needs to process. Let the total number of fragments be n, so that the fragment sequence number i ranges from 0 to n-1. The transaction detail file is traversed line by line from the first line, and whether the transaction record of each line belongs to the ith fragment is judged according to the statement primary key m.
For example, for a merchant statement in which one bill shows all transaction records of one merchant number, the value of the primary key m may be the merchant number, and when m % n = i the record belongs to the ith fragment (for example, if the total number of fragments n is 5, the fragment sequence number i takes the values 0, 1, 2, 3, 4; for merchant number 8, 8 % 5 = 3, so that merchant belongs to the fragment with sequence number 3). While traversing the transaction detail file from the first row, the program on each server judges whether the record of each row belongs to a fragment that it needs to process, and also records the primary key information so that transaction records with the same primary key are gathered into one bill. Because the fragments are determined by the value of the primary key m, records with the same primary key m (such as the same merchant number) necessarily fall in the same fragment, so each server obtains all data of a given statement and can send the statement after generating it. The generated bill corresponds to the second information.
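The gathering of transaction records into merchant statements might look like the following sketch. It assumes the rows passed in have already been selected for this server's fragments and that each row is a `(merchant number, amount)` pair of strings; this record layout, the statement fields and the function name are illustrative assumptions rather than details from the application.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable, Tuple


def build_statements(rows: Iterable[Tuple[str, str]]) -> Dict[str, dict]:
    """Group transaction rows by merchant number and compute statistics.

    Each resulting statement holds the detail rows, the number of
    transactions and the total amount for one merchant.
    """
    statements: Dict[str, dict] = defaultdict(
        lambda: {"count": 0, "total": Decimal("0"), "details": []}
    )
    for merchant_no, amount in rows:
        statement = statements[merchant_no]
        statement["count"] += 1
        statement["total"] += Decimal(amount)
        statement["details"].append((merchant_no, amount))
    return dict(statements)


# Rows already filtered to this server's fragment (merchant numbers 8 and 3
# both map to fragment 3 when there are 5 fragments).
rows = [("8", "10.50"), ("3", "1.00"), ("8", "4.50")]
statements = build_statements(rows)
```

`Decimal` is used instead of floating point so that monetary totals are not affected by binary rounding.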

It should be understood that, although the steps in the flowcharts related to the embodiments described above are displayed sequentially as indicated by the arrows, the steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited to the order shown, and the steps may be performed in other orders. Moreover, at least a part of the steps in the flowcharts related to the embodiments described above may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least a part of the sub-steps or stages of other steps.

Based on the same inventive concept, the embodiment of the present application further provides a multi-server data processing apparatus for implementing the above-mentioned multi-server data processing method. The implementation scheme for solving the problem provided by the device is similar to the implementation scheme described in the above method, so specific limitations in one or more embodiments of the multi-server data processing device provided below can refer to the limitations in the above multi-server data processing method, and details are not described here.

In one embodiment, as shown in FIG. 5, there is provided a multi-server data processing apparatus 500 comprising: a relationship obtaining module 501, a data confirming module 502 and a data processing module 503, wherein:

a relationship obtaining module 501, configured to obtain, by a first server, a corresponding relationship between a server and data fragments, where the corresponding relationship between the server and the data fragments is determined based on data files and the number of fragments;

a data confirmation module 502, configured to obtain, by the first server, the data file from the second server, and determine, from the data file, a target data fragment corresponding to the first server according to a correspondence between the server and the data fragment;

the data processing module 503 is configured to process the target data fragment by the first server.

In one embodiment, the apparatus further comprises a relationship building module 504:

a relationship building module 504, configured to, if the data type of the data file is a context-irrelevant type, divide the data file into data fragments of a number of fragments by the second server, determine a correspondence between each first server and each data fragment according to load information of each first server, to obtain a correspondence between the server and the data fragment, or determine a correspondence between each first server and each data fragment according to load information of each first server, and determine a correspondence between each second server and each data fragment according to load information of each second server, to obtain a correspondence between the server and the data fragment.

In one embodiment, the relationship building module 504 is further configured to obtain the data file by the second server, and calculate a total data line number of the data file; the second server calculates the number of fragments according to the number of the first servers and the attribute of each first server, and calculates the number of starting lines and the number of ending lines of each data fragment according to the number of fragments and the number of total data lines; and the second server obtains a plurality of data fragments of the data file according to the starting line number and the ending line number of each data fragment.

In one embodiment, the relationship building module 504 is further configured to, if the data type of the data file is a context-dependent type, obtain, by the second server, a plurality of fragment sequence numbers according to the number of fragments, determine, according to the load information of each first server, a correspondence between each first server and each fragment sequence number, and obtain a correspondence between the server and the data fragment, or determine, according to the load information of each first server, a correspondence between each first server and each fragment sequence number, determine, according to the load information of each second server, a correspondence between each second server and each fragment sequence number, and obtain a correspondence between the server and the data fragment.

In an embodiment, the data confirmation module 502 is further configured to obtain, by the first server, a primary key value corresponding to each piece of to-be-processed data in the data file, and determine, according to the primary key value corresponding to each piece of to-be-processed data and the number of fragments, a fragment sequence number corresponding to each piece of to-be-processed data; the first server obtains a fragment serial number corresponding to the first server according to the corresponding relation between the server and the data fragments, determines each piece of data to be processed corresponding to the first server according to the fragment serial number corresponding to the first server and the fragment serial number corresponding to each piece of data to be processed, and obtains a target data fragment corresponding to the first server according to each piece of data to be processed corresponding to the first server.

In an embodiment, the data processing module 503 is further configured to process, by the first server, each piece of data to be processed in the target data fragment, and generate corresponding first information; determining a primary key value corresponding to each piece of first information according to the primary key value corresponding to each piece of data to be processed; and integrating the first information with the same primary key value to generate second information.

The modules in the multi-server data processing apparatus described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 6. The computer apparatus includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, the memory and the input/output interface are connected by a system bus, and the communication interface, the display unit and the input device are connected by the input/output interface to the system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The input/output interface of the computer device is used for exchanging information between the processor and an external device. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a multi-server data processing method. The display unit of the computer equipment is used for forming a visual and visible picture, and can be a display screen, a projection device or a virtual reality imaging device, the display screen can be a liquid crystal display screen or an electronic ink display screen, the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.

Those skilled in the art will appreciate that the architecture shown in fig. 6 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as a particular computing device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:

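a first server obtains a correspondence between a server and data fragments, the correspondence between the server and the data fragments being determined based on a data file and the number of fragments;

the first server acquires the data file from a second server, and determines a target data fragment corresponding to the first server from the data file according to the correspondence between the server and the data fragments;

and the first server processes the target data fragment.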

In one embodiment, the processor, when executing the computer program, further performs the steps of:

the first server obtains a fragment serial number corresponding to the first server according to the corresponding relation between the servers and the data fragments, determines each piece of to-be-processed data corresponding to the first server according to the fragment serial number corresponding to the first server and the fragment serial number corresponding to each piece of to-be-processed data, and obtains a target data fragment corresponding to the first server according to each piece of to-be-processed data corresponding to the first server.

In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:

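a first server obtains a correspondence between a server and data fragments, the correspondence between the server and the data fragments being determined based on a data file and the number of fragments;

the first server acquires the data file from a second server, and determines a target data fragment corresponding to the first server from the data file according to the correspondence between the server and the data fragments;

and the first server processes the target data fragment.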

In one embodiment, the computer program when executed by the processor further performs the steps of:

In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, performs the steps of:

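a first server obtains a correspondence between a server and data fragments, the correspondence between the server and the data fragments being determined based on a data file and the number of fragments;

the first server acquires the data file from a second server, and determines a target data fragment corresponding to the first server from the data file according to the correspondence between the server and the data fragments;

and the first server processes the target data fragment.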

It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, displayed data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the relevant laws and regulations and standards of the relevant country and region.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program instructing relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetic Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory may include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others. The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like.

The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered to be within the scope of this specification.

The above-mentioned embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims

1. A method for multi-server data processing, the method comprising:

a first server obtains a correspondence between a server and data fragments, the correspondence between the server and the data fragments being determined based on a data file and the number of fragments;

the first server acquires the data file from the second server, and determines a target data fragment corresponding to the first server from the data file according to the corresponding relation between the server and the data fragment;

and the first server processes the target data fragment.

2. The method of claim 1, wherein the correspondence between the server and the data fragments being determined based on the data file and the number of fragments comprises:

if the data type of the data file is a context-irrelevant type, the second server divides the data file into a plurality of data fragments with the number of fragments, determines the corresponding relation between each first server and each data fragment according to the load information of each first server to obtain the corresponding relation between the server and the data fragment, or determines the corresponding relation between each first server and each data fragment according to the load information of each first server, and determines the corresponding relation between each second server and each data fragment according to the load information of each second server to obtain the corresponding relation between the server and the data fragment.

3. The method of claim 2, wherein dividing the data file into data fragments of the number of fragments comprises:

the second server calculates the number of fragments according to the number of the first servers and the attribute of each first server, and calculates the number of starting lines and the number of ending lines of each data fragment according to the number of fragments and the total data lines;

4. The method of claim 1, wherein the correspondence between the server and the data fragments being determined based on the data file and the number of fragments comprises:

if the data type of the data file is a context-dependent type, the second server obtains a plurality of fragment serial numbers according to the number of fragments, determines the corresponding relationship between each first server and each fragment serial number according to the load information of each first server, obtains the corresponding relationship between the server and the data fragments, or determines the corresponding relationship between each first server and each fragment serial number according to the load information of each first server, and determines the corresponding relationship between each second server and each fragment serial number according to the load information of each second server, and obtains the corresponding relationship between the server and the data fragments.

5. The method of claim 4, wherein the first server acquiring the data file from the second server and determining, from the data file according to the correspondence between servers and data fragments, the target data fragment corresponding to the first server comprises:

the first server obtains the fragment serial number corresponding to the first server according to the correspondence between servers and data fragments, determines each piece of to-be-processed data corresponding to the first server by comparing that fragment serial number with the fragment serial number corresponding to each piece of to-be-processed data, and obtains the target data fragment corresponding to the first server from the pieces of to-be-processed data so determined.
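The selection step in claim 5 might look like the following sketch. The claims do not say how a piece of data obtains its fragment serial number; the sketch assumes it is a stable hash of a key field modulo the number of fragments, so that records sharing a key always land in the same fragment. The names `fragment_serial`, `select_target_fragment` and `key_of` are hypothetical.

```python
import zlib


def fragment_serial(key, num_fragments):
    """Map a record key to a fragment serial number (assumed: CRC32 modulo)."""
    return zlib.crc32(key.encode("utf-8")) % num_fragments


def select_target_fragment(lines, my_serials, num_fragments, key_of):
    """Keep only the records whose fragment serial number belongs to this server.

    my_serials: set of fragment serial numbers assigned to this first server.
    key_of: function extracting the context key (e.g. an account id) from a line.
    """
    return [
        line
        for line in lines
        if fragment_serial(key_of(line), num_fragments) in my_serials
    ]


if __name__ == "__main__":
    records = ["acct1,10", "acct2,5", "acct1,7", "acct3,1"]
    first_field = lambda line: line.split(",")[0]
    mine = select_target_fragment(records, {0}, 2, first_field)
    print(mine)  # records of any given account all appear together or not at all
```

A stable hash such as CRC32 is used here instead of Python's built-in `hash`, which is randomized per process for strings and would give different servers different results.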

6. The method of claim 5, wherein the first server processing the target data fragment comprises:

7. A multi-server data processing apparatus, characterized in that the apparatus comprises:

a relation acquisition module, configured for a first server to obtain a correspondence between servers and data fragments, wherein the correspondence between servers and data fragments is determined based on a data file and a number of fragments;

a data confirmation module, configured for the first server to acquire the data file from a second server and to determine, from the data file according to the correspondence between servers and data fragments, a target data fragment corresponding to the first server;

8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 6.

9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.

10. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 6 when executed by a processor.