CN115250276A - Distributed system and data processing method and device - Google Patents


Info

Publication number
CN115250276A
Authority
CN
China
Prior art keywords
data processing
data
cluster
distributed system
subtask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110456873.3A
Other languages
Chinese (zh)
Inventor
佘志典
孙健
陈思华
齐晓磊
王建兴
黄树林
王林
徐中礼
宋鹏程
赵珠慧
刘佩金
戴伟
邹林祚
张利平
万自强
吴锦伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tongbang Zhuoyi Technology Co ltd
Original Assignee
Beijing Tongbang Zhuoyi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Tongbang Zhuoyi Technology Co ltd filed Critical Beijing Tongbang Zhuoyi Technology Co ltd
Priority to CN202110456873.3A priority Critical patent/CN115250276A/en
Publication of CN115250276A publication Critical patent/CN115250276A/en
Pending legal-status Critical Current


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00: Network arrangements or protocols for supporting network services or applications
    • H04L 67/01: Protocols
    • H04L 67/10: Protocols in which an application is distributed across nodes in the network
    • H04L 67/1001: Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L 67/1004: Server selection for load balancing
    • H04L 67/1008: Server selection for load balancing based on parameters of servers, e.g. available memory or workload
    • H04L 67/1031: Controlling of the operation of servers by a load balancer, e.g. adding or removing servers that serve requests

Abstract

Embodiments of the present disclosure disclose distributed systems. One embodiment of the distributed system comprises: a distributor cluster, an aggregator cluster, and an operation cluster, wherein the distributor cluster is configured to: receiving a data processing task; acquiring target operation data required by the data processing task; sending the data processing task and the target operation data to the aggregator cluster; the aggregator cluster is configured to: receiving the data processing task and the target operation data; decomposing the data processing task into a plurality of subtasks, and extracting an operation data subset corresponding to each subtask from the target operation data; distributing each subtask and the corresponding operation data subset to operation nodes in the operation cluster; receiving the subtask processing results; aggregating the processing results of all the subtasks into a data processing result; sending the data processing result to the distributor cluster; and the operation cluster is configured to: executing the subtasks and generating subtask processing results; and sending the processing result of each subtask to the aggregator cluster.

Description

Distributed system and data processing method and device
Technical Field
The embodiment of the disclosure relates to the technical field of computers, in particular to the field of data processing, and more particularly to a distributed system and a data processing method and device.
Background
In the field of data processing, for data processing tasks with large data volumes, many data types, and high computational complexity, such as financial data calculation tasks, a distributed system is usually adopted to process such complex data so as to improve the efficiency of data processing.
In the related art, when processing data, a distributed system usually sends a data processing task to multiple computers running the same application, so as to harness the processing capacity of the multiple computers simultaneously.
Disclosure of Invention
The embodiment of the disclosure provides a distributed system and a data processing method and device.
In a first aspect, an embodiment of the present disclosure provides a distributed system, including: a distributor cluster, an aggregator cluster, and an operations cluster, wherein the distributor cluster is configured to: receiving a data processing task; acquiring target operation data required by a data processing task; sending the data processing task and the target operation data to an aggregator cluster; the aggregator cluster is configured to: receiving a data processing task and target operation data; decomposing a data processing task into a plurality of subtasks, and extracting an operation data subset corresponding to each subtask from target operation data; distributing each subtask and the corresponding operation data subset to at least one operation node in the operation cluster; receiving a subtask processing result returned by at least one operation node; in response to receiving the subtask processing results of all the subtasks within a first preset time, aggregating the subtask processing results into a data processing result; and sending the data processing result to the distributor cluster; the operation cluster is configured to: receiving subtasks distributed by an aggregator and operation data subsets thereof; executing the subtasks and generating subtask processing results of the subtasks; and sending the processing result of each subtask to the aggregator.
In some embodiments, each distribution node in the distributor cluster, each aggregation node in the aggregator cluster, and each operation node in the operation cluster all employ a coroutine computation strategy.
In some embodiments, the distributed system further comprises a cache cluster configured to store the operational data; and, the distributor cluster is further configured to: determining a target operation data type corresponding to the data processing task based on a pre-established corresponding relation between the data processing task and the operation data type; and acquiring the operation data pointed by the target operation data type from the cache cluster to obtain the target operation data.
In some embodiments, the aggregator cluster is further configured to: if the subtask processing results of all the subtasks are not received within the first preset time, sending the subtasks and the corresponding operation data subsets again to the operation nodes that have not returned subtask processing results; and if the number of times of sending the subtasks and the corresponding operation data subsets reaches a first preset number and the subtask processing results are still not received, determining the data processing result as a task timeout and generating alarm information.
In some embodiments, the distributor cluster is further configured to: if the data processing result is not received within the second preset time, sending the data processing task and the target operation data to the aggregator cluster again; and if the number of times of sending the data processing task and the target operation data reaches a second preset number and the data processing result is still not received, determining the data processing result as a task timeout and generating alarm information.
In some embodiments, the distributed system further includes a reserved network address and a standby server configured with the system file in advance, and when the utilization rate of the operation cluster reaches a preset threshold, the network address of the standby server is updated to the reserved network address, so that the standby server is accessed to the operation cluster as a new operation node.
In a second aspect, an embodiment of the present disclosure provides a method for data processing, where the method includes: receiving a data processing instruction, wherein the data type requested to be processed by the data processing instruction is financial data; sending a data processing instruction to the distributed system in any embodiment, and receiving a data processing result returned by the distributed system; and sending a data processing result.
In some embodiments, the method further comprises: if the data processing result returned by the distributed system is not received within the third preset time, sending the data processing instruction to the distributed system again; and if the number of times of sending the data processing instruction reaches a third preset number and a data processing result returned by the distributed system is still not received, sending prompt information indicating that the task has timed out.
In a third aspect, an embodiment of the present disclosure provides an apparatus for data processing, including: the instruction receiving unit is configured to receive a data processing instruction, and the data type requested to be processed by the data processing instruction is financial data; the instruction sending unit is configured to send a data processing instruction to the distributed system in any one of the embodiments and receive a data processing result returned by the distributed system; a result transmitting unit configured to transmit the data processing result.
In some embodiments, the apparatus further comprises: an instruction retransmission unit configured to send the data processing instruction to the distributed system again if the data processing result returned by the distributed system is not received within a third preset time period; and an information sending unit configured to send prompt information indicating that the task has timed out if the number of times of sending the data processing instruction reaches a third preset number and a data processing result returned by the distributed system is still not received.
In a fourth aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors, cause the one or more processors to implement a method of data processing as in any one of the above embodiments.
In a fifth aspect, embodiments of the present disclosure provide a computer readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements a method of data processing as in any of the above embodiments.
The distributed system provided by the embodiment of the disclosure receives a data processing task through a distributor cluster, acquires target operation data required by the data processing task, and then sends the data processing task and the target operation data to an aggregator; the aggregator decomposes the data processing task and the target operation data into a plurality of subtasks and operation data subsets corresponding to the subtasks, distributes the subtasks and the operation data subsets corresponding to the subtasks to each operation node in the operation cluster, receives subtask processing data obtained by each operation node executing the corresponding subtasks, aggregates all subtask processing data into a data processing result, and returns the data processing result to the distributor. A complex data processing task can be decomposed into a plurality of subtasks, each computing node executes each subtask respectively, and the subtask processing results are aggregated into data processing results, so that the data processing efficiency can be improved.
According to the data processing method and device provided by the embodiment of the disclosure, the financial data processing tasks with large data volume, multiple data types and high operation complexity can be decomposed into a plurality of subtasks through the distributed system, and then the processing results of the subtasks are aggregated into the data processing results, so that the efficiency of financial data processing can be improved.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 is an architectural diagram of one embodiment of a distributed system according to the present disclosure;
FIG. 2 is an exemplary system architecture diagram to which some embodiments of the data processing method of the present disclosure may be applied;
FIG. 3 is a flow diagram for one embodiment of a method of data processing according to the present disclosure;
FIG. 4 is a schematic diagram of an application scenario of the method of data processing shown in FIG. 3;
FIG. 5 is a block diagram of one embodiment of an apparatus for data processing according to the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device suitable for implementing an embodiment of the data processing method of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 shows an architectural schematic of one embodiment of a distributed system of the present disclosure. As shown in fig. 1, the distributed system 100 of the present disclosure includes: a distributor cluster 101, an aggregator cluster 102, and a compute cluster 103, wherein the distributor cluster is configured to: receiving a data processing task; acquiring target operation data required by a data processing task; sending the data processing task and the target operation data to an aggregator cluster; the aggregator cluster is configured to: receiving a data processing task and target operation data; decomposing a data processing task into a plurality of subtasks, and extracting an operation data subset corresponding to each subtask from target operation data; distributing each subtask and the corresponding operation data subset to at least one operation node in the operation cluster; receiving a subtask processing result returned by at least one operation node; in response to receiving the subtask processing results of all the subtasks within a first preset time, aggregating the subtask processing results into a data processing result; and sending the data processing result to the distributor cluster; the operation cluster is configured to: receiving subtasks distributed by an aggregator and operation data subsets thereof; executing the subtasks and generating subtask processing results of the subtasks; and sending the processing result of each subtask to the aggregator.
In this embodiment, each cluster in the distributed system may be a set of virtual machines hosted on a physical machine and having the same application, or a set of physical machines, or a combination of a physical machine and a virtual machine, which is not limited in this application.
In this embodiment, the distributor cluster is configured to receive a data processing task sent by a caller and then retrieve the target operation data required by the data processing task according to the type of the task; for example, the target operation data may be retrieved from the local storage of each computer in the distributed system.
In some optional implementations of this embodiment, the distributed system further includes a cache cluster configured to store the operation data; and, the distributor cluster is further configured to: determining a target operation data type corresponding to the data processing task based on a pre-constructed corresponding relationship between the data processing task and the operation data type; and acquiring the operation data pointed by the target operation data type from the cache cluster to obtain the target operation data.
By way of example, memcached or Redis may be adopted to set a cache cluster in the distributed system, where the cache cluster is isolated from applications of the distributed system, and the cache cluster stores operation data required by various types of data processing tasks, so that the storage pressure of the distributed system may be reduced. After the distributor receives the data processing task of the calling party, the type of the target operation data required by the data processing task can be determined according to the type of the data processing task, and then the target operation data is called from the cache cluster, so that the operation speed can be improved.
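The cache-cluster lookup described above can be sketched as follows. All task types, keys, and data here are hypothetical, and the `cache` argument stands in for the cache cluster: any object with a dict-style `get(key)` method works, e.g. a Redis client or, as in the demo, a plain dict.

```python
import json

# Hypothetical mapping from data processing task type to the keys of the
# operation data it requires, mirroring the pre-established correspondence
# between data processing tasks and operation data types.
TASK_TYPE_TO_DATA_KEYS = {
    "asset_allocation": ["market_quotes", "holdings"],
    "risk_control": ["credit_limits", "exposures"],
}

def fetch_target_operation_data(task_type, cache):
    """Return the operation data that the given task type points to."""
    data = {}
    for key in TASK_TYPE_TO_DATA_KEYS.get(task_type, []):
        raw = cache.get(key)
        if raw is not None:
            data[key] = json.loads(raw)
    return data

# Demo with a plain dict standing in for the cache cluster:
demo_cache = {
    "market_quotes": json.dumps({"AAPL": 150.0}),
    "holdings": json.dumps([{"sym": "AAPL", "qty": 10}]),
}
target_data = fetch_target_operation_data("asset_allocation", demo_cache)
```

Because the cache cluster is isolated from the application, the same lookup code works unchanged whether the backing store is memcached, Redis, or an in-process stub.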
In this embodiment, after receiving the data processing task, the aggregator cluster parses the data processing task, decomposes it into a plurality of relatively independent subtasks, and extracts the operation data subset required by each subtask from the target operation data. It may then distribute the subtasks and their operation data subsets to the operation nodes in the operation cluster; for example, the aggregator may first obtain the load state of each operation node, select one or more operation nodes that are idle or lightly loaded, and then distribute each subtask and its operation data subset to the selected nodes. The operation nodes respectively execute the subtasks to generate subtask processing results, and then return the subtask processing results to the aggregator cluster. When the aggregator cluster receives all the subtask processing results within the first preset time, it aggregates them into a data processing result and returns the data processing result to the distributor.
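The decompose-and-distribute flow can be sketched as below. The chunking rule and the load metric are illustrative assumptions, not the patent's prescription; a real aggregator would obtain load states over the network.

```python
from dataclasses import dataclass, field

@dataclass
class OperationNode:
    name: str
    load: float = 0.0                      # load metric reported by the node (assumed)
    assigned: list = field(default_factory=list)

def decompose(target_data, n_subtasks):
    """Split the target operation data into one roughly equal subset per subtask."""
    size = -(-len(target_data) // n_subtasks)  # ceiling division
    return [target_data[i:i + size] for i in range(0, len(target_data), size)]

def distribute(subtasks, nodes):
    """Send each subtask and its data subset to the currently least-loaded node."""
    for subtask in subtasks:
        node = min(nodes, key=lambda n: n.load)
        node.assigned.append(subtask)
        node.load += len(subtask)          # simplification: load grows with data size

# Demo: 10 records split into 3 subtasks over 2 nodes.
subtasks = decompose(list(range(10)), 3)
nodes = [OperationNode("node-a"), OperationNode("node-b")]
distribute(subtasks, nodes)
```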
The distributed system provided by the embodiment of the disclosure receives a data processing task through a distributor cluster, acquires target operation data required by the data processing task, and then sends the data processing task and the target operation data to an aggregator; the aggregator decomposes the data processing task and the target operation data into a plurality of subtasks and operation data subsets corresponding to the subtasks, distributes the subtasks and the operation data subsets corresponding to the subtasks to each operation node in the operation cluster, receives subtask processing data obtained by each operation node executing the corresponding subtask, aggregates all subtask processing data into a data processing result, and returns the data processing result to the distributor. A complex data processing task can be decomposed into a plurality of subtasks, each computing node executes each subtask respectively, and the subtask processing results are aggregated into data processing results, so that the data processing efficiency can be improved.
In some optional implementation manners of this embodiment, each distribution node in the distributor cluster, each aggregation node in the aggregator cluster, and each computation node in the computation cluster all use a Coroutine (Coroutine) computation policy.
In the related art, the minimum computing unit of each node in a distributed system is a thread, and when the data volume is large or many tasks are processed in parallel, data blocking easily occurs. To avoid this, each node in the distributed system in this implementation adopts a coroutine operation strategy, taking a coroutine as the minimum computing unit. Coroutines are a non-preemptive subroutine scheduling mechanism that allows a subroutine to suspend and resume execution at specific points. Coroutines are smaller computing units than threads: one thread can comprise a plurality of coroutines, but only one coroutine is in a running state at any given time, while the other coroutines share the computer resources allocated to the thread and remain suspended until the next yield instruction is reached.
Because each node in this implementation adopts a coroutine operation strategy, data processing efficiency can be further improved and data blocking can be effectively reduced.
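As a loose illustration (not part of the patent), Python's asyncio coroutines behave exactly this way: many coroutines share one thread, only one runs at a time, and the rest stay suspended at their await points (the yield points) until scheduled again.

```python
import asyncio

async def process_subtask(name, delay):
    # `await` is the suspension point: this coroutine yields control here,
    # letting other coroutines in the same thread run while it waits.
    await asyncio.sleep(delay)
    return f"{name}: done"

async def run_all():
    # Three coroutines share one thread; at any instant only one executes,
    # the others are suspended at their await points.
    return await asyncio.gather(
        process_subtask("subtask-1", 0.01),
        process_subtask("subtask-2", 0.01),
        process_subtask("subtask-3", 0.01),
    )

results = asyncio.run(run_all())
```

`asyncio.gather` returns results in submission order, which is convenient when the caller must later aggregate per-subtask results.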
In practice, each cluster in the distributed system and each node in the same cluster perform data interaction through the network, and when the network fails or other faults exist, data interaction may fail.
For example, if the operation node fails to receive the subtask and the operation data subset thereof from the aggregator cluster, or if the operation node fails to return the subtask processing result to the aggregator cluster after the operation node completes the subtask, the aggregator cluster may not receive all the subtask processing results within the first preset time period, and thus the subsequent aggregation step may not be performed.
In some optional implementations of this embodiment, the aggregator cluster is further configured to: if the subtask processing results of all the subtasks are not received within the first preset time, sending the subtasks and the corresponding operation data subsets again to the operation nodes that have not returned subtask processing results; and if the number of times of sending the subtasks and the corresponding operation data subsets reaches a first preset number and the subtask processing results are still not received, determining the data processing result as a task timeout and generating alarm information.
In this implementation, after the aggregator distributes the subtasks and operation data subsets to the operation cluster, if the aggregator cluster fails to receive all the subtask processing results within the first preset time, it may send the subtasks and their operation data subsets to the operation nodes again to instruct them to execute the subtasks again; and if an operation node has executed its subtask but not returned the result, it can send the subtask processing result again. When the number of sends reaches the first preset number, this indicates that there is a fault in the distributed system or a problem in the data calculation task that requires manual intervention. At this point, the aggregator cluster can determine the data processing result as a task timeout and send alarm information to the monitoring module of the distributed system; the alarm information can be presented in a configuration page of the distributed system, reminding an operator to investigate.
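The retry-then-alarm policy can be sketched as follows. `FIRST_PRESET_TIME` and `FIRST_PRESET_NUMBER` are illustrative values for the patent's parameters, and the two callables are hypothetical stand-ins for the aggregator's network I/O.

```python
FIRST_PRESET_TIME = 0.05   # seconds to wait for one subtask result (illustrative)
FIRST_PRESET_NUMBER = 3    # maximum number of sends before alarming (illustrative)

def collect_subtask_result(send, wait_for_result):
    """Resend a subtask until its result arrives or the send cap is reached.

    `send` dispatches the subtask and its operation data subset to an
    operation node; `wait_for_result` blocks for up to the given time and
    returns the result, or None on timeout.
    """
    for _ in range(FIRST_PRESET_NUMBER):
        send()
        result = wait_for_result(FIRST_PRESET_TIME)
        if result is not None:
            return result
    # Send cap reached: treat the task as timed out and raise an alarm.
    return "task timeout"

# Demo: a node that never replies exhausts the send cap.
sent = []
def send_once():
    sent.append(1)
def no_reply(timeout):
    return None

outcome = collect_subtask_result(send_once, no_reply)
```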
Through retransmission, on the one hand, data interaction between the aggregator cluster and the operation cluster is protected against interruption by transient faults; on the other hand, faults that need manual intervention can be identified from the number of retransmissions, and operators are promptly alerted through the alarm information, which can improve the reliability of the distributed system.
For another example, if the distributor cluster fails to send the data processing task and the target operation data to the aggregator cluster, or if the aggregator cluster obtains the data processing result but fails to return it to the distributor cluster, the distributor cluster may not receive the data processing result within the second preset time.
In some embodiments, the distributor cluster is further configured to: if the data processing result is not received within the second preset time, sending the data processing task and the target operation data to the aggregator cluster again; and if the number of times of sending the data processing task and the target operation data reaches a second preset number and the data processing result is still not received, determining the data processing result as a task timeout and generating alarm information.
In this implementation, after the distributor cluster sends the data processing task and the target operation data to the aggregator cluster, if the data processing result is not received within the second preset time, the distributor cluster may send the data processing task and the target operation data to the aggregator cluster again, until the data processing result is received or the number of retransmissions reaches the second preset number. When the number of sends reaches the second preset number, this indicates that there is a problem requiring manual intervention, for example a fault in the distributed system or in the data calculation task. At this point, the distributor cluster can determine the data processing result as a task timeout and send alarm information to the monitoring module of the distributed system; the alarm information can be presented in a configuration page of the distributed system, reminding an operator to investigate.
In this way, on the one hand, data interaction between the distributor cluster and the aggregator cluster is protected against interruption by transient faults; on the other hand, faults that need manual intervention can be identified from the number of retransmissions, and operators are promptly alerted through the alarm information, which can improve the reliability of the distributed system.
As an example, the front-end page of the distributed system may record the status information of each data processing task, which may include, for example, the task's ID (identifier), start time, end time, operating status, number of calls, average elapsed time, and the like. The operating status represents the current state of the data processing task; if there is alarm information, a prompt message of "operating error" is presented in the operating status column.
In some optional implementation manners of this embodiment, the distributed system further includes a reserved network address and a standby server configured with system files in advance, and when the utilization rate of the operation cluster reaches a preset threshold, the network address of the standby server is updated to the reserved network address, so that the standby server is accessed to the operation cluster as a new operation node.
In this implementation, the reserved network address represents an access point of the operation cluster in the distributed system. A standby server with pre-configured system files can access the operation cluster through the reserved network address, thereby increasing the number of operation nodes in the operation cluster and further improving its data processing capacity, without interrupting the data processing tasks currently running in the operation cluster, thus realizing dynamic capacity expansion of the distributed system.
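A minimal sketch of this dynamic expansion, assuming a made-up threshold, a made-up reserved address, and a simplified "fraction of busy nodes" utilization metric:

```python
PRESET_THRESHOLD = 0.8           # illustrative utilization threshold
RESERVED_ADDRESS = "10.0.0.100"  # illustrative reserved network address

class OperationCluster:
    def __init__(self, nodes):
        self.nodes = list(nodes)

    def utilization(self):
        # Simplified metric: fraction of nodes currently busy.
        return sum(1 for n in self.nodes if n["busy"]) / len(self.nodes)

    def maybe_expand(self, standby):
        """Attach a pre-configured standby server at the reserved address
        once utilization reaches the threshold, without interrupting the
        tasks already running on existing nodes."""
        if self.utilization() >= PRESET_THRESHOLD:
            standby["address"] = RESERVED_ADDRESS
            self.nodes.append(standby)
            return True
        return False

# Demo: a fully busy four-node cluster admits the standby server.
cluster = OperationCluster(
    [{"busy": True}, {"busy": True}, {"busy": True}, {"busy": True}]
)
standby = {"busy": False, "address": None}
expanded = cluster.maybe_expand(standby)
```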
Referring next to fig. 2, there is shown an exemplary system architecture 200 of an apparatus to which the method of data processing or data processing in embodiments of the present disclosure may be applied.
As shown in fig. 2, the system architecture 200 may include terminal devices 201, 202, 203, a network 204, and a server 205. The network 204 serves as a medium for providing communication links between the terminal devices 201, 202, 203 and the server 205. Network 204 may include various connection types, such as wired or wireless communication links, or fiber optic cables, to name a few. The server 205 may be a load balancing server, which sends a user's data processing instruction to the back-end application server and returns the data processing result returned by the application server to the user; it may also be a server cluster in which the distributed system shown in fig. 1 is deployed, which performs data interaction with the user through a service port provided by the distributed system, sends the user's data processing instruction to the back-end operation cluster, and returns the data processing result obtained by the operation cluster to the user.
The user may use the terminal device 201, 202, 203 to interact with the server 205 via the network 204 to receive or send messages and the like, for example, data processing instructions of the user may be sent to the server, and data processing results may also be received from the server.
The terminal devices 201, 202, 203 may be hardware or software. When the terminal devices 201, 202, and 203 are hardware, they may be electronic devices with communication functions, including but not limited to smart phones, tablet computers, e-book readers, laptop portable computers, desktop computers, and the like. When the terminal devices 201, 202, 203 are software, they can be installed in the electronic devices listed above. It may be implemented, for example, as multiple software or software modules to provide distributed services, or as a single software or software module. And is not particularly limited herein.
The server 205 may be a server cluster deployed with the distributed system shown in fig. 1, and may provide various data processing services, such as processing data processing instructions sent by the terminal devices 201, 202, and 203, for example, solving financial data operation tasks sent by users. And returns the processing result to the terminal equipment.
It should be noted that the data processing apparatus provided by the embodiment of the present disclosure may be disposed in the server 205. The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple software or software modules, for example, to provide distributed services, or as a single software or software module. And is not particularly limited herein.
With continued reference to FIG. 3, a flow 300 of one embodiment of a method of data processing in accordance with the present disclosure is shown. The data processing method comprises the following steps:
step 301, receiving a data processing instruction.
In this embodiment, the type of data requested to be processed by the data processing instruction is financial data. Compared with other types of data, financial data has higher complexity. For example, the data volume is large: a single calculation task can require millions of records. For another example, there are many data types: not only relational data involving multi-table joins, but also time-series data and the like. This gives data processing tasks on financial data the following characteristics: the operation complexity involved in the data processing process is high, for example, complicated mathematical and scientific calculation, regression, matrix calculation, quadratic optimization, and the like may be used, machine learning algorithms may also be used, and sometimes a calculation task needs to run for several hours or even days; the types of data involved in the process are complex and often require data from multiple data sources, which may include, for example, market data and transaction data.
As examples, data processing tasks for financial data may include investment analysis, post-investment attribution, asset allocation analysis, investment transaction execution, and the like. Taking asset allocation analysis as an example, the computational models involved in the data processing process include: a mean-variance model, a risk evaluation model, a Black-Litterman (B-L) model, a volatility control model, and the like. As another example, the data tasks performed in investment transactions include the following. Determining a transaction strategy: futures-options arbitrage, cross-term and cross-market arbitrage, market hedging, index enhancement, and the like. Real-time risk control: risk assessments such as client admission and credit limits, as well as industry/risk-factor exposure and the like.
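As a concrete illustration of the mean-variance model mentioned above, the following sketch computes a portfolio's expected return and variance from asset weights, expected returns, and a covariance matrix. The two-asset figures are invented sample values for illustration only and do not come from this disclosure.

```python
# Minimal mean-variance computation for a two-asset portfolio.
# The weights, expected returns, and covariance matrix below are
# invented sample values used only for illustration.

def portfolio_stats(weights, exp_returns, cov):
    """Return (expected return, variance) of the weighted portfolio."""
    mean = sum(w * r for w, r in zip(weights, exp_returns))
    var = sum(
        weights[i] * weights[j] * cov[i][j]
        for i in range(len(weights))
        for j in range(len(weights))
    )
    return mean, var

weights = [0.6, 0.4]
exp_returns = [0.08, 0.05]
cov = [[0.04, 0.01],
       [0.01, 0.02]]

mean, var = portfolio_stats(weights, exp_returns, cov)
print(round(mean, 4), round(var, 4))  # -> 0.068 0.0224
```

In a real asset allocation analysis this computation would be repeated over many candidate weight vectors to trace out the efficient frontier; the distributed system described here could evaluate those candidates as independent subtasks.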
In the present embodiment, the execution body may be, for example, the server 205 shown in fig. 2, and the user may send the data processing instruction to the server 205 through a terminal device.
Step 302, sending the data processing instruction to the distributed system, and receiving the data processing result returned by the distributed system.
In this embodiment, the distributed system is the distributed system in the embodiment shown in fig. 1. The execution body (for example, a server responsible for message forwarding) sends the data processing instruction to the distributed system. The distributor cluster in the distributed system then retrieves the relevant data according to the data processing task indicated by the data processing instruction and sends the data processing instruction and the relevant data to the aggregator cluster. The aggregator cluster decomposes the data processing task indicated by the data processing instruction into a plurality of subtasks, extracts a data subset corresponding to each subtask from the relevant data, and then distributes each subtask and its corresponding data subset to the operation nodes in the operation cluster, which execute the subtasks to obtain the processing result of each subtask. Each operation node then returns its subtask processing result to the aggregator cluster, which aggregates the subtask processing results into a data processing result and returns it to the distributor cluster. Finally, the distributor cluster sends the data processing result to the execution body.
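The distributor → aggregator → operation-node flow described above can be sketched as follows. All function names and the toy data layout are illustrative assumptions, not APIs from this disclosure, and the sequential loop stands in for the parallel dispatch a real operation cluster would perform.

```python
# Toy sketch of the distributor -> aggregator -> operation-node flow.
# Names and data layout are illustrative assumptions only.

def operation_node(subtask):
    """Execute a single subtask on its data subset."""
    return sum(subtask["data"])

def aggregator(task, related_data):
    """Decompose the task into subtasks, dispatch them, merge the results."""
    subtasks = [{"item": item, "data": related_data[item]}
                for item in task["items"]]
    # A real aggregator would distribute these to operation nodes in parallel.
    partial_results = [operation_node(s) for s in subtasks]
    return sum(partial_results)  # aggregation step (here simply a sum)

def distributor(instruction, data_store):
    """Retrieve the data the task needs, then hand off to the aggregator."""
    related_data = data_store[instruction["task"]]
    return aggregator(instruction, related_data)

data_store = {"risk": {"bond_a": [1, 2], "bond_b": [3, 4]}}
instruction = {"task": "risk", "items": ["bond_a", "bond_b"]}
print(distributor(instruction, data_store))  # -> 10
```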
As an example, the execution body may, according to user requirements, use a TCP long connection or a REST (Representational State Transfer) interface to send the data processing request to the back-end distributed system, and may thereby initiate either a synchronous or an asynchronous data processing request to the distributed system.
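The synchronous versus asynchronous choice above can be sketched as follows. The "back end" here is a plain function standing in for a real TCP long connection or REST call, and all names are illustrative assumptions.

```python
# Sketch of synchronous vs. asynchronous dispatch of a data processing
# request. The backend function stands in for a real TCP long connection
# or REST call; names are illustrative assumptions only.
import queue
import threading

def send_sync(backend, request):
    """Block until the back end returns the data processing result."""
    return backend(request)

def send_async(backend, request, on_result):
    """Return immediately; deliver the result later via a callback."""
    worker = threading.Thread(target=lambda: on_result(backend(request)))
    worker.start()
    return worker

backend = lambda req: {"task": req["task"], "status": "done"}

# Synchronous request: the caller waits for the result.
print(send_sync(backend, {"task": "risk"}))

# Asynchronous request: the result arrives on a queue when ready.
results = queue.Queue()
worker = send_async(backend, {"task": "attribution"}, results.put)
worker.join()
print(results.get())
```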
Step 303, sending the data processing result.
In this embodiment, the data processing result refers to the result obtained after the distributed system executes the corresponding processing steps according to the user's data processing instruction, and may be, for example, processed data, a calculation result, or prompt information indicating that the task has timed out. A task timeout indicates that the data processing instruction could not be executed or failed during execution.
In some optional implementations of this embodiment, the method further includes: if the data processing result returned by the distributed system is not received within a third preset time, sending the data processing instruction to the distributed system again; and if the number of times the data processing instruction has been sent reaches a third preset number and no data processing result has been received from the distributed system, sending prompt information of a task timeout. In this way, when data interaction between the execution body and the back-end distributed system is briefly interrupted, retransmission can prevent the data processing task from failing. If no data processing result is received after the number of retransmissions reaches the preset threshold, the data interaction between the execution body and the distributed system is considered disconnected, and prompt information of a task timeout can be returned to the user.
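The retransmission logic just described can be sketched as follows: resend the instruction when no result arrives in time, and report a task timeout once the retry limit is reached. `send_to_system` is a stand-in for the real network call, not an API from this disclosure, and returns `None` to simulate a timeout.

```python
# Sketch of the retransmission logic described above. send_to_system is
# a hypothetical stand-in for the real network call; returning None
# simulates "no result received within the preset time".

def process_with_retry(instruction, send_to_system, max_sends=3):
    for _ in range(max_sends):
        result = send_to_system(instruction)  # None means "timed out"
        if result is not None:
            return result
    return "task timed out"  # prompt information returned to the user

# Simulated back end that only answers on the third attempt.
attempts = {"n": 0}
def flaky_backend(instruction):
    attempts["n"] += 1
    return "ok" if attempts["n"] >= 3 else None

print(process_with_retry({"task": "risk"}, flaky_backend))   # -> ok
print(process_with_retry({"task": "risk"}, lambda i: None))  # -> task timed out
```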
In a specific example, the user needs to determine the portfolio risk of a portfolio containing 500 bonds. The user may send an instruction requesting calculation of the portfolio risk to the server 205 through a terminal device, and the server 205 sends the instruction to the back-end distributed system, which performs the following data processing steps. First, the distributor cluster extracts the market data required for the calculation task, such as quotations, basic bond information, and positions, according to the received calculation request for the 500-bond portfolio, and then transmits the calculation request and the extracted market data to the aggregator cluster. The aggregator cluster divides the calculation task for the 500 bonds into 500 subtasks, each representing the calculation task for one bond, and extracts the market data subset corresponding to that bond from the market data. It then distributes the 500 subtasks and their market data subsets to operation nodes that are idle or lightly loaded, and, after receiving the processing results of the 500 subtasks, aggregates them into an overall data processing result and returns it to the distributor cluster. The distributor cluster then returns the data processing result to the execution body, and the execution body forwards it to the user.
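The per-bond decomposition in the 500-bond example above can be sketched as follows (5 bonds instead of 500 for brevity). The per-bond risk formula and the data layout are invented placeholders, not the actual models of this disclosure.

```python
# Illustrative decomposition of a portfolio-risk task into one subtask
# per bond, mirroring the 500-bond example (5 bonds for brevity).
# The per-bond risk formula and data layout are invented placeholders.

def decompose(bonds, market_data):
    """One subtask per bond, each carrying only that bond's data subset."""
    return [{"bond": b, "data": market_data[b]} for b in bonds]

def run_subtask(subtask):
    # Placeholder per-bond risk: position size times price volatility.
    return subtask["data"]["position"] * subtask["data"]["volatility"]

def aggregate(results):
    """Merge the subtask results into an overall portfolio figure."""
    return sum(results)

market_data = {
    f"bond_{i}": {"position": 100, "volatility": 0.01 * (i + 1)}
    for i in range(5)
}
subtasks = decompose(list(market_data), market_data)
portfolio_risk = aggregate(run_subtask(s) for s in subtasks)
print(round(portfolio_risk, 2))  # -> 15.0
```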
With continued reference to fig. 4, fig. 4 is a schematic diagram of a scenario of the method shown in fig. 3. In the scenario 400 shown in fig. 4, the execution body 402 may be a server-side load balancing server (for example, an Nginx cluster), and the server cluster 403 may be a server cluster deployed with the distributed system. The user 401 may send a processing request for financial data to the execution body through a terminal device; the execution body sends the data processing request to the distributed system according to a preset load balancing policy, and the distributed system sequentially performs the following steps based on the data processing request: data retrieval, task decomposition, subtask execution, result aggregation, and result return. The execution body may then return the data processing result to the user.
According to the data processing method and apparatus provided by the embodiments of the present disclosure, a financial data processing task with a large data volume, multiple data types, and high computational complexity can be decomposed by the distributed system into a plurality of subtasks, whose processing results are then aggregated into a data processing result, thereby improving the efficiency of financial data processing.
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for data processing, which corresponds to the method embodiment shown in fig. 3 and is particularly applicable to various electronic devices.
As shown in fig. 5, the data processing apparatus 500 of the present embodiment includes: an instruction receiving unit 501 configured to receive a data processing instruction, where the data type requested to be processed by the data processing instruction is financial data; an instruction sending unit 502 configured to send a data processing instruction to the distributed system in any of the embodiments, and receive a data processing result returned by the distributed system; a result transmitting unit 503 configured to transmit the data processing result.
In some optional implementations of this embodiment, the apparatus 500 further includes: an instruction retransmission unit configured to send the data processing instruction to the distributed system again if the data processing result returned by the distributed system is not received within a third preset time; and an information sending unit configured to send prompt information of a task timeout if the number of times the data processing instruction has been sent reaches a preset threshold and no data processing result has been received from the distributed system.
Referring now to fig. 6, a schematic diagram of an electronic device (e.g., the server or terminal device of fig. 1) 600 suitable for implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), and PADs (tablet computers), and fixed terminals such as digital TVs and desktop computers. The terminal device shown in fig. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in fig. 6, electronic device 600 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device 600. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, magnetic tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of embodiments of the present disclosure. It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receiving a data processing instruction, wherein the data type requested to be processed by the data processing instruction is financial data; sending a data processing instruction to the distributed system in any embodiment, and receiving a data processing result returned by the distributed system; and sending a data processing result.
Computer program code for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an instruction receiving unit, an instruction transmitting unit, and a result transmitting unit. The names of these units do not in some cases constitute a limitation of the unit itself, and for example, the instruction receiving unit may also be described as a "unit that receives a data processing instruction".
The foregoing description presents only preferred embodiments of the present disclosure and illustrates the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features with similar functions disclosed in the embodiments of the present disclosure.

Claims (12)

1. A distributed system, comprising: a distributor cluster, an aggregator cluster, and an operation cluster, wherein,
the distributor cluster is configured to: receive a data processing task; acquire target operation data required by the data processing task; and send the data processing task and the target operation data to the aggregator cluster;
the aggregator cluster is configured to: receive the data processing task and the target operation data; decompose the data processing task into a plurality of subtasks, and extract an operation data subset corresponding to each subtask from the target operation data; distribute each subtask and the corresponding operation data subset to at least one operation node in the operation cluster; receive a subtask processing result returned by the at least one operation node; in response to receiving the subtask processing results of all the subtasks within a first preset time, aggregate the subtask processing results into a data processing result; and send the data processing result to the distributor cluster;
the operation cluster is configured to: receive the subtasks distributed by the aggregator cluster and their corresponding operation data subsets; execute each subtask to generate a subtask processing result of the subtask; and send each subtask processing result to the aggregator cluster.
2. The distributed system of claim 1, wherein each distribution node in the distributor cluster, each aggregation node in the aggregator cluster, and each operation node in the operation cluster employ a coroutine computation policy.
3. The distributed system of claim 2, wherein the distributed system further comprises a cache cluster configured to store operational data; and the number of the first and second groups,
the distributor cluster is further configured to: determining a target operation data type corresponding to the data processing task based on a pre-established corresponding relation between the data processing task and the operation data type; and acquiring the operation data pointed by the target operation data type from the cache cluster to obtain the target operation data.
4. The distributed system of claim 1, wherein the aggregator cluster is further configured to:
if the subtask processing results of all the subtasks are not received within the first preset time, resend the subtasks and the corresponding operation data subsets to the operation nodes that have not returned subtask processing results;
and if the number of times the subtasks and the corresponding operation data subsets have been sent reaches a first preset number and the subtask processing results of the subtasks have still not been received, determine that the data processing result is a task timeout and generate alarm information.
5. The distributed system of claim 1, wherein the distributor cluster is further configured to:
if the data processing result is not received within a second preset time, resend the data processing task and the target operation data to the aggregator cluster;
and if the number of times the data processing task and the target operation data have been sent reaches a second preset number and the data processing result has still not been received, determine that the data processing result is a task timeout and generate alarm information.
6. The distributed system according to any one of claims 1 to 5, wherein the distributed system further comprises a reserved network address and a standby server preconfigured with system files, and when the utilization rate of the operation cluster reaches a preset threshold, the network address of the standby server is updated to the reserved network address, so that the standby server joins the operation cluster as a new operation node.
7. A method of data processing, comprising:
receiving a data processing instruction, wherein the data type requested to be processed by the data processing instruction is financial data;
sending the data processing instruction to the distributed system according to any one of claims 1 to 6, and receiving a data processing result returned by the distributed system;
and sending the data processing result.
8. The method of claim 7, further comprising:
if the data processing result returned by the distributed system is not received within a third preset time, sending the data processing instruction to the distributed system again; and
if the number of times of sending the data processing instruction reaches a third preset number and a data processing result returned by the distributed system has not been received, sending prompt information of a task timeout.
9. An apparatus for data processing, comprising:
the instruction receiving unit is configured to receive a data processing instruction, and the data type requested to be processed by the data processing instruction is financial data;
an instruction sending unit configured to send the data processing instruction to the distributed system according to any one of claims 1 to 6, and receive a data processing result returned by the distributed system;
a result transmitting unit configured to transmit the data processing result.
10. The apparatus of claim 9, the apparatus further comprising:
the instruction retransmitting unit is configured to send the data processing instruction to the distributed system again if the data processing result returned by the distributed system is not received within a third preset time length;
and the information sending unit is configured to send prompt information of overtime task if the number of times of sending the data processing instruction reaches a third preset number of times and the data processing result returned by the distributed system is not received.
11. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of data processing according to claim 7 or 8.
12. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method of data processing according to claim 7 or 8.
CN202110456873.3A 2021-04-27 2021-04-27 Distributed system and data processing method and device Pending CN115250276A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110456873.3A CN115250276A (en) 2021-04-27 2021-04-27 Distributed system and data processing method and device


Publications (1)

Publication Number Publication Date
CN115250276A true CN115250276A (en) 2022-10-28

Family

ID=83696092

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110456873.3A Pending CN115250276A (en) 2021-04-27 2021-04-27 Distributed system and data processing method and device

Country Status (1)

Country Link
CN (1) CN115250276A (en)


Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170339008A1 (en) * 2016-05-17 2017-11-23 Microsoft Technology Licensing, Llc Distributed operational control in computing systems
CN107888684A (en) * 2017-11-13 2018-04-06 小草数语(北京)科技有限公司 Distributed system calculating task processing method, device and controller
CN108563531A (en) * 2018-04-18 2018-09-21 中国银行股份有限公司 Data processing method and device
CN110443695A (en) * 2019-07-31 2019-11-12 中国工商银行股份有限公司 Data processing method and its device, electronic equipment and medium
CN111078323A (en) * 2019-10-12 2020-04-28 平安科技(深圳)有限公司 Coroutine-based data processing method and device, computer equipment and storage medium
CN111190718A (en) * 2020-01-07 2020-05-22 第四范式(北京)技术有限公司 Method, device and system for realizing task scheduling
US20200210243A1 (en) * 2019-01-02 2020-07-02 Alibaba Group Holding Limited System and method for offloading computation to storage nodes in distributed system
CN111625364A (en) * 2020-05-30 2020-09-04 北京字节跳动网络技术有限公司 Task allocation method and device, electronic equipment and computer readable medium
CN112118315A (en) * 2020-09-18 2020-12-22 北京有竹居网络技术有限公司 Data processing system, method, device, electronic equipment and storage medium
CN112596858A (en) * 2020-12-25 2021-04-02 苏州浪潮智能科技有限公司 Data processing method and device, electronic equipment and storage medium


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116980420A (en) * 2023-09-22 2023-10-31 新华三技术有限公司 Cluster communication method, system, device, equipment and medium
CN116980420B (en) * 2023-09-22 2023-12-15 新华三技术有限公司 Cluster communication method, system, device, equipment and medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination