WO2018188607A1 - Procédé et dispositif de traitement de flux - Google Patents

Procédé et dispositif de traitement de flux (Stream processing method and device)

Info

Publication number
WO2018188607A1
WO2018188607A1 (application PCT/CN2018/082641, CN2018082641W)
Authority
WO
WIPO (PCT)
Prior art keywords
stream processing
block
data storage
management unit
block number
Prior art date
Application number
PCT/CN2018/082641
Other languages
English (en)
Chinese (zh)
Inventor
曹俊
胡斐然
林铭
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2018188607A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/50 Network services
    • H04L 67/56 Provisioning of proxy services
    • H04L 67/561 Adding application-functional data or data for application control, e.g. adding metadata
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H04L 67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/50 Network services
    • H04L 67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources

Definitions

  • the present invention relates to the field of information technology, and in particular, to a stream processing method and apparatus.
  • A workflow is an abstraction, generalization, and description of a work process and of the logical rules that govern how its constituent steps are organized relative to one another.
  • The workflow concept originated in the fields of production organization and office automation, as a concept proposed for activities in daily work that follow a fixed procedure. The aim is to break work down into well-defined tasks or roles, to execute and monitor those tasks according to certain rules and procedures, and thereby to improve work efficiency, control processes better, enhance customer service, and manage business processes effectively.
  • Workflow modeling means representing a workflow in a computer with an appropriate model and implementing it; through workflow modeling, workflows can be managed by a workflow system.
  • The main function of a stream processing system is to define, execute, and manage workflows with the support of computer technology, and to coordinate the exchange of information among the steps of an executing workflow and among the members of a work group.
  • the stream processing system usually consists of a workflow design tool and a workflow management tool.
  • the workflow design tool allows the user to design their own workflow definition, and the workflow management tool is responsible for managing the execution of the workflow.
  • A workflow instance includes one or more tasks, each of which requires an agent to perform some work.
  • Apache Storm is a typical prior-art stream processing system. It uses a master-slave architecture in which Nimbus is the master process and Supervisor is the slave process that runs the service.
  • the stream processing system Storm establishes a network connection with the distributed file system, and the distributed file system stores data that needs to be processed by the stream processing system Storm.
  • The distributed file system includes a Master Server (primary server) and a Data Server (data server). The Master Server is the metadata management node and manages the distribution of data blocks.
  • The Data Server is a data storage node and stores the block data.
  • Storm and the data storage nodes are deployed on different servers.
  • In Storm's stream processing operations, Storm first needs to obtain the data to be stream-processed from the data server. Specifically, the data server provides a data query interface; Storm passes parameters to this interface over the network, acquires the data from the data server over the network, and then loads the acquired data into the Supervisor.
  • Because the stream processing system must acquire data from the data storage nodes over the network, the speed at which data is acquired is limited by network performance, so the performance of the entire stream processing pipeline may be bounded by the network.
  • When the network transmission speed between the stream processing system and the data storage nodes is low, the speed of stream processing is significantly affected.
  • an embodiment of the present invention provides a stream processing method and apparatus, which can overcome the technical problem that the speed of the stream processing is affected by the low network transmission speed between the stream processing system and the data storage node.
  • an embodiment of the present invention provides a stream processing method, where the method is applied to a stream processing system, where the stream processing system includes a stream processing management unit and a stream processing computing unit, and the method includes:
  • The stream processing management unit receives a stream processing task sent by the client, where the stream processing task includes stream processing logic and the path of the file to be processed in the distributed file system; the distributed file system includes a metadata management node and a plurality of data storage nodes, and each data storage node is provided with a stream processing computing unit;
  • the stream processing management unit acquires, from the metadata management node, a block number of each block corresponding to the path of the file to be processed, and a network address of the data storage node where each block is located;
  • the stream processing management unit respectively sends the stream processing logic and the block number of each block to the stream processing unit of the data storage node where each block is located;
  • the stream processing calculation unit acquires the block data corresponding to the received block number from the data storage node where it is located, and executes stream processing logic for the block data corresponding to the received block number.
  • In the embodiments of the present invention, the stream processing computing units are distributed across the data storage nodes, and the stream processing management unit sends the stream processing task to the corresponding data storage nodes according to the path of the file to be processed.
  • The stream processing computing unit on each of those data storage nodes directly reads, locally, the block data corresponding to the file to be processed, and runs the stream processing logic on the block data it has read. Since the stream processing computing unit reads the file to be processed locally, this overcomes the technical problem that a low network transmission speed between the stream processing system and the data storage nodes limits the speed of stream processing.
  • the stream processing logic is executed in parallel in different stream processing calculation units, so that the stream processing speed can be further accelerated and the processing efficiency can be improved.
  • the data storage node is provided with a data management unit, and the stream processing calculation unit is configured as a program library, and the data management unit performs a function of the stream processing calculation unit by loading the program library.
  • Because the stream processing computing unit is embedded in the data management unit through the program library, the data management unit can read the block data directly and then execute the stream processing logic on it, which speeds up stream processing.
  • the method further includes:
  • the stream processing calculation unit transmits the processing result obtained by the execution stream processing logic to the stream processing management unit.
  • The metadata management node records a first correspondence between the path of the file to be processed in the distributed file system and the block numbers of the blocks. In this case, the step in which the stream processing management unit obtains, from the metadata management node, the block number of each block corresponding to the path and the network address of the data storage node where each block is located specifically includes:
  • The stream processing management unit obtains the block numbers of the blocks from the first correspondence according to the path of the file to be processed in the distributed file system.
  • The metadata management node records a second correspondence between the block number of each block and the network address of the data storage node where that block is located. In this case, the step in which the stream processing management unit obtains, from the metadata management node, the block number of each block corresponding to the path and the network address of the data storage node where each block is located specifically includes:
  • The stream processing management unit obtains, from the second correspondence, the network address of the data storage node where each block is located, according to each block number.
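  • A minimal sketch of these two correspondences and the two-step lookup is given below (Java). The class name, file path, block numbers, and node addresses are illustrative assumptions, not values taken from the patent.

    import java.util.List;
    import java.util.Map;

    // Minimal sketch of the two correspondences kept by the metadata management node.
    public class MetadataLookupSketch {

        // First correspondence: file path in the distributed file system -> block numbers.
        static final Map<String, List<Long>> PATH_TO_BLOCKS =
                Map.of("/data/to/process/file.dat", List.of(1L, 2L));

        // Second correspondence: block number -> network address of the data storage node.
        static final Map<Long, String> BLOCK_TO_NODE =
                Map.of(1L, "10.0.0.101:9000", 2L, "10.0.0.102:9000");

        public static void main(String[] args) {
            String path = "/data/to/process/file.dat";
            for (long blockNumber : PATH_TO_BLOCKS.getOrDefault(path, List.of())) {
                // Resolve each block to the node that stores it, as in the two steps above.
                System.out.printf("block %d is stored on node %s%n",
                        blockNumber, BLOCK_TO_NODE.get(blockNumber));
            }
        }
    }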
  • an embodiment of the present invention provides a stream processing system, including a stream processing management unit and a stream processing computing unit.
  • A stream processing management unit, configured to receive a stream processing task sent by the client, where the stream processing task includes stream processing logic and the path of the file to be processed in the distributed file system; the distributed file system includes a metadata management node and multiple data storage nodes, and each data storage node is provided with a stream processing computing unit;
  • the stream processing management unit is further configured to obtain, from the metadata management node, each block number corresponding to the path, and a network address of the data storage node where each block number is located;
  • the stream processing management unit is further configured to separately send the stream processing logic and the block number corresponding to each network address to the stream processing unit of the corresponding data storage node;
  • the stream processing calculation unit is configured to acquire block data corresponding to the received block number from the data storage node where the data is stored, and execute stream processing logic for the block data corresponding to the received block number.
  • the data storage node is provided with a data management unit, and the stream processing calculation unit is configured as a program library, and the data management unit performs a function of the stream processing calculation unit by loading the program library.
  • the stream processing calculation unit is further configured to send the processing result obtained by the execution stream processing logic to the stream processing management unit.
  • the metadata management node records the first correspondence between the path of the file to be processed in the distributed file system and the block number of each block
  • the stream processing management unit is specifically configured to:
  • the block number of each block is obtained from the first correspondence according to the path of the file to be processed in the distributed file system.
  • the metadata management node records the second correspondence between the block number of each block and the network address of the data storage node where the block number of each block is located, and the stream processing management unit specifically uses to:
  • the network address of the data storage node where each block number is located is obtained from the second correspondence according to the block number of each block.
  • an embodiment of the present invention provides a stream processing management unit that performs the functions of a stream processing management unit in the stream processing system.
  • an embodiment of the present invention provides a host, including a memory, a processor, and a bus.
  • the memory and the processor are connected to the bus.
  • The memory stores program instructions, and the processor executes the program instructions to implement the functions of the stream processing management unit in the stream processing system.
  • FIG. 1 is a schematic structural diagram of a stream processing system according to an embodiment of the present invention.
  • FIG. 2 is another schematic structural diagram of a stream processing system according to an embodiment of the present invention.
  • FIG. 3 is a data interaction diagram of a stream processing method according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of an apparatus of a stream processing system according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a device of a host according to an embodiment of the present invention.
  • FIG. 1 is a schematic diagram of a connection between a stream processing system and a distributed file system and a client according to an embodiment of the present invention.
  • The stream processing system includes a stream processing management unit 302 and a plurality of stream processing computing units 1011, 1021, ..., and 1031; the distributed file system includes a metadata management node 201 and a plurality of data storage nodes 101, 102, ..., and 103.
  • the client 301 is connected to the stream processing management unit 302, and the stream processing management unit 302 is connected to the metadata management node 201 and the plurality of data storage nodes 101, 102, ..., 103, respectively.
  • the client 301 is configured to receive a stream processing job submitted by the user.
  • When submitting the stream processing job, the user specifies the path of the data to be processed in the distributed file system, and specifies what kind of processing is to be performed on that data.
  • The path of the file to be processed in the distributed file system may be, for example, a URL (Uniform Resource Locator). The URL is a storage identifier in the distributed file system, and with the URL the block number of each block corresponding to the file to be processed can be found in the metadata management node 201.
  • The client 301 generates a stream processing task according to the stream processing job submitted by the user, where the stream processing task includes the stream processing logic and the path of the data to be processed in the distributed file system. The stream processing logic defines what kind of processing is to be performed on the data.
  • For example, the stream processing logic can specify searching for anomalous events in the data to be processed, as in the sketch below.
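  • The following is a minimal, hypothetical stand-in for such logic (Java): a predicate that flags anomalous records in a piece of block data. The record format, the latency_ms field, and the threshold are assumptions made purely for illustration.

    import java.util.List;
    import java.util.function.Predicate;

    // Illustrative "stream processing logic": flag anomalous events in block data.
    public class AnomalyFilterSketch {
        public static void main(String[] args) {
            Predicate<String> isAnomalous = line -> line.contains("ERROR")
                    || line.contains("latency_ms=") && parseLatency(line) > 1000;

            // Stand-in for the block data read from a data storage node.
            List<String> blockData = List.of(
                    "2018-04-11T10:00:00 INFO latency_ms=35 ok",
                    "2018-04-11T10:00:01 ERROR timeout",
                    "2018-04-11T10:00:02 INFO latency_ms=2400 slow");

            blockData.stream().filter(isAnomalous).forEach(System.out::println);
        }

        // Extract the latency value from a record of the assumed format.
        private static long parseLatency(String line) {
            int start = line.indexOf("latency_ms=") + "latency_ms=".length();
            int end = line.indexOf(' ', start);
            return Long.parseLong(end == -1 ? line.substring(start) : line.substring(start, end));
        }
    }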
  • the client 301 sends a stream processing task to the stream processing management unit 302.
  • The stream processing management unit 302 performs scheduling according to the stream processing task; the selected stream processing computing unit acquires the file to be processed from the distributed file system and processes it with the stream processing logic.
  • The stream processing system can be implemented based on the Apache Flink architecture.
  • The client 301 is the client process of Apache Flink.
  • The stream processing management unit 302 is the JobManager process of Apache Flink.
  • the metadata management node 201 is provided with a metadata management unit 2011 and a database 2012, and the metadata management unit 2011 provides an interface through which the external device can query the database 2012.
  • The database 2012 records the first correspondence between the path of the file to be processed in the distributed file system and the block number of each block, and the second correspondence between the block number of each block and the network address of the data storage node where each block is located.
  • Files to be processed are stored, in the form of fragments, in the databases of the data storage nodes; the fragments are the different pieces of block data, and each piece of block data corresponds to a block number. For every file in the distributed storage system, the metadata management node records the correspondence between the file path and the block numbers of its blocks, as well as which data storage node's database stores each block number.
  • the data storage node 101 is provided with a stream processing calculation unit 1011 and a database 1012.
  • the database 1012 records the block data and the correspondence between the block number and the block data.
  • The stream processing computing unit 1011 can access the database 1012 and obtain from it the block data corresponding to a block number.
  • The data storage nodes 102 and 103 have a structure similar to that of the data storage node 101, except that the block data recorded in their own databases is different; details are not described herein.
  • the distributed file system can be implemented by Hadoop
  • The database 2012, the database 1012, the database 1022, ..., and the database 1032 can be implemented by HBase (Hadoop Database).
  • The metadata management unit 2011 can be the HMaster process of HBase.
  • The client 301 and the stream processing management unit 302 can be set on the same host and establish data connections with the metadata management node 201 and the data storage nodes 101, 102, ..., and 103.
  • the client 301 and the stream processing management unit 302 may also be disposed on different hosts, which is not limited by the embodiment of the present invention.
  • FIG. 2 is another schematic structural diagram of a stream processing system according to an embodiment of the present invention.
  • A client 301 and a stream processing management unit 302 are disposed on a host 10, and the host 10 is also provided with an operating system 303 and hardware 304.
  • the hardware 304 is used to carry the operation of the operating system 303.
  • the hardware 304 includes a physical network card 3041.
  • the client 301 and the stream processing management unit 302 respectively run on the operating system 303 in the form of a process.
  • the physical network card 3041 accesses the network 50.
  • the metadata management node 201 includes a database 2012, a metadata management unit 2011, an operating system 2013, and a hardware 2014.
  • The database 2012 and the metadata management unit 2011 each run on the operating system 2013 as a process, and the hardware 2014 is used to carry the operating system 2013.
  • The hardware 2014 includes a physical network card 20141, which accesses the network 50; the metadata management unit 2011 provides an interface through which an external device can access the database 2012.
  • The data storage node 101 includes a database 1012, a stream processing computing unit 1011, an operating system 1013, and hardware 1014.
  • the database 1012 and the stream processing computing unit 1011 respectively run on the operating system 1013 in the form of a process, and the hardware 1014 is used to carry the operating system.
  • the hardware 1014 includes a physical network card 10141, and the physical network card 10141 accesses the network 50.
  • the stream processing computing unit 1011 can access the database 1012.
  • the structure of the data storage nodes 102 and 103 is similar to that of the data storage node 101 and will not be described herein.
  • The stream processing management unit 302 can communicate with the client 301, the metadata management unit 2011, and the stream processing computing units 1011, 1021, ..., and 1031 through RPC (Remote Procedure Call).
  • the embodiment of the present invention provides a stream processing method, where the stream processing management unit 302 receives a stream processing task sent by the client 301, where the stream processing task includes a stream processing logic and a path of the file to be processed in the distributed file system.
  • The stream processing management unit 302 acquires, from the metadata management node 201, the block number of each block corresponding to the path, and the network address of the data storage node where each block is located; the stream processing management unit 302 sends the stream processing logic, together with the block number corresponding to each network address, to the stream processing computing unit of the corresponding data storage node; the stream processing computing unit acquires, from the data storage node where it is located, the block data corresponding to the received block number, and executes the stream processing logic on that block data.
  • In the embodiments of the present invention, the stream processing computing units are distributed across the data storage nodes, and the stream processing management unit sends the stream processing task to the corresponding data storage nodes according to the path of the file to be processed.
  • The stream processing computing unit on each of those data storage nodes directly reads, locally, the block data corresponding to the file to be processed, and runs the stream processing logic on the block data it has read. Since the stream processing computing unit reads the file to be processed locally, this overcomes the technical problem that a low network transmission speed between the stream processing system and the data storage nodes limits the speed of stream processing.
  • FIG. 3 is a data interaction diagram of a stream processing method according to an embodiment of the present invention. As shown in FIG. 3, the stream processing method includes the following steps:
  • Step 401 The stream processing management unit 302 receives the stream processing task sent by the client 301, where the stream processing task includes the stream processing logic and the path of the file to be processed in the distributed file system.
  • The client 301 can be a client process in the Apache Flink system.
  • The stream processing management unit 302 can be a JobManager process in the Apache Flink system.
  • Step 402 The stream processing management unit 302 sends a query request to the metadata management node 201, wherein the query request carries a path of the file to be processed in the distributed file system.
  • the query request includes an input parameter and a query instruction
  • The stream processing management unit 302 takes the path of the file to be processed in the distributed file system as the input parameter, and sends the input parameter and the query instruction to the interface provided by the metadata management unit 2011 of the metadata management node 201 for accessing the database 2012.
  • Step 403 The metadata management node 201 returns the block number of each block corresponding to the path and the network address of the data storage node corresponding to each block to the stream processing management unit 302 according to the query request.
  • The database 2012 of the metadata management node 201 records the first correspondence between the path of the file to be processed in the distributed file system and the block number of each block, and the second correspondence between the block number of each block and the network address of the data storage node where each block is located. Therefore, the stream processing management unit 302 obtains, from the metadata management node 201, the block numbers of the blocks from the first correspondence according to the path of the file to be processed in the distributed file system, and then obtains, according to those block numbers, the network address of the data storage node where each block is located from the second correspondence.
  • Suppose the block numbers acquired by the stream processing management unit 302 are block number 1 and block number 2. It is worth noting that in practical applications there may be many block numbers; for brevity, only two are taken as an example here. The stream processing management unit 302 queries the network address A of the data storage node 101 based on block number 1, and the network address B of the data storage node 102 based on block number 2.
  • Step 404 The stream processing management unit 302 transmits the stream processing logic and block number 1 to the stream processing computing unit 1011.
  • Since the stream processing management unit 302 has queried the network address A of the data storage node 101 according to block number 1, the stream processing logic and block number 1 corresponding to the network address A are sent to the stream processing computing unit 1011 of the data storage node 101.
  • Step 405 The stream processing management unit 302 transmits the stream processing logic and block number 2 to the stream processing computing unit 1021.
  • Since the stream processing management unit 302 has queried the network address B of the data storage node 102 according to block number 2, the stream processing logic and block number 2 corresponding to the network address B are sent to the stream processing computing unit 1021 of the data storage node 102.
  • The stream processing computing unit 1011 can be, for example, a TaskManager process in the Apache Flink system.
  • The stream processing computing unit 1021 can be, for example, another TaskManager process in the Apache Flink system.
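  • A minimal sketch of this dispatch step follows (Java). The task structure, logic identifier, and node addresses are illustrative assumptions; in a real deployment the send would be an RPC call to the compute unit at the resolved address rather than a print statement.

    import java.util.Map;

    // Sketch of steps 404/405: pair each block number with the node address resolved
    // from the metadata, then ship the logic together with that block number to the
    // stream processing computing unit on that node.
    public class DispatchSketch {

        // Illustrative message carrying the (serialized) logic and one block number.
        record StreamTask(String logicId, long blockNumber) { }

        public static void main(String[] args) {
            Map<Long, String> blockToNode = Map.of(1L, "10.0.0.101:9000", 2L, "10.0.0.102:9000");
            String logicId = "find-anomalous-events";   // placeholder for the stream processing logic

            blockToNode.forEach((blockNumber, nodeAddress) -> {
                StreamTask task = new StreamTask(logicId, blockNumber);
                System.out.printf("send %s to compute unit at %s%n", task, nodeAddress);
            });
        }
    }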
  • Step 406: The stream processing computing unit 1011 acquires, from the data storage node 101 where it is located, the block data corresponding to the received block number 1, and executes the stream processing logic on that block data.
  • Specifically, the stream processing computing unit 1011 acquires, from the database 1012 of the data storage node 101 where it is located, the block data corresponding to the block number 1 received from the stream processing management unit 302, and executes the stream processing logic on that block data.
  • data storage node 101 is further provided with a data management unit for accessing database 1012 to manage block data in database 1012.
  • the distributed file system can be Hadoop
  • The Hadoop database is implemented by the HBase database.
  • The metadata management unit 2011 is the HMaster process of the HBase database.
  • The stream processing computing unit is set up as a program library.
  • The data management unit performs the functions of the stream processing computing unit by loading the program library.
  • The data management unit is, for example, an HRegionServer process of the HBase database. The TaskManager process is embedded into the HRegionServer process: the TaskManager process can be packaged as a program library (a jar package or a .so file) that provides a startup interface, and after the HRegionServer process has loaded the library, it implements the TaskManager process by invoking that startup interface.
  • Because the HRegionServer process that implements the functions of the TaskManager process can read the block data of the database 1012 locally, the process of acquiring the block data is not affected by the performance of the external network; and since the HRegionServer process accesses the database 1012 within its own process, that is, reads the block data directly from memory, the block data is obtained faster, which effectively improves the efficiency of stream processing.
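  • As an illustration of this library-loading approach, the following Java sketch loads a hypothetical jar with a URLClassLoader and invokes an assumed startup entry point via reflection. The jar path, class name, and method name are placeholders, not actual HBase or Flink interfaces.

    import java.lang.reflect.Method;
    import java.net.URL;
    import java.net.URLClassLoader;

    // Sketch of a data management process loading a stream-processing library and
    // starting it through a startup interface exposed by the library.
    public class LibraryLoaderSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical location of the library shipped to the data storage node.
            URL jarUrl = new URL("file:///opt/streamproc/taskmanager.jar");
            try (URLClassLoader loader = new URLClassLoader(new URL[]{jarUrl},
                    LibraryLoaderSketch.class.getClassLoader())) {
                // Hypothetical entry class and startup method provided by the library.
                Class<?> entry = loader.loadClass("com.example.streamproc.TaskManagerEntry");
                Object instance = entry.getDeclaredConstructor().newInstance();
                Method start = entry.getMethod("start");
                start.invoke(instance);   // hand control to the embedded compute unit
            }
        }
    }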
  • Alternatively, the data management unit and the stream processing computing unit 1011 can run concurrently on the operating system 1013, with the stream processing computing unit 1011 accessing the database 1012 through an interface provided by the data management unit. In these examples, although the database 1012 is not accessed directly within the HRegionServer process, the stream processing computing unit 1011 still accesses the database 1012 locally, so the impact of external network performance is likewise avoided.
  • Step 407 The stream processing calculation unit 1021 acquires the block data corresponding to the received block number 2 from the data storage node 102 where it is located, and executes stream processing logic for the block data corresponding to the received block number 2.
  • data storage node 102 is provided with a data management unit for accessing database 1022 to manage block data.
  • As before, the distributed file system can be Hadoop, the Hadoop database is implemented by the HBase database, the metadata management unit 2011 is the HMaster process of the HBase database, the stream processing computing unit is set up as a program library, and the data management unit performs the functions of the stream processing computing unit by loading the program library.
  • The data management unit is, for example, an HRegionServer process of the HBase database. The TaskManager process is embedded into the HRegionServer process: the TaskManager process can be packaged as a program library (a jar package or a .so file) that provides a startup interface, and after the HRegionServer process has loaded the library, it implements the TaskManager process by invoking that startup interface.
  • Because the HRegionServer process that implements the functions of the TaskManager process can read the block data of the database 1022 locally, the process of acquiring the block data is not affected by the performance of the external network; and since the HRegionServer process accesses the database 1022 within its own process, the block data is obtained faster, which effectively improves the efficiency of stream processing.
  • Alternatively, the data management unit and the stream processing computing unit 1021 can run concurrently on the operating system 1023, with the stream processing computing unit 1021 accessing the database 1022 through an interface provided by the data management unit. In these examples, although the database 1022 is not accessed directly within the HRegionServer process, the stream processing computing unit 1021 still accesses the database 1022 locally, so the impact of external network performance is likewise avoided.
  • Step 408: The stream processing computing unit 1011 sends, to the stream processing management unit 302, the first processing result obtained by executing the stream processing logic on the block data corresponding to block number 1.
  • Step 409: The stream processing computing unit 1021 sends, to the stream processing management unit 302, the second processing result obtained by executing the stream processing logic on the block data corresponding to block number 2.
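  • A small sketch of how the management unit might collect these per-block results follows (Java); the result type and method names are assumptions for illustration only.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch of steps 408/409: compute units report per-block results back to the
    // management unit, which keys them by block number.
    public class ResultCollectorSketch {
        private final Map<Long, String> resultsByBlock = new ConcurrentHashMap<>();

        void onResult(long blockNumber, String result) {
            resultsByBlock.put(blockNumber, result);
        }

        public static void main(String[] args) {
            ResultCollectorSketch collector = new ResultCollectorSketch();
            collector.onResult(1L, "3 anomalous events");   // first processing result
            collector.onResult(2L, "0 anomalous events");   // second processing result
            System.out.println(collector.resultsByBlock);
        }
    }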
  • In the embodiments of the present invention, the stream processing computing units are distributed across the data storage nodes, and the stream processing management unit sends the stream processing task to the corresponding data storage nodes according to the path of the file to be processed.
  • The stream processing computing unit on each of those data storage nodes directly reads, locally, the block data corresponding to the file to be processed, and runs the stream processing logic on the block data it has read.
  • Since the stream processing computing unit reads the file to be processed locally, this overcomes the technical problem that a low network transmission speed between the stream processing system and the data storage nodes limits the speed of stream processing.
  • the stream processing logic is executed in parallel in different stream processing calculation units, so that the stream processing speed can be further accelerated and the processing efficiency can be improved.
  • stream processing system 90 may also be implemented based on the Storm, Spark, or Samza architecture.
  • FIG. 4 is a schematic structural diagram of a device of a stream processing management unit according to an embodiment of the present invention.
  • the stream processing management unit 302 includes:
  • The receiving module 601 is configured to receive the stream processing task sent by the client, where the stream processing task includes the stream processing logic and the path of the file to be processed in the distributed file system; the distributed file system includes a metadata management node and multiple data storage nodes, and each data storage node is provided with a stream processing computing unit;
  • the query module 602 is configured to obtain, from the metadata management node, a block number of each block corresponding to the path, and a network address of the data storage node where each block is located;
  • the sending module 603 is configured to separately send the stream processing logic and the block number of each block to the stream processing unit of the data storage node where each block is located.
  • The receiving module 601 is further configured to receive the processing result, obtained by executing the stream processing logic, that is sent by the stream processing computing unit.
  • The metadata management node records the first correspondence between the path of the file to be processed in the distributed file system and the block number of each block, and the second correspondence between the block number of each block and the network address of the data storage node where each block is located.
  • the query module 602 is specifically used to:
  • The block numbers of the blocks are obtained from the first correspondence according to the path of the file to be processed in the distributed file system, and the network address of the data storage node where each block is located is obtained from the second correspondence according to the block number of each block.
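  • An interface-level sketch of this division into modules is given below (Java). The method signatures are assumptions intended only to make the responsibilities concrete; the patent does not define an API for these modules.

    import java.util.Map;

    // Declaration-only sketch of the three modules of the stream processing management unit.
    public class ManagementUnitModules {

        // Receiving module 601: receives the stream processing task from the client
        // and, later, the processing results from the stream processing computing units.
        interface ReceivingModule {
            void onStreamProcessingTask(String streamProcessingLogic, String filePath);
            void onProcessingResult(long blockNumber, String processingResult);
        }

        // Query module 602: resolves the file path to block numbers (first correspondence)
        // and each block number to a data storage node address (second correspondence).
        interface QueryModule {
            Map<Long, String> locateBlocks(String filePath);   // block number -> node address
        }

        // Sending module 603: ships the stream processing logic and a block number
        // to the stream processing computing unit on the node that stores that block.
        interface SendingModule {
            void sendToComputeUnit(String nodeAddress, String streamProcessingLogic, long blockNumber);
        }
    }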
  • FIG. 5 is a schematic structural diagram of a device according to an embodiment of the present invention.
  • the host 50 includes a memory 502, a processor 501, and a bus 503.
  • the memory 502 and the processor 501 are connected to the bus 503.
  • the memory 502 stores program instructions, and the processor 501 executes program instructions to implement the functions of the stream processing management unit 302 in the stream processing system described above.
  • In the embodiments of the present invention, the stream processing computing units are distributed across the data storage nodes, and the stream processing management unit sends the stream processing task to the corresponding data storage nodes according to the path of the file to be processed.
  • The stream processing computing unit on each of those data storage nodes directly reads, locally, the block data corresponding to the file to be processed, and runs the stream processing logic on the block data it has read. Since the stream processing computing unit reads the file to be processed locally, this overcomes the technical problem that a low network transmission speed between the stream processing system and the data storage nodes limits the speed of stream processing.
  • the stream processing logic is executed in parallel in different stream processing calculation units, so that the stream processing speed can be further accelerated and the processing efficiency can be improved.
  • The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network elements.
  • Some or all of the processes may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • the connection relationship between processes indicates that there is a communication connection between them, and specifically may be implemented as one or more communication buses or signal lines.
  • The present invention can be implemented by means of software plus the necessary general-purpose hardware, and of course also by dedicated hardware, a dedicated CPU, dedicated memory, special-purpose components, and so on.
  • Any function performed by a computer program can also be implemented with corresponding hardware, and the specific hardware structures used to implement the same function can vary: analog circuits, digital circuits, dedicated circuits, and so on.
  • In most cases, however, a software implementation is the better option.
  • The technical solution of the present invention, or the part of it that contributes over the prior art, can be embodied in the form of a software product stored on a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, the software product including a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention relate to a stream processing method and device, the method comprising: receiving, by a stream processing management unit, a stream processing task sent by a client; obtaining, by the stream processing management unit, from a metadata management node, the block number of each block corresponding to the path of a file to be processed and the network address of the data storage node where each block is located; sending, by the stream processing management unit, stream processing logic and the block number of each block to a stream processing unit of the data storage node where each block is located, respectively; and obtaining, by a stream processing computing unit, from the data storage node where it is located, the block data corresponding to the received block number, and applying the stream processing logic to the block data corresponding to the received block number. The above solution overcomes the technical problem of a low network transmission speed between a stream processing system and a data storage node affecting the speed of stream processing.
PCT/CN2018/082641 2017-04-11 2018-04-11 Procédé et dispositif de traitement de flux WO2018188607A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710233425.0A CN108696559B (zh) 2017-04-11 2017-04-11 流处理方法及装置
CN201710233425.0 2017-04-11

Publications (1)

Publication Number Publication Date
WO2018188607A1 true WO2018188607A1 (fr) 2018-10-18

Family

ID=63792265

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/082641 WO2018188607A1 (fr) 2017-04-11 2018-04-11 Procédé et dispositif de traitement de flux

Country Status (2)

Country Link
CN (1) CN108696559B (fr)
WO (1) WO2018188607A1 (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111435938B (zh) * 2019-01-14 2022-11-29 阿里巴巴集团控股有限公司 一种数据请求的处理方法、装置及其设备
CN110046131A (zh) * 2019-01-23 2019-07-23 阿里巴巴集团控股有限公司 数据的流式处理方法、装置及分布式文件系统hdfs
CN111290744B (zh) * 2020-01-22 2023-07-21 北京百度网讯科技有限公司 流式计算作业处理方法、流式计算系统及电子设备

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6415297B1 (en) * 1998-11-17 2002-07-02 International Business Machines Corporation Parallel database support for workflow management systems
US20090125553A1 (en) * 2007-11-14 2009-05-14 Microsoft Corporation Asynchronous processing and function shipping in ssis
CN102456185A (zh) * 2010-10-29 2012-05-16 金蝶软件(中国)有限公司 一种分布式工作流处理方法及分布式工作流引擎系统
CN106339415A (zh) * 2016-08-12 2017-01-18 北京奇虎科技有限公司 数据的查询方法、装置及系统

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005041067A1 (fr) * 2003-10-27 2005-05-06 Shinji Furusho Systeme de traitement d'informations du type memoire distribuee
US7613848B2 (en) * 2006-06-13 2009-11-03 International Business Machines Corporation Dynamic stabilization for a stream processing system
US8150889B1 (en) * 2008-08-28 2012-04-03 Amazon Technologies, Inc. Parallel processing framework
CN101741885A (zh) * 2008-11-19 2010-06-16 珠海市西山居软件有限公司 分布式系统及分布式系统处理任务流的方法
US20110313934A1 (en) * 2010-06-21 2011-12-22 Craig Ronald Van Roy System and Method for Configuring Workflow Templates
CN102467411B (zh) * 2010-11-19 2013-11-27 金蝶软件(中国)有限公司 一种工作流处理方法、装置和控制引擎
CN102542367B (zh) * 2010-12-10 2015-03-11 金蝶软件(中国)有限公司 基于领域模型的云计算网络工作流处理方法、装置和系统
US9361323B2 (en) * 2011-10-04 2016-06-07 International Business Machines Corporation Declarative specification of data integration workflows for execution on parallel processing platforms
CN103309867A (zh) * 2012-03-09 2013-09-18 句容智恒安全设备有限公司 基于Hadoop平台的Web数据挖掘系统
US20130253977A1 (en) * 2012-03-23 2013-09-26 Commvault Systems, Inc. Automation of data storage activities
CA2948815A1 (fr) * 2014-05-13 2015-11-19 Cloud Crowding Corp. Stockage distribue de donnees securise et transmission d'un contenu multimedia de diffusion en continu
CN104063486B (zh) * 2014-07-03 2017-07-11 四川中亚联邦科技有限公司 一种大数据分布式存储方法和系统
CN105608077A (zh) * 2014-10-27 2016-05-25 青岛金讯网络工程有限公司 一种大数据分布式存储方法和系统
CN104536814B (zh) * 2015-01-16 2019-01-22 北京京东尚科信息技术有限公司 一种处理工作流的方法和系统
CN104657497A (zh) * 2015-03-09 2015-05-27 国家电网公司 一种基于分布式计算的海量用电信息并行计算系统及方法
CN105468756A (zh) * 2015-11-30 2016-04-06 浪潮集团有限公司 一种海量数据处理系统的设计和实现方法
CN106155791B (zh) * 2016-06-30 2019-05-07 电子科技大学 一种分布式环境下的工作流任务调度方法

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6415297B1 (en) * 1998-11-17 2002-07-02 International Business Machines Corporation Parallel database support for workflow management systems
US20090125553A1 (en) * 2007-11-14 2009-05-14 Microsoft Corporation Asynchronous processing and function shipping in ssis
CN102456185A (zh) * 2010-10-29 2012-05-16 金蝶软件(中国)有限公司 一种分布式工作流处理方法及分布式工作流引擎系统
CN106339415A (zh) * 2016-08-12 2017-01-18 北京奇虎科技有限公司 数据的查询方法、装置及系统

Also Published As

Publication number Publication date
CN108696559B (zh) 2021-08-20
CN108696559A (zh) 2018-10-23

Similar Documents

Publication Publication Date Title
US11711420B2 (en) Automated management of resource attributes across network-based services
CN109643312B (zh) 托管查询服务
CN109997126B (zh) 事件驱动提取、变换、加载(etl)处理
US10599478B1 (en) Automated reconfiguration of real time data stream processing
JP2022062036A (ja) 分散イベント処理システムのためのグラフ生成
JP2018088293A (ja) 単一テナント及び複数テナント環境を提供するデータベースシステム
US20050071209A1 (en) Binding a workflow engine to a data model
US11411921B2 (en) Enabling access across private networks for a managed blockchain service
Essa et al. Mobile agent based new framework for improving big data analysis
WO2019057055A1 (fr) Procédé et appareil de traitement de tâches, dispositif électronique, et support de stockage
CN111258978B (zh) 一种数据存储的方法
WO2016061935A1 (fr) Procédé d'ordonnancement de ressource, dispositif et support de stockage informatique
US20200278975A1 (en) Searching data on a synchronization data stream
WO2018188607A1 (fr) Procédé et dispositif de traitement de flux
CN112685499A (zh) 一种工作业务流的流程数据同步方法、装置及设备
WO2017020716A1 (fr) Procédé et dispositif de contrôle d'accès à des données
US10944814B1 (en) Independent resource scheduling for distributed data processing programs
US11601495B2 (en) Mechanism for a work node scan process to facilitate cluster scaling
US8627341B2 (en) Managing events generated from business objects
US11757959B2 (en) Dynamic data stream processing for Apache Kafka using GraphQL
CN109587224B (zh) 数据处理方法、装置、电子设备及计算机可读介质
US20230246916A1 (en) Service map conversion with preserved historical information
US20240137218A1 (en) Label filtering and encryption
US20210286819A1 (en) Method and System for Operation Objects Discovery from Operation Data
JP6563807B2 (ja) 情報処理システム、情報処理装置、処理制御方法、及び処理制御プログラム

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18784079

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18784079

Country of ref document: EP

Kind code of ref document: A1