CN108696559B - Stream processing method and device


Info

Publication number
CN108696559B
Authority
CN
China
Prior art keywords
stream processing
block
data storage
management unit
storage node
Prior art date
Legal status
Active
Application number
CN201710233425.0A
Other languages
Chinese (zh)
Other versions
CN108696559A (en)
Inventor
曹俊
胡斐然
林铭
Current Assignee
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201710233425.0A
Priority to PCT/CN2018/082641
Publication of CN108696559A
Application granted
Publication of CN108696559B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/561 Adding application-functional data or data for application control, e.g. adding metadata
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention disclose a stream processing method and apparatus. In the method, a stream processing management unit receives a stream processing task sent by a client; the stream processing management unit acquires, from a metadata management node, the block number of each block corresponding to the path of a file to be processed and the network address of the data storage node where each block is located; the stream processing management unit sends the stream processing logic and the block number of each block to the stream processing computing unit of the data storage node where that block is located; and each stream processing computing unit acquires the block data corresponding to the received block number from its data storage node and executes the stream processing logic on that block data. The method overcomes the technical problem that a low network transmission speed between the stream processing system and the data storage nodes limits the stream processing speed.

Description

Stream processing method and device
Technical Field
The present invention relates to the field of information technologies, and in particular, to a stream processing method and apparatus.
Background
A workflow is an abstraction, generalization, and description of a business process and of the logical rules by which the services within it are organized. The workflow concept originated in production organization and office automation; it was proposed for routine activities with fixed procedures, with the aim of decomposing work into well-defined steps or roles, executing and monitoring those steps according to defined rules and procedures, and thereby improving work efficiency, controlling processes better, improving customer service, and managing business processes effectively. Workflow modeling means representing a workflow in a computer with a suitable model and performing computation on it; through workflow modeling, workflows can be managed by a workflow system.
The main function of a stream processing system is to define, execute, and manage workflows with the support of computer technology, and to coordinate the exchange of information between the streams and the participants during workflow execution. A stream processing system generally includes a workflow design tool with which a user designs workflow definitions, and a workflow management tool that manages the execution of workflows. While the workflow system runs, a workflow instance includes one or more tasks, each representing a piece of work that needs to be done.
Apache Storm is a typical prior-art stream processing system. It uses a master-slave architecture: Nimbus is the master process, and Supervisor is the slave process that runs the tasks. The Storm stream processing system is connected over a network to a distributed file system that stores the data Storm needs to process. The distributed file system includes a Master Server and Data Servers: the Master Server is the metadata management node and manages the distribution of data blocks, and each Data Server is a data storage node that stores block data. Storm and the data storage nodes are deployed on different servers.
In a Storm stream processing job, Storm first needs to acquire the data to be stream processed from the data servers. Specifically, a data server provides a data query interface; Storm passes parameters to the data query interface over the network, acquires the data from the data server over the network, and then loads the acquired data into the Supervisor.
Because the prior-art stream processing system must acquire data from the data storage nodes over the network, the speed of data acquisition is limited by network performance, so the performance of the whole stream processing pipeline is bounded by the network; when the network transmission speed between the stream processing system and the data storage nodes is low, the stream processing speed is severely affected.
Disclosure of Invention
To solve the problems in the prior art, embodiments of the present invention provide a stream processing method and apparatus, which overcome the technical problem that a low network transmission speed between the stream processing system and the data storage nodes limits the stream processing speed.
In a first aspect, an embodiment of the present invention provides a stream processing method, where the method is applied to a stream processing system, where the stream processing system includes a stream processing management unit and a stream processing computing unit, and the method includes:
the method comprises the steps that a stream processing management unit receives a stream processing task sent by a client, wherein the stream processing task comprises stream processing logic and a path of a file to be processed in a distributed file system, the distributed file system comprises a metadata management node and a plurality of data storage nodes, and each data storage node is provided with a stream processing calculation unit;
the stream processing management unit acquires the block number of each block corresponding to the path of the file to be processed and the network address of the data storage node where each block is located from the metadata management node;
the stream processing management unit sends the stream processing logic and the block number of each block to the stream processing computing unit of the data storage node where that block is located;
and the stream processing computing unit acquires, from its data storage node, the block data corresponding to the received block number, and executes the stream processing logic on that block data.
Because the stream processing computing units are distributed over the data storage nodes, the stream processing management unit dispatches the stream processing task to the corresponding data storage nodes according to the path of the file to be processed, and the stream processing computing unit on each of those nodes reads the block data of the file to be processed directly from local storage and runs the stream processing logic on it. Since the file to be processed is read locally, the technical problem that a low network transmission speed between the stream processing system and the data storage nodes limits the stream processing speed is overcome.
In addition, the file to be processed is split into blocks of data, and the stream processing logic is executed in parallel on different stream processing computing units, which further increases the stream processing speed and improves processing efficiency.
In one implementation manner of the embodiment of the present invention, the data storage node is provided with a data management unit, the stream processing calculation unit is provided as a program library, and the data management unit executes the function of the stream processing calculation unit by loading the program library.
Because the stream processing calculation unit is embedded in the data management unit as a library and the data management unit can read the block data directly, the stream processing logic can be executed as soon as the block data has been read, which increases the stream processing speed.
In another implementation manner of the embodiment of the present invention, the method further includes:
the stream processing calculation unit transmits a processing result obtained by executing the stream processing logic to the stream processing management unit.
In another implementation manner of the embodiment of the present invention, the metadata management node records a first correspondence between the path of the file to be processed in the distributed file system and the block number of each block, and the step in which the stream processing management unit acquires, from the metadata management node, the block number of each block corresponding to the path and the network address of the data storage node where each block is located specifically includes:
and the stream processing management unit acquires the block number of each block from the first corresponding relation according to the path of the file to be processed in the distributed file system.
In another implementation manner of the embodiment of the present invention, the metadata management node records a second correspondence between the block number of each block and the network address of the data storage node where that block is located, and the step in which the stream processing management unit acquires, from the metadata management node, the block number of each block corresponding to the path and the network address of the data storage node where each block is located specifically includes:
and the stream processing management unit acquires the network address of the data storage node where each block number is located from the second corresponding relation according to each block number.
In a second aspect, an embodiment of the present invention provides a stream processing system, including a stream processing management unit and a stream processing computing unit,
the system comprises a stream processing management unit, a stream processing calculation unit and a file processing unit, wherein the stream processing management unit is used for receiving a stream processing task sent by a client, the stream processing task comprises stream processing logic and a path of a file to be processed in a distributed file system, the distributed file system comprises a metadata management node and a plurality of data storage nodes, and each data storage node is provided with a stream processing calculation unit;
the stream processing management unit is also used for acquiring each block number corresponding to the path and the network address of the data storage node where each block number is located from the metadata management node;
the stream processing management unit is also used for respectively sending the stream processing logic and the block numbers corresponding to the network addresses to the stream processing units of the corresponding data storage nodes;
and the stream processing calculation unit is used for acquiring the block data corresponding to the received block number from the data storage node, and executing stream processing logic aiming at the block data corresponding to the received block number.
In one implementation manner of the embodiment of the present invention, the data storage node is provided with a data management unit, the stream processing calculation unit is provided as a program library, and the data management unit executes the function of the stream processing calculation unit by loading the program library.
In another implementation manner of the embodiment of the present invention, the stream processing calculating unit is further configured to send a processing result obtained by executing the stream processing logic to the stream processing managing unit.
In another implementation manner of the embodiment of the present invention, the metadata management node records a first correspondence between a path of the file to be processed in the distributed file system and a block number of each block, and the stream processing management unit is specifically configured to:
and acquiring the block number of each block from the first corresponding relation according to the path of the file to be processed in the distributed file system.
In another implementation manner of the embodiment of the present invention, the metadata management node records a second correspondence between the block number of each block and the network address of the data storage node where that block is located, and the stream processing management unit is specifically configured to:
and acquiring the network address of the data storage node where each block number is located from the second corresponding relation according to the block number of each block.
In a third aspect, an embodiment of the present invention provides a stream processing management unit, which executes a function of the stream processing management unit in the stream processing system.
In a fourth aspect, an embodiment of the present invention provides a host, including a memory, a processor, and a bus, where the memory and the processor are connected to the bus, the memory stores program instructions, and the processor executes the program instructions to implement functions of a stream processing management unit in the stream processing system.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic block diagram of a stream processing system according to an embodiment of the present invention;
FIG. 2 is another schematic diagram of a stream processing system according to an embodiment of the present invention;
FIG. 3 is a data interaction diagram of a stream processing method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an apparatus structure of a stream processing management unit according to an embodiment of the present invention;
fig. 5 is a schematic device structure diagram of a host according to an embodiment of the present invention.
Detailed Description
Referring first to fig. 1, fig. 1 is a schematic diagram illustrating a connection between a stream processing system and a distributed file system and a client according to an embodiment of the present invention, where, as shown in fig. 1, the stream processing system includes a stream processing management unit 302 and a plurality of stream processing computing units 1011, 1021, … …, and 1031, and the distributed file system includes a metadata management node 201 and a plurality of data storage nodes 101, 102, … …, and 103.
In the embodiment of the present invention, the client 301 is connected to the stream processing management unit 302, and the stream processing management unit 302 is connected to the metadata management node 201 and the plurality of data storage nodes 101, 102, … …, and 103, respectively.
The client 301 is configured to receive a stream processing job submitted by a user, and in the embodiment of the present invention, when the user submits the stream processing job, the user specifies a path of data to be processed in the distributed file system, and specifies what processing is to be performed on the data to be processed.
The path of the file to be processed, carried in the stream processing task, may be, for example, a URL (Uniform Resource Locator) in the distributed file system; the URL is a storage identifier of the distributed file system, and the block number of each block of the file to be processed can be looked up in the metadata management node 201 through the URL.
The client 301 generates a stream processing task according to a stream processing job submitted by a user, where the stream processing task includes stream processing logic and a path of data to be processed in the distributed file system, where the stream processing logic defines what kind of processing is performed on the data to be processed, and for example, the stream processing logic may specify that an abnormal event is searched for in the data to be processed.
The client 301 sends a stream processing task to the stream processing management unit 302, and the stream processing management unit 302 performs scheduling according to the stream processing task, selects the stream processing computing unit to obtain the file to be processed from the distributed file system, and processes the file to be processed with the stream processing logic.
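For illustration only, the stream processing task described above could be represented roughly as in the following minimal Java sketch; the class and interface names (StreamProcessingTask, StreamProcessingLogic) and the process method are assumptions of this illustration, not part of the embodiment or of any existing API.

// Minimal sketch of a stream processing task: a path plus the logic to run on the data.
import java.io.Serializable;

public final class StreamProcessingTask implements Serializable {
    private final String filePath;              // path of the file to be processed in the distributed file system
    private final StreamProcessingLogic logic;  // what to do with each record of block data

    public StreamProcessingTask(String filePath, StreamProcessingLogic logic) {
        this.filePath = filePath;
        this.logic = logic;
    }

    public String filePath() { return filePath; }
    public StreamProcessingLogic logic() { return logic; }
}

// The logic is shipped to the stream processing computing units, so it must be serializable.
interface StreamProcessingLogic extends Serializable {
    void process(byte[] record);                // e.g. scan the record for an abnormal event
}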
For example, the stream processing system may be implemented based on the Apache Flink architecture, where the client 301 is a Flink client process, the stream processing management unit 302 is a Flink JobManager process, and each stream processing computing unit is a Flink TaskManager process.
The metadata management node 201 is provided with a metadata management unit 2011 and a database 2012, the metadata management unit 2011 provides an interface through which an external device can query the database 2012. The database 2012 records a first corresponding relationship between a path of a file to be processed in the distributed file system and a block number of each block in the distributed file system, and a second corresponding relationship between the block number of each block and a network address of a data storage node where each block is located.
In the distributed storage system, a file to be processed is stored in fragments in the databases of the data storage nodes, where each fragment is a piece of block data identified by a block number. The metadata management node records, for every file in the distributed storage system, the correspondence between the file path and the block numbers of its blocks, as well as the data storage node in whose database each block number is stored.
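A minimal sketch of these two correspondences, assuming plain in-memory maps stand in for database 2012; all type and method names here are illustrative assumptions.

// Sketch of the metadata kept by the metadata management node.
import java.util.List;
import java.util.Map;

public final class MetadataCatalog {
    // first correspondence: file path -> block numbers of its blocks
    private final Map<String, List<Long>> pathToBlockNumbers;
    // second correspondence: block number -> network address of the data storage node holding it
    private final Map<Long, String> blockNumberToNodeAddress;

    public MetadataCatalog(Map<String, List<Long>> pathToBlockNumbers,
                           Map<Long, String> blockNumberToNodeAddress) {
        this.pathToBlockNumbers = pathToBlockNumbers;
        this.blockNumberToNodeAddress = blockNumberToNodeAddress;
    }

    public List<Long> blockNumbersOf(String filePath) {
        return pathToBlockNumbers.getOrDefault(filePath, List.of());
    }

    public String nodeAddressOf(long blockNumber) {
        return blockNumberToNodeAddress.get(blockNumber);
    }
}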
The data storage node 101 is provided with a stream processing calculation unit 1011 and a database 1012, the database 1012 being recorded with block data and a correspondence relationship between block numbers and the block data, the stream processing calculation unit 1011 being able to access the database 1012 and acquire the corresponding block data from the database 1012 by the block numbers.
In fig. 1, the data storage nodes 102 and 103 have a similar structure to the data storage node 101, except that the block data recorded in the database is different, and will not be described herein.
For example, the distributed file system may be implemented by Hadoop, the databases 2012, 1012, 1022, … …, and 1032 may be implemented by HBase (Hadoop Database), and the metadata management unit 2011 may be an HMaster process of the HBase database.
In the embodiment of the present invention, the client 301 and the stream processing management unit 302 may be disposed on the same host, and establish data connection with the metadata management node 201 and the data storage nodes 101 and 102 … … 103, respectively, through a network.
In some examples, the client 301 and the stream processing management unit 302 may also be disposed on different hosts, which is not limited by the embodiment of the present invention.
For ease of understanding, referring to fig. 2, fig. 2 is another schematic structural diagram of the stream processing system according to the embodiment of the present invention. As shown in fig. 2, the client 301 and the stream processing management unit 302 are disposed on a host 10. The host 10 further includes an operating system 303 and hardware 304; the hardware 304 supports the running of the operating system 303 and includes a physical network card 3041. The client 301 and the stream processing management unit 302 each run on the operating system 303 in the form of a process and access the network 50 through the physical network card 3041.
The metadata management node 201 includes a database 2012, a metadata management unit 2011, an operating system 2013, and hardware 2014. The database 2012 and the metadata management unit 2011 run on the operating system 2013 in the form of processes; the hardware 2014 supports the running of the operating system 2013 and includes a physical network card 20141 connected to the network 50. The metadata management unit 2011 provides an interface through which an external device can access the database 2012.
Moreover, the data storage node 101 includes a database 1012, a stream processing computing unit 1011, an operating system 1013, and hardware 1014. The database 1012 and the stream processing computing unit 1011 run on the operating system 1013 in the form of processes; the hardware 1014 supports the running of the operating system 1013 and includes a physical network card 10141 connected to the network 50. In the embodiment of the present invention, the stream processing computing unit 1011 can access the database 1012.
The structure of data storage nodes 102 and 103 is similar to data storage node 101 and will not be described in detail herein.
For example, communication between the stream processing management unit 302 and the client 301, the metadata management unit 2011, and each of the stream processing computing units 1011, 1021, … …, and 1031 may be implemented by remote procedure call (RPC).
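For illustration, the RPC surface implied by this architecture might look roughly like the following sketch; the interface and method names are assumptions (reusing the hypothetical StreamProcessingLogic type from the earlier sketch) and do not correspond to any specific RPC framework.

// Hypothetical RPC interfaces between the units; names are illustrative only.
import java.util.List;
import java.util.Map;

// Exposed by the metadata management unit 2011 (query of database 2012).
interface MetadataService {
    // returns block number -> network address of the data storage node holding that block
    Map<Long, String> locateBlocks(String filePath);
}

// Exposed by each stream processing computing unit (1011, 1021, ...).
interface StreamComputeService {
    // receive the stream processing logic plus the block numbers stored on this node
    void runTask(StreamProcessingLogic logic, List<Long> blockNumbers);
}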
Based on the above architecture, in the embodiment of the present invention, the stream processing management unit 302 receives a stream processing task sent by the client 301, where the stream processing task includes stream processing logic and a path of a file to be processed in the distributed file system; the stream processing management unit 302 acquires, from the metadata management node 201, the block number of each block corresponding to the path and the network address of the data storage node where each block is located; the stream processing management unit 302 sends the stream processing logic and the corresponding block numbers to the stream processing computing units of the corresponding data storage nodes; and each stream processing computing unit acquires the block data corresponding to the received block number from its data storage node and executes the stream processing logic on that block data.
Because the stream processing computing units are distributed over the data storage nodes, the stream processing management unit dispatches the stream processing task to the corresponding data storage nodes according to the path of the file to be processed, and the stream processing computing unit on each of those nodes reads the block data of the file to be processed directly from local storage and runs the stream processing logic on it. Since the file to be processed is read locally, the technical problem that a low network transmission speed between the stream processing system and the data storage nodes limits the stream processing speed is overcome.
For further clarity, please refer to fig. 3, fig. 3 is a data interaction diagram of a stream processing method according to an embodiment of the present invention, and as shown in fig. 3, the stream processing method includes the following steps:
step 401: the stream processing management unit 302 receives a stream processing task sent by the client 301, where the stream processing task includes stream processing logic and a path of a file to be processed in the distributed file system.
For example, the client 301 may be a client process in the Apache Flink system, and the stream processing management unit 302 may be a JobManager process in the Apache Flink system.
Step 402: the stream processing management unit 302 sends an inquiry request to the metadata management node 201, where the inquiry request carries a path of a file to be processed in the distributed file system.
For example, the query request includes an input parameter and a query instruction: the stream processing management unit 302 takes the path of the file to be processed in the distributed file system as the input parameter and sends the input parameter and the query instruction to the interface, provided by the metadata management unit 2011 of the metadata management node 201, for accessing the database 2012.
Step 403: the metadata management node 201 returns the block number of each block corresponding to the path and the network address of the data storage node corresponding to each block to the stream processing management unit 302 according to the query request.
As described above, the database 2012 of the metadata management node 201 records a first correspondence between the path of the file to be processed in the distributed file system and the block number of each block, and a second correspondence between the block number of each block and the network address of the data storage node where each block is located. The metadata management node 201 therefore obtains the block number of each block from the first correspondence according to the path of the file to be processed, obtains the network address of the data storage node where each block is located from the second correspondence according to each block number, and returns the block numbers and network addresses to the stream processing management unit 302.
Assume that the block numbers acquired by the stream processing management unit 302 are block number 1 and block number 2. (In practical applications there are usually many block numbers; only two are used here for brevity.) The stream processing management unit 302 obtains the network address A of the data storage node 101 according to block number 1 and the network address B of the data storage node 102 according to block number 2.
Step 404: the stream processing management unit 302 sends the stream processing logic and the block number 1 to the stream processing calculation unit 1011.
In this step, after obtaining the network address A of the data storage node 101 according to block number 1, the stream processing management unit 302 sends the stream processing logic and block number 1, which corresponds to network address A, to the stream processing computing unit 1011 of the data storage node 101.
Step 405: the stream processing management unit 302 transmits the stream processing logic and block number 2 to the stream processing calculation unit 1021.
In this step, after obtaining the network address B of the data storage node 102 according to block number 2, the stream processing management unit 302 sends the stream processing logic and block number 2, which corresponds to network address B, to the stream processing computing unit 1021 of the data storage node 102.
In steps 404 and 405, the stream processing computing unit 1011 may be, for example, one TaskManager process in the Apache Flink system, and the stream processing computing unit 1021 may be, for example, another TaskManager process in the Apache Flink system.
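A hedged sketch of steps 402 to 405 as seen from the stream processing management unit 302, reusing the hypothetical MetadataService and StreamComputeService interfaces assumed above; it only illustrates grouping block numbers by node address and dispatching them, and is not the embodiment's actual implementation.

// Sketch of the dispatch performed by the stream processing management unit.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class StreamProcessingManager {
    private final MetadataService metadata;                       // RPC stub to metadata management unit 2011
    private final Map<String, StreamComputeService> computeUnits; // node network address -> RPC stub

    public StreamProcessingManager(MetadataService metadata,
                                   Map<String, StreamComputeService> computeUnits) {
        this.metadata = metadata;
        this.computeUnits = computeUnits;
    }

    public void dispatch(StreamProcessingTask task) {
        // steps 402/403: query block numbers and node addresses for the file path
        Map<Long, String> blockToNode = metadata.locateBlocks(task.filePath());

        // group block numbers by the node that stores them
        Map<String, List<Long>> blocksPerNode = new HashMap<>();
        blockToNode.forEach((blockNumber, nodeAddress) ->
                blocksPerNode.computeIfAbsent(nodeAddress, a -> new ArrayList<>()).add(blockNumber));

        // steps 404/405: send the logic and the matching block numbers to each node's computing unit
        blocksPerNode.forEach((nodeAddress, blockNumbers) ->
                computeUnits.get(nodeAddress).runTask(task.logic(), blockNumbers));
    }
}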
Step 406: the stream processing calculation unit 1011 acquires the block data corresponding to the received block number 1 from the data storage node 101 where it is located, and executes stream processing logic for the block data corresponding to the received block number 1.
In this step, the stream-processing calculating unit 1011 acquires the block data corresponding to the block number 1 received from the stream-processing managing unit 302 from the database 1012 of the data storage node 101 where it is located, and executes the stream processing logic with respect to the block data corresponding to the block number 1.
In some examples, the data storage node 101 is further provided with a data management unit for accessing the database 1012 to manage the block data in the database 1012.
For example, the distributed file system may be Hadoop, its databases may be implemented by HBase, the metadata management unit 2011 may be an HMaster process of the HBase database, the stream processing computing unit may be provided as a program library, and the data management unit executes the function of the stream processing computing unit by loading the program library.
Further, the data management unit is, for example, an HRegionServer process of the HBase database, and a TaskManager process is embedded into the HRegionServer process: the TaskManager process may be provided as a library in jar-package or .so file format and exposes a start interface, and the HRegionServer process implements the function of the TaskManager process by loading the library and then invoking the start interface.
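Purely as an illustration, the library-loading step could be sketched as follows; the jar path, the entry-point class name TaskManagerStarter, and the start method are assumptions of this sketch, not HBase or Flink APIs.

// Sketch of a data management process loading the computing unit packaged as a jar
// and invoking its start interface via reflection.
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;

public final class EmbeddedComputeUnitLoader {
    public static AutoCloseable startComputeUnit(String jarPath) throws Exception {
        URLClassLoader loader = new URLClassLoader(
                new URL[] { new URL("file:" + jarPath) },
                EmbeddedComputeUnitLoader.class.getClassLoader());

        // hypothetical entry point exposed by the library
        Class<?> starter = Class.forName("com.example.stream.TaskManagerStarter", true, loader);
        Method start = starter.getMethod("start");
        start.invoke(starter.getDeclaredConstructor().newInstance());

        return loader; // caller closes the loader when the computing unit is stopped
    }
}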
In the embodiment of the present invention, the HRegionServer process that implements the function of the TaskManager process can read the block data of the database 1012 locally, so the acquisition of block data is not affected by external network performance; moreover, the HRegionServer process accesses the database 1012 within its own process, i.e., it reads the block data directly from memory, so the block data is acquired faster and the efficiency of stream processing can be effectively improved.
In other examples, the data management unit and the stream processing computing unit 1011 may run on the operating system 1013 as separate processes, with the stream processing computing unit 1011 accessing the database 1012 through an interface provided by the data management unit. In these examples, although the database 1012 is not accessed in-process by the HRegionServer process, the stream processing computing unit 1011 still accesses the database 1012 locally, so the impact of external network performance is still avoided.
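For illustration, the local read-and-execute behaviour of step 406 might be sketched as follows, reusing the hypothetical interfaces from the earlier sketches; BlockStore and its readBlock method are assumptions standing in for local access to database 1012, not a real HBase API.

// Sketch of a computing unit reading block data locally and applying the logic.
import java.util.List;

interface BlockStore {
    Iterable<byte[]> readBlock(long blockNumber);   // local read, no external network hop
}

final class StreamComputeUnit implements StreamComputeService {
    private final BlockStore localStore;

    StreamComputeUnit(BlockStore localStore) { this.localStore = localStore; }

    @Override
    public void runTask(StreamProcessingLogic logic, List<Long> blockNumbers) {
        for (long blockNumber : blockNumbers) {
            for (byte[] record : localStore.readBlock(blockNumber)) {
                logic.process(record);              // execute the stream processing logic per record
            }
        }
    }
}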
Step 407: the stream processing calculation unit 1021 acquires the block data corresponding to the received block number 2 from the data storage node 102 where it is located, and executes stream processing logic for the block data corresponding to the received block number 2.
Similar to the above steps, in some examples, the data storage node 102 is provided with a data management unit for accessing the database 1022 to manage the block data. The distributed file system may be Hadoop, its databases may be implemented by HBase, the metadata management unit 2011 may be an HMaster process of the HBase database, the stream processing computing unit 1021 may be provided as a program library, and the data management unit executes the function of the stream processing computing unit by loading the program library.
Further, the data management unit is, for example, an HRegionServer process of the HBase database, and a TaskManager process is embedded into the HRegionServer process: the TaskManager process may be provided as a library in jar-package or .so file format and exposes a start interface, and the HRegionServer process implements the function of the TaskManager process by loading the library and then invoking the start interface.
In the embodiment of the present invention, the HRegionServer process that implements the function of the TaskManager process can read the block data of the database 1022 locally, so the acquisition of block data is not affected by external network performance; moreover, the HRegionServer process accesses the database 1022 within its own process, so the block data is acquired faster and the efficiency of stream processing can be effectively improved.
In other examples, the data management unit and the stream processing computing unit 1021 may run on the operating system 1023 as separate processes, with the stream processing computing unit 1021 accessing the database 1022 through an interface provided by the data management unit. In this case, although the database 1022 is not accessed in-process by the HRegionServer process, the stream processing computing unit 1021 still accesses the database 1022 locally, so the impact of external network performance is still avoided.
Step 408: the stream processing calculation unit transmits a first processing result acquired by performing stream processing logic on the block data corresponding to the block number 1 to the stream processing management unit 302.
Step 409: the stream processing calculation unit transmits the second processing result acquired by performing the stream processing logic on the block data corresponding to the block number 2 to the stream processing management unit 302.
In summary, in the embodiments of the present invention, the stream processing computing units are distributed over the data storage nodes, the stream processing management unit dispatches the stream processing task to the corresponding data storage nodes according to the path of the file to be processed, and the stream processing computing unit on each of those nodes reads the block data of the file to be processed directly from local storage and runs the stream processing logic on it; because the file to be processed is read locally, the stream processing speed is no longer limited by the network transmission speed between the stream processing system and the data storage nodes.
In addition, the file to be processed is split into blocks of data, and the stream processing logic is executed in parallel on different stream processing computing units, which further increases the stream processing speed and improves processing efficiency.
It is noted that in alternative embodiments of the present invention, stream processing system 90 may also be implemented based on the Storm, Spark, or Samza architectures.
Referring to fig. 4, fig. 4 is a schematic diagram of an apparatus structure of a stream processing management unit according to an embodiment of the present invention. As shown in fig. 4, the stream processing management unit 302 includes:
the receiving module 601 is configured to receive a stream processing task sent by a client, where the stream processing task includes a stream processing logic and a path of a file to be processed in a distributed file system, the distributed file system includes a metadata management node and a plurality of data storage nodes, and each data storage node is provided with a stream processing computing unit;
the query module 602 is configured to obtain, from the metadata management node, a block number of each block corresponding to the path and a network address of a data storage node where each block is located;
a sending module 603, configured to send the stream processing logic and the block number of each block to the stream processing computing unit of the data storage node where that block is located.
Optionally, the receiving module 601 is further configured to receive a processing result, obtained by executing the stream processing logic, sent by the stream processing computing unit.
Optionally, the metadata management node records a first corresponding relationship between a path of the file to be processed in the distributed file system and a block number of each block, and a second corresponding relationship between a block number of each block and a network address of the data storage node where each block is located, and the query module 602 is specifically configured to:
acquiring the block number of each block from the first corresponding relation according to the path of the file to be processed in the distributed file system;
and acquiring the network address of the data storage node where each block is located from the second corresponding relation according to the block number of each block.
Referring to fig. 5, fig. 5 is a schematic device structure diagram of a host according to an embodiment of the present invention, as shown in fig. 5, the host 50 includes a memory 502, a processor 501 and a bus 503, the memory 502 and the processor 501 are connected to the bus 503, the memory 502 stores program instructions, and the processor 501 executes the program instructions to implement the functions of the stream processing management unit 302 in the stream processing system.
Because the stream processing computing units are distributed over the data storage nodes, the stream processing management unit dispatches the stream processing task to the corresponding data storage nodes according to the path of the file to be processed, and the stream processing computing unit on each of those nodes reads the block data of the file to be processed directly from local storage and runs the stream processing logic on it. Since the file to be processed is read locally, the technical problem that a low network transmission speed between the stream processing system and the data storage nodes limits the stream processing speed is overcome.
In addition, the file to be processed is split into blocks of data, and the stream processing logic is executed in parallel on different stream processing computing units, which further increases the stream processing speed and improves processing efficiency.
It should be noted that the device embodiments described above are merely schematic. The units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. In addition, in the drawings of the device embodiments provided by the present invention, the connection relationships between modules indicate that they have communication connections, which may be specifically implemented as one or more communication buses or signal lines. A person of ordinary skill in the art can understand and implement the embodiments without creative effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software plus necessary general-purpose hardware, or by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, any function performed by a computer program can also be implemented by corresponding hardware, and the specific hardware structure used to implement the same function may vary, for example an analog circuit, a digital circuit, or a dedicated circuit; however, for the present invention, a software implementation is usually the preferred embodiment. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc of a computer, and which includes instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods described in the embodiments of the present invention.
It is clear to those skilled in the art that the specific working process of the above-described system, apparatus or unit may refer to the corresponding process in the foregoing method embodiments, and is not described herein again.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (12)

1. A stream processing method applied to a stream processing system including a stream processing management unit and a stream processing calculation unit, the method comprising:
the method comprises the steps that a stream processing management unit receives a stream processing task sent by a client, wherein the stream processing task comprises stream processing logic and a path of a file to be processed in a distributed file system, the distributed file system comprises a metadata management node and a plurality of data storage nodes, and each data storage node is provided with a stream processing calculation unit;
the stream processing management unit acquires the block number of each block corresponding to the path of the file to be processed and the network address of the data storage node where each block is located from the metadata management node;
the stream processing management unit respectively sends the stream processing logic and the block number of each block to the stream processing calculation unit of the data storage node where each block is located;
and the stream processing calculation unit acquires, from the data storage node, the block data corresponding to the received block number, and executes the stream processing logic on the block data corresponding to the received block number.
2. The method according to claim 1, wherein the data storage node is provided with a data management unit, the stream processing calculation unit is provided as a library, and the data management unit executes the function of the stream processing calculation unit by loading the library.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
and the stream processing calculation unit sends the processing result obtained by executing the stream processing logic to the stream processing management unit.
4. The method according to any one of claims 1 to 3, wherein the metadata management node records a first correspondence between a path of the file to be processed in the distributed file system and a block number of each block, and the step in which the stream processing management unit obtains, from the metadata management node, the block number of each block corresponding to the path and the network address of the data storage node where each block is located specifically comprises:
and the stream processing management unit acquires the block number of each block from the first corresponding relation according to the path of the file to be processed in the distributed file system.
5. The method according to claim 4, wherein the metadata management node records a second correspondence between a block number of each block and a network address of the data storage node where each block is located, and the step in which the stream processing management unit acquires, from the metadata management node, the block number of each block corresponding to the path and the network address of the data storage node where each block is located specifically comprises:
and the stream processing management unit acquires the network address of the data storage node where each block is located from the second corresponding relation according to the block number of each block.
6. A stream processing system is characterized by comprising a stream processing management unit and a stream processing calculation unit,
the stream processing management unit is used for receiving a stream processing task sent by a client, wherein the stream processing task comprises stream processing logic and a path of a file to be processed in a distributed file system, the distributed file system comprises a metadata management node and a plurality of data storage nodes, and each data storage node is provided with a stream processing calculation unit;
the stream processing management unit is further configured to obtain, from the metadata management node, a block number of each block corresponding to the path and a network address of a data storage node where each block is located;
the stream processing management unit is further configured to send the stream processing logic and the block number of each block to the stream processing calculation unit of the data storage node where each block is located;
and the stream processing calculation unit is configured to acquire, from the data storage node, the block data corresponding to the received block number, and to execute the stream processing logic on the block data corresponding to the received block number.
7. The system according to claim 6, wherein the data storage node is provided with a data management unit, the stream processing calculation unit is provided as a library, and the data management unit executes the function of the stream processing calculation unit by loading the library.
8. The system of claim 6,
the stream processing calculation unit is further configured to send a processing result obtained by executing the stream processing logic to the stream processing management unit.
9. The system according to claim 6, wherein the metadata management node records a first correspondence between a path of the file to be processed in the distributed file system and a block number of each block, and the stream processing management unit is specifically configured to:
and acquiring the block number of each block from the first corresponding relation according to the path of the file to be processed in the distributed file system.
10. The system according to claim 9, wherein the metadata management node records a second correspondence between the block number of each block and a network address of the data storage node where each block is located, and the stream processing management unit is specifically configured to:
and acquiring the network address of the data storage node where each block is located from the second corresponding relation according to the block number of each block.
11. A stream processing management unit, comprising:
the system comprises a receiving module, a processing module and a processing module, wherein the receiving module is used for receiving a stream processing task sent by a client, the stream processing task comprises stream processing logic and a path of a file to be processed in a distributed file system, the distributed file system comprises a metadata management node and a plurality of data storage nodes, and each data storage node is provided with a stream processing computing unit;
the query module is used for acquiring the block number of each block corresponding to the path from the metadata management node and the network address of the data storage node where each block is located;
and the sending module is used for respectively sending the stream processing logic and the block number of each block to the stream processing computing unit of the data storage node where each block is located.
12. A host, comprising a memory, a processor, and a bus, wherein the memory and the processor are connected to the bus, the memory stores program instructions, and the processor executes the program instructions to cause the host to perform the steps of:
receiving a stream processing task sent by a client, wherein the stream processing task comprises stream processing logic and a path of a file to be processed in a distributed file system, the distributed file system comprises a metadata management node and a plurality of data storage nodes, and each data storage node is provided with a stream processing computing unit;
acquiring the block number of each block corresponding to the path and the network address of the data storage node where each block is located from the metadata management node;
and respectively sending the stream processing logic and the block number of each block to the stream processing computing unit of the data storage node where each block is located.
CN201710233425.0A 2017-04-11 2017-04-11 Stream processing method and device Active CN108696559B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710233425.0A CN108696559B (en) 2017-04-11 2017-04-11 Stream processing method and device
PCT/CN2018/082641 WO2018188607A1 (en) 2017-04-11 2018-04-11 Stream processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710233425.0A CN108696559B (en) 2017-04-11 2017-04-11 Stream processing method and device

Publications (2)

Publication Number Publication Date
CN108696559A CN108696559A (en) 2018-10-23
CN108696559B (en) 2021-08-20

Family

ID=63792265

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710233425.0A Active CN108696559B (en) 2017-04-11 2017-04-11 Stream processing method and device

Country Status (2)

Country Link
CN (1) CN108696559B (en)
WO (1) WO2018188607A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111435938B (en) * 2019-01-14 2022-11-29 阿里巴巴集团控股有限公司 Data request processing method, device and equipment
CN110046131A (en) * 2019-01-23 2019-07-23 阿里巴巴集团控股有限公司 The Stream Processing method, apparatus and distributed file system HDFS of data
CN111290744B (en) * 2020-01-22 2023-07-21 北京百度网讯科技有限公司 Stream type computing job processing method, stream type computing system and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101089819A (en) * 2006-06-13 2007-12-19 国际商业机器公司 Method for dynamic stationary flow processing system and upstream processing node
CN101741885A (en) * 2008-11-19 2010-06-16 珠海市西山居软件有限公司 Distributed system and method for processing task flow thereof
CN102456185A (en) * 2010-10-29 2012-05-16 金蝶软件(中国)有限公司 Distributed workflow processing method and distributed workflow engine system
CN102467411A (en) * 2010-11-19 2012-05-23 金蝶软件(中国)有限公司 Workflow processing and workflow agent method, device and system
CN102542367A (en) * 2010-12-10 2012-07-04 金蝶软件(中国)有限公司 Cloud computing network workflow processing method, device and system based on domain model
CN104536814A (en) * 2015-01-16 2015-04-22 北京京东尚科信息技术有限公司 Method and system for processing workflow
CN106155791A (en) * 2016-06-30 2016-11-23 电子科技大学 A kind of workflow task dispatching method under distributed environment

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6415297B1 (en) * 1998-11-17 2002-07-02 International Business Machines Corporation Parallel database support for workflow management systems
US7849289B2 (en) * 2003-10-27 2010-12-07 Turbo Data Laboratories, Inc. Distributed memory type information processing system
US20090125553A1 (en) * 2007-11-14 2009-05-14 Microsoft Corporation Asynchronous processing and function shipping in ssis
US8150889B1 (en) * 2008-08-28 2012-04-03 Amazon Technologies, Inc. Parallel processing framework
US20110313934A1 (en) * 2010-06-21 2011-12-22 Craig Ronald Van Roy System and Method for Configuring Workflow Templates
US9361323B2 (en) * 2011-10-04 2016-06-07 International Business Machines Corporation Declarative specification of data integration workflows for execution on parallel processing platforms
CN103309867A (en) * 2012-03-09 2013-09-18 句容智恒安全设备有限公司 Web data mining system on basis of Hadoop platform
US9292815B2 (en) * 2012-03-23 2016-03-22 Commvault Systems, Inc. Automation of data storage activities
BR112016026524A2 (en) * 2014-05-13 2017-08-15 Cloud Crowding Corp DATA STORAGE FOR SECURE DISTRIBUTION AND TRANSMISSION OF STREAM MEDIA CONTENT.
CN104063486B (en) * 2014-07-03 2017-07-11 四川中亚联邦科技有限公司 A kind of big data distributed storage method and system
CN105608077A (en) * 2014-10-27 2016-05-25 青岛金讯网络工程有限公司 Big data distributed storage method and system
CN104657497A (en) * 2015-03-09 2015-05-27 国家电网公司 Mass electricity information concurrent computation system and method based on distributed computation
CN105468756A (en) * 2015-11-30 2016-04-06 浪潮集团有限公司 Design and implementation method of mass data processing system
CN106339415B (en) * 2016-08-12 2019-08-23 北京奇虎科技有限公司 Querying method, the apparatus and system of data

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101089819A (en) * 2006-06-13 2007-12-19 国际商业机器公司 Method for dynamic stationary flow processing system and upstream processing node
CN101741885A (en) * 2008-11-19 2010-06-16 珠海市西山居软件有限公司 Distributed system and method for processing task flow thereof
CN102456185A (en) * 2010-10-29 2012-05-16 金蝶软件(中国)有限公司 Distributed workflow processing method and distributed workflow engine system
CN102467411A (en) * 2010-11-19 2012-05-23 金蝶软件(中国)有限公司 Workflow processing and workflow agent method, device and system
CN102542367A (en) * 2010-12-10 2012-07-04 金蝶软件(中国)有限公司 Cloud computing network workflow processing method, device and system based on domain model
CN104536814A (en) * 2015-01-16 2015-04-22 北京京东尚科信息技术有限公司 Method and system for processing workflow
CN106155791A (en) * 2016-06-30 2016-11-23 电子科技大学 A kind of workflow task dispatching method under distributed environment

Also Published As

Publication number Publication date
WO2018188607A1 (en) 2018-10-18
CN108696559A (en) 2018-10-23

Similar Documents

Publication Publication Date Title
CN109643312B (en) Hosted query service
US11836533B2 (en) Automated reconfiguration of real time data stream processing
CN109997126B (en) Event driven extraction, transformation, and loading (ETL) processing
JP6732798B2 (en) Automatic scaling of resource instance groups in a compute cluster
US20190138639A1 (en) Generating a subquery for a distinct data intake and query system
US8914469B2 (en) Negotiating agreements within a cloud computing environment
US9043445B2 (en) Linking instances within a cloud computing environment
US10970303B1 (en) Selecting resources hosted in different networks to perform queries according to available capacity
US20200169534A1 (en) Enabling access across private networks for a managed blockchain service
CN111258978B (en) Data storage method
CA2829915C (en) Method and system for dynamically tagging metrics data
CN108573029B (en) Method, device and storage medium for acquiring network access relation data
US20160006600A1 (en) Obtaining software asset insight by analyzing collected metrics using analytic services
CN108696559B (en) Stream processing method and device
CN110096521A (en) Log information processing method and device
US10951540B1 (en) Capture and execution of provider network tasks
US20150163111A1 (en) Managing resources in a distributed computing environment
US10944814B1 (en) Independent resource scheduling for distributed data processing programs
US20220044144A1 (en) Real time model cascades and derived feature hierarchy
CN115617480A (en) Task scheduling method, device and system and storage medium
WO2018200167A1 (en) Managing asynchronous analytics operation based on communication exchange
US11288291B2 (en) Method and system for relation discovery from operation data
US20210286819A1 (en) Method and System for Operation Objects Discovery from Operation Data
WO2018217406A1 (en) Providing instant preview of cloud based file
CN103176847A (en) Virtual machine distribution method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220216

Address after: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province

Patentee after: Huawei Cloud Computing Technologies Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Patentee before: HUAWEI TECHNOLOGIES Co.,Ltd.
