CN108696559A - Stream processing method and device - Google Patents

Stream processing method and device

Info

Publication number
CN108696559A
CN108696559A (application CN201710233425.0A)
Authority
CN
China
Prior art keywords
stream processing
block
stream
block number
data storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710233425.0A
Other languages
Chinese (zh)
Other versions
CN108696559B (en)
Inventor
曹俊
胡斐然
林铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201710233425.0A priority Critical patent/CN108696559B/en
Priority to PCT/CN2018/082641 priority patent/WO2018188607A1/en
Publication of CN108696559A publication Critical patent/CN108696559A/en
Application granted
Publication of CN108696559B publication Critical patent/CN108696559B/en
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/50 Network services
    • H04L 67/56 Provisioning of proxy services
    • H04L 67/561 Adding application-functional data or data for application control, e.g. adding metadata
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H04L 67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/50 Network services
    • H04L 67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources

Abstract

The embodiments of the present invention disclose a stream processing method and device. The method includes: a stream processing management unit receives a stream processing task sent by a client; the stream processing management unit obtains, from a metadata management node, the block number of each block corresponding to the path of the file to be processed and the network address of the data storage node where each block is located; the stream processing management unit sends the stream processing logic and each block number to the stream processing compute unit on the data storage node where the corresponding block is located; each stream processing compute unit obtains, from its local data storage node, the block data corresponding to the received block number and executes the stream processing logic on that block data. In this manner, the technical problem that the stream processing speed is limited by a low network transfer speed between the stream processing system and the data storage nodes can be overcome.

Description

Stream processing method and device
Technical field
The present invention relates to the field of information technology, and in particular to a stream processing method and device.
Background
A workflow (work flow) is an abstraction, summary and description of how the individual pieces of business in a flow of work are organized together and of the logical rules that order them. The workflow concept originated in production organization and office automation; it was proposed for routine activities that follow fixed procedures in daily work. The aim is to decompose work into well-defined flows or roles, to execute and monitor these flows according to certain rules and procedures, and thereby to improve working efficiency, control processes better, serve clients better and manage business processes effectively. Workflow modeling represents a workflow in a computer with an appropriate model and performs computation on that representation; once modeled, a workflow can be managed by a workflow system.
The main function of a workflow system is to define, execute and manage workflows with the support of computer technology, and to coordinate the exchange of information among the steps of a workflow and among group members while the workflow executes. A workflow system usually consists of a workflow design tool and a workflow management tool: with the design tool users design their own workflow definitions, and the management tool is responsible for managing the execution of the workflows. While a workflow system is running, a workflow instance comprises one or more tasks, each of which represents a piece of work that needs to be carried out.
Apache Storm is a typical prior-art stream processing system. It uses a master-slave architecture: Nimbus is the master process, and Supervisor is the slave process that runs tasks. The stream processing system Storm is connected to a distributed file system over a network, and the distributed file system stores the data that Storm needs to process. The distributed file system includes a Master Server and Data Servers: the Master Server is the metadata management node and manages the distribution of data blocks, while each Data Server is a data storage node and stores block data. Storm and the data storage nodes are deployed on different servers.
In a Storm stream processing job, Storm first needs to obtain the data to be processed from a Data Server. Specifically, the Data Server provides a data query interface; Storm passes input parameters to that interface over the network, obtains the data from the Data Server over the network, and then loads the obtained data into a Supervisor.
Because, in the prior art, the stream processing system obtains data from the data storage nodes over the network, the speed of data retrieval is limited by network performance, and the network can limit the performance of the whole stream processing pipeline. When the network transfer speed between the stream processing system and the data storage nodes is low, the stream processing speed is greatly affected.
Summary of the invention
To solve the problems of the prior art, embodiments of the present invention provide a stream processing method and device that can overcome the technical problem that the stream processing speed is affected by a low network transfer speed between the stream processing system and the data storage nodes.
In a first aspect, an embodiment of the present invention provides a stream processing method. The method is applied to a stream processing system that includes a stream processing management unit and stream processing compute units, and the method includes:
the stream processing management unit receives a stream processing task sent by a client, where the stream processing task includes stream processing logic and the path of a file to be processed in a distributed file system; the distributed file system includes a metadata management node and multiple data storage nodes, and each data storage node is provided with a stream processing compute unit;
the stream processing management unit obtains, from the metadata management node, the block number of each block corresponding to the path of the file to be processed and the network address of the data storage node where each block is located;
the stream processing management unit sends the stream processing logic and the block number of each block to the stream processing compute unit of the data storage node where that block is located;
the stream processing compute unit obtains, from the data storage node where it is located, the block data corresponding to the received block number and executes the stream processing logic on that block data.
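By way of illustration only, the following minimal Java sketch shows how a management unit of this kind might dispatch a task; all type and method names here (MetadataClient, ComputeUnitClient, BlockLocation, connectTo) are assumptions made for this sketch and are not defined by the patent or by any existing library.
```java
// Illustrative sketch of the first-aspect flow; every name below is hypothetical.
import java.util.List;

interface MetadataClient {
    // Returns, for a file path, each block number and the network address
    // of the data storage node holding that block.
    List<BlockLocation> locateBlocks(String filePath);
}

interface ComputeUnitClient {
    // Stub for the stream processing compute unit on one data storage node.
    void submit(byte[] streamProcessingLogic, long blockNumber);
}

record BlockLocation(long blockNumber, String nodeAddress) {}

class StreamProcessingManagementUnit {
    private final MetadataClient metadata;

    StreamProcessingManagementUnit(MetadataClient metadata) {
        this.metadata = metadata;
    }

    // Receives the task (logic + file path), resolves block locations via the
    // metadata management node, and dispatches the logic plus one block number
    // to the compute unit co-located with each block.
    void handleTask(byte[] streamProcessingLogic, String filePath) {
        for (BlockLocation loc : metadata.locateBlocks(filePath)) {
            ComputeUnitClient unit = connectTo(loc.nodeAddress());
            unit.submit(streamProcessingLogic, loc.blockNumber());
        }
    }

    private ComputeUnitClient connectTo(String nodeAddress) {
        // Placeholder: in a real system this would open an RPC connection.
        throw new UnsupportedOperationException("RPC transport not shown");
    }
}
```
The point of the sketch is only the routing decision: the block data itself never crosses the network, only the logic and the block numbers do.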
Because the embodiments of the present invention distribute stream processing compute units across the data storage nodes, and the stream processing management unit dispatches the stream processing task to the corresponding data storage nodes according to the path of the file to be processed, the stream processing compute unit on each data storage node reads the block data of the file to be processed directly from local storage and runs the stream processing logic on it. Since the compute units read the file to be processed locally, the technical problem that the stream processing speed is affected by a low network transfer speed between the stream processing system and the data storage nodes can be overcome.
Moreover, because the file to be processed is split into blocks of data on which different stream processing compute units execute the stream processing logic in parallel, the stream processing speed can be further increased and processing efficiency improved.
In one implementation of the embodiments of the present invention, the data storage node is provided with a data management unit, the stream processing compute unit is provided as a program library, and the data management unit performs the functions of the stream processing compute unit by loading the program library.
Because the stream processing compute unit is embedded in the data management unit as a program library, and the data management unit can read block data directly, the stream processing logic can be executed as soon as the block data has been read into the data management unit, which speeds up stream processing.
In another implementation of the embodiments of the present invention, the method further includes:
the stream processing compute unit sends the processing result obtained by executing the stream processing logic to the stream processing management unit.
In another implementation of the embodiments of the present invention, the metadata management node records a first correspondence between the path of the file to be processed in the distributed file system and the block number of each block, and the step in which the stream processing management unit obtains, from the metadata management node, the block number of each block corresponding to the path and the network address of the data storage node where each block is located specifically includes:
the stream processing management unit obtains the block number of each block from the first correspondence according to the path of the file to be processed in the distributed file system.
In another implementation of the embodiments of the present invention, the metadata management node records a second correspondence between the block number of each block and the network address of the data storage node where that block is located, and the step in which the stream processing management unit obtains, from the metadata management node, the block number of each block corresponding to the path and the network address of the data storage node where each block is located specifically includes:
the stream processing management unit obtains the network address of the data storage node where each block is located from the second correspondence according to the block number of that block.
In a second aspect, an embodiment of the present invention provides a stream processing system, including a stream processing management unit and stream processing compute units, where
the stream processing management unit is configured to receive a stream processing task sent by a client, where the stream processing task includes stream processing logic and the path of a file to be processed in a distributed file system; the distributed file system includes a metadata management node and multiple data storage nodes, and each data storage node is provided with a stream processing compute unit;
the stream processing management unit is further configured to obtain, from the metadata management node, the block number of each block corresponding to the path and the network address of the data storage node where each block is located;
the stream processing management unit is further configured to send the stream processing logic and each block number to the stream processing compute unit of the data storage node corresponding to that block number's network address;
the stream processing compute unit is configured to obtain, from the data storage node where it is located, the block data corresponding to the received block number and to execute the stream processing logic on that block data.
In one implementation of the embodiments of the present invention, the data storage node is provided with a data management unit, the stream processing compute unit is provided as a program library, and the data management unit performs the functions of the stream processing compute unit by loading the program library.
In another implementation of the embodiments of the present invention, the stream processing compute unit is further configured to send the processing result obtained by executing the stream processing logic to the stream processing management unit.
In another implementation of the embodiments of the present invention, the metadata management node records a first correspondence between the path of the file to be processed in the distributed file system and the block number of each block, and the stream processing management unit is specifically configured to:
obtain the block number of each block from the first correspondence according to the path of the file to be processed in the distributed file system.
In another implementation of the embodiments of the present invention, the metadata management node records a second correspondence between the block number of each block and the network address of the data storage node where that block is located, and the stream processing management unit is specifically configured to:
obtain the network address of the data storage node where each block is located from the second correspondence according to the block number of that block.
In a third aspect, an embodiment of the present invention provides a stream processing management unit that performs the functions of the stream processing management unit in the above stream processing system.
In a fourth aspect, an embodiment of the present invention provides a host, including a memory, a processor and a bus, where the memory and the processor are connected to the bus, the memory stores program instructions, and the processor executes the program instructions to implement the functions of the stream processing management unit in the above stream processing system.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings required for the embodiments. Apparently, the accompanying drawings described below are merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a stream processing system according to an embodiment of the present invention;
Fig. 2 is another schematic structural diagram of a stream processing system according to an embodiment of the present invention;
Fig. 3 is a data interaction diagram of a stream processing method according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an apparatus of the stream processing system according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a host according to an embodiment of the present invention.
Detailed description of the embodiments
Refer first to Fig. 1, which is a schematic diagram of the connections among a stream processing system, a distributed file system and a client according to an embodiment of the present invention. As shown in Fig. 1, the stream processing system includes a stream processing management unit 302 and multiple stream processing compute units 1011, 1021, ... and 1031, and the distributed file system includes a metadata management node 201 and multiple data storage nodes 101, 102, ... and 103.
In the embodiments of the present invention, the client 301 is connected to the stream processing management unit 302, and the stream processing management unit 302 is connected to the metadata management node 201 and to each of the data storage nodes 101, 102, ..., 103.
The client 301 receives a stream processing job submitted by a user. In the embodiments of the present invention, when submitting a stream processing job the user indicates the path of the data to be processed in the distributed file system and specifies what processing is to be performed on that data.
The path of the file to be processed in the distributed file system may, for example, be a URL (Uniform Resource Locator). The URL is the storage identifier used by the distributed file system, and with the URL the block number of each block corresponding to the file to be processed can be found in the metadata management node 201.
The client 301 generates a stream processing task according to the stream processing job submitted by the user. The stream processing task includes the stream processing logic and the path of the data to be processed in the distributed file system, where the stream processing logic defines what processing is to be performed on the data; for example, the stream processing logic may specify that anomalous events are to be searched for in the data to be processed.
The client 301 sends the stream processing task to the stream processing management unit 302. The stream processing management unit 302 performs scheduling according to the stream processing task and selects stream processing compute units to obtain the file to be processed from the distributed file system and process it with the stream processing logic.
For example, the stream processing system may be implemented on the Apache Flink framework: the client 301 is an Apache Flink client process, the stream processing management unit 302 is an Apache Flink job manager process, and the stream processing compute units are Apache Flink task manager processes.
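Purely for illustration, a job of the kind described above (searching a file for anomalous events) might be written against the standard Apache Flink DataStream API roughly as follows; the file path and the notion of an "anomalous" line are hypothetical, and this is not the patent's own code.
```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AnomalySearchJob {
    public static void main(String[] args) throws Exception {
        // The client submits this job; the job manager (stream processing
        // management unit) schedules it onto task managers (compute units).
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical path of the file to be processed in the distributed file system.
        DataStream<String> lines = env.readTextFile("hdfs:///data/pending-file");

        // Stream processing logic: keep only lines that look like anomalous events.
        lines.filter(line -> line.contains("ERROR")).print();

        env.execute("anomaly-search");
    }
}
```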
The metadata management node 201 is provided with a metadata management unit 2011 and a database 2012. The metadata management unit 2011 provides an interface through which external devices can query the database 2012. The database 2012 records a first correspondence between the path of a file to be processed in the distributed file system and the block number of each of its blocks, and a second correspondence between the block number of each block and the network address of the data storage node where that block is located.
In the distributed storage system, the file to be processed is stored in the databases of the data storage nodes in the form of shards, where each shard is a piece of block data and each piece of block data corresponds to one block number. The metadata management node records the correspondence between the path of every file in the distributed storage system and the block numbers of its blocks, as well as which data storage node's database stores the block data for each block number.
The data storage node 101 is provided with a stream processing compute unit 1011 and a database 1012. The database 1012 records block data and the correspondence between block numbers and block data. The stream processing compute unit 1011 can access the database 1012 and use a block number to obtain the corresponding block data from the database 1012.
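The local read-and-execute behaviour of a compute unit can be sketched as follows; LocalBlockStore, StreamProcessingLogic and the method names are assumptions made for illustration, not APIs defined by the patent.
```java
// Minimal sketch: a compute unit reads block data from the database on its own
// node and applies the stream processing logic, with no network transfer.
import java.util.function.Function;

interface LocalBlockStore {
    // Local (same-node) lookup: block number -> block data.
    byte[] readBlock(long blockNumber);
}

@FunctionalInterface
interface StreamProcessingLogic extends Function<byte[], byte[]> {}

class StreamProcessingComputeUnit {
    private final LocalBlockStore localStore;

    StreamProcessingComputeUnit(LocalBlockStore localStore) {
        this.localStore = localStore;
    }

    // Executes the received logic on the block data corresponding to the
    // received block number.
    byte[] process(long blockNumber, StreamProcessingLogic logic) {
        byte[] blockData = localStore.readBlock(blockNumber);
        return logic.apply(blockData);
    }
}
```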
In Fig. 1, the data storage nodes 102 and 103 have structures similar to that of the data storage node 101; the difference is that the block data recorded in their databases differ, and this is not repeated here.
For example, the distributed file system may be implemented with Hadoop; the databases 2012, 1012, 1022, ... and 1032 may be implemented with HBase (Hadoop Database); and the metadata management unit 2011 may be the HMaster process of the HBase database.
In the embodiments of the present invention, the client 301 and the stream processing management unit 302 may be deployed on the same host and establish data connections over the network with the metadata management node 201 and with the data storage nodes 101, 102, ... 103.
In some examples, the client 301 and the stream processing management unit 302 may also be deployed on different hosts; the embodiments of the present invention place no limitation on this.
For ease of understanding, refer to Fig. 2, which is another schematic structural diagram of a stream processing system according to an embodiment of the present invention. As shown in Fig. 2, the client 301 and the stream processing management unit 302 are deployed on a host 10. The host 10 further includes an operating system 303 and hardware 304; the hardware 304 carries the operation of the operating system 303 and includes a physical network interface card 3041. The client 301 and the stream processing management unit 302 each run as a process on the operating system 303 and access the network 50 through the physical network interface card 3041.
Moreover, the metadata management node 201 includes the database 2012, the metadata management unit 2011, an operating system 2013 and hardware 2014. The database 2012 and the metadata management unit 2011 each run as a process on the operating system 2013; the hardware 2014 carries the operation of the operating system 2013 and includes a physical network interface card 20141, which accesses the network 50. The metadata management unit 2011 provides an interface through which external devices can access the database 2012.
Moreover, the data storage node 101 includes the database 1012, the stream processing compute unit 1011, an operating system 1013 and hardware 1014. The database 1012 and the stream processing compute unit 1011 each run as a process on the operating system 1013; the hardware 1014 carries the operation of the operating system 1013 and includes a physical network interface card 10141, which accesses the network 50. In the embodiments of the present invention, the stream processing compute unit 1011 can access the database 1012.
The structures of the data storage nodes 102 and 103 are similar to that of the data storage node 101 and are not repeated here.
For example, communication among the stream processing management unit 302, the client 301, the metadata management unit 2011 and each of the stream processing compute units 1011, 1021, ... and 1031 may be implemented via RPC (Remote Procedure Call protocol).
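The patent only states that RPC is used; as an illustration of the communication surface this implies, the interfaces below are hypothetical signatures, not definitions taken from the patent or from any RPC framework.
```java
// Hypothetical RPC-facing interfaces between the components of Fig. 2.
import java.util.Map;

interface ManagementUnitRpc {
    // Client -> management unit: submit a stream processing task.
    void submitTask(byte[] streamProcessingLogic, String filePathInDfs);
    // Compute unit -> management unit: report a processing result.
    void reportResult(long blockNumber, byte[] result);
}

interface MetadataRpc {
    // Management unit -> metadata management unit: path -> {block number -> node address}.
    Map<Long, String> queryBlockLocations(String filePathInDfs);
}

interface ComputeUnitRpc {
    // Management unit -> compute unit: dispatch the logic plus one block number.
    void execute(byte[] streamProcessingLogic, long blockNumber);
}
```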
Based on the above architecture, an embodiment of the present invention provides a stream processing method. The stream processing management unit 302 receives a stream processing task sent by the client 301, where the stream processing task includes stream processing logic and the path of a file to be processed in the distributed file system. The stream processing management unit 302 obtains, from the metadata management node 201, the block number of each block corresponding to the path and the network address of the data storage node where each block is located. The stream processing management unit 302 sends the stream processing logic and each block number to the stream processing compute unit of the data storage node corresponding to that block number's network address. Each stream processing compute unit obtains, from the data storage node where it is located, the block data corresponding to the received block number and executes the stream processing logic on that block data.
Because the embodiments of the present invention distribute stream processing compute units across the data storage nodes, and the stream processing management unit dispatches the stream processing task to the corresponding data storage nodes according to the path of the file to be processed, the stream processing compute unit on each data storage node reads the block data of the file to be processed directly from local storage and runs the stream processing logic on it. Since the compute units read the file to be processed locally, the technical problem that the stream processing speed is affected by a low network transfer speed between the stream processing system and the data storage nodes can be overcome.
For further clarity, refer to Fig. 3, which is a data interaction diagram of a stream processing method according to an embodiment of the present invention. As shown in Fig. 3, the stream processing method includes the following steps:
Step 401: The stream processing management unit 302 receives the stream processing task sent by the client 301, where the stream processing task includes the stream processing logic and the path of the file to be processed in the distributed file system.
For example, the client 301 may be the client process of an Apache Flink system, and the stream processing management unit 302 may be the job manager process of the Apache Flink system.
Step 402: The stream processing management unit 302 sends a query request to the metadata management node 201, where the query request carries the path of the file to be processed in the distributed file system.
For example, the query request includes an input parameter and a query instruction. The stream processing management unit 302 uses the path of the file to be processed in the distributed file system as the input parameter and sends the input parameter and the query instruction to the database-access interface provided by the metadata management unit 2011 of the metadata management node 201.
Step 403: According to the query request, the metadata management node 201 returns the block number of each block corresponding to the path and the network address of the data storage node corresponding to each block to the stream processing management unit 302.
As described above, the database 2012 of the metadata management node 201 records the first correspondence between the path of the file to be processed in the distributed file system and the block number of each block, and the second correspondence between the block number of each block and the network address of the data storage node where that block is located. Therefore, the block number of each block is obtained from the first correspondence according to the path of the file to be processed in the distributed file system, the network address of the data storage node where each block is located is obtained from the second correspondence according to that block's block number, and these are returned to the stream processing management unit 302.
Assume that the block numbers obtained by the stream processing management unit 302 are block number 1 and block number 2. Note that in practice there are many block numbers; for brevity, only two blocks are used as an example here. The stream processing management unit 302 learns, according to block number 1, the network address A of the data storage node 101, and learns, according to block number 2, the network address B of the data storage node 102.
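A toy illustration of the two correspondences used in step 403, with the concrete values of this example (block numbers 1 and 2, addresses "A" and "B"), is given below; the maps, the file path and the lookup method are illustrative only.
```java
// First and second correspondences as plain maps, plus the path -> node lookup.
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MetadataTables {
    // First correspondence: file path in the DFS -> block numbers of its blocks.
    static final Map<String, List<Long>> PATH_TO_BLOCKS =
            Map.of("hdfs:///data/pending-file", List.of(1L, 2L));

    // Second correspondence: block number -> network address of its data storage node.
    static final Map<Long, String> BLOCK_TO_NODE = Map.of(1L, "A", 2L, "B");

    // Resolves a path to {block number -> node address}, as returned to the
    // stream processing management unit 302.
    static Map<Long, String> locate(String path) {
        Map<Long, String> result = new LinkedHashMap<>();
        for (long blockNumber : PATH_TO_BLOCKS.getOrDefault(path, List.of())) {
            result.put(blockNumber, BLOCK_TO_NODE.get(blockNumber));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {1=A, 2=B}: block 1 is on node 101 at address A, block 2 on node 102 at address B.
        System.out.println(locate("hdfs:///data/pending-file"));
    }
}
```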
Step 404: The stream processing management unit 302 sends the stream processing logic and block number 1 to the stream processing compute unit 1011.
In this step, after looking up the network address A of the data storage node 101 according to block number 1, the stream processing management unit 302 sends the stream processing task and the corresponding block number 1 to the stream processing compute unit 1011 of the data storage node 101 at network address A.
Step 405: The stream processing management unit 302 sends the stream processing logic and block number 2 to the stream processing compute unit 1021.
In this step, after looking up the network address B of the data storage node 102 according to block number 2, the stream processing management unit 302 sends the stream processing task and the corresponding block number 2 to the stream processing compute unit 1021 of the data storage node 102 at network address B.
In steps 404 and 405, the stream processing compute unit 1011 may, for example, be one task manager process of an Apache Flink system, and the stream processing compute unit 1021 may be another task manager process of the same system.
Step 406: The stream processing compute unit 1011 obtains, from the data storage node 101 where it is located, the block data corresponding to the received block number 1 and executes the stream processing logic on that block data.
In this step, the stream processing compute unit 1011 obtains, from the database 1012 of the data storage node 101 where it is located, the block data corresponding to block number 1 received from the stream processing management unit 302, and executes the stream processing logic on that block data.
In some examples, the data storage node 101 is further provided with a data management unit, which accesses the database 1012 to manage the block data in the database 1012.
For example, the distributed file system may be Hadoop, whose database is implemented with HBase; the metadata management unit 2011 is the HMaster process of the HBase database; the stream processing compute unit is provided as a program library; and the data management unit performs the functions of the stream processing compute unit by loading the program library.
Further, the data management unit is, for example, the HRegionServer process of the HBase database. The HRegionServer process embeds the task manager process: the task manager process can be provided as a program library in the form of a jar package or an .so file that exposes a startup interface, and after loading the program library the HRegionServer process can realize the functions of the task manager process by invoking the startup interface.
Because, in the embodiments of the present invention, the HRegionServer process that realizes the functions of the task manager process can read the block data of the database 1012 locally, obtaining the block data is not affected by external network performance. Moreover, because the HRegionServer process accesses the database 1012 within its own process, that is, reads the block data directly from memory, the block data is obtained even faster, which effectively improves the efficiency of stream processing.
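The patent does not specify how the host process loads and starts the embedded program library. Purely as an illustration of the general pattern (a jar loaded at runtime and started through a known entry point), a data management unit might do something like the following; StartupInterface, the jar path and the class name are all hypothetical, and this is not HBase's actual extension API (HBase's own mechanism for running code inside a RegionServer is its coprocessor framework).
```java
// Generic sketch of loading a program library (jar) at runtime and invoking
// its startup interface from a host process. All names below are hypothetical.
import java.net.URL;
import java.net.URLClassLoader;

interface StartupInterface {
    void start();   // assumed entry point exposed by the embedded compute unit
}

class ProgramLibraryLoader {
    static StartupInterface load(String jarPath, String className) throws Exception {
        URLClassLoader loader = new URLClassLoader(
                new URL[] { new java.io.File(jarPath).toURI().toURL() },
                ProgramLibraryLoader.class.getClassLoader());
        Class<?> clazz = Class.forName(className, true, loader);
        // The embedded task-manager-like class is assumed to implement StartupInterface.
        return (StartupInterface) clazz.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical jar and class names; the data management unit would call
        // start() after loading the library.
        StartupInterface computeUnit =
                load("/opt/stream/compute-unit.jar", "com.example.EmbeddedComputeUnit");
        computeUnit.start();
    }
}
```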
In other examples, the data management unit and the stream processing compute unit 1011 may both run on the operating system 1013, with the stream processing compute unit 1011 accessing the database 1012 through the interface provided by the data management unit. In these examples, although the database 1012 is not accessed directly inside the HRegionServer process, the stream processing compute unit 1011 still accesses the database 1012 locally and thus avoids the influence of external network performance.
Step 407: The stream processing compute unit 1021 obtains, from the data storage node 102 where it is located, the block data corresponding to the received block number 2 and executes the stream processing logic on that block data.
Similarly to the previous step, in some examples the data storage node 102 is provided with a data management unit, which accesses the database 1022 to manage block data. The distributed file system may be Hadoop, whose database is implemented with HBase; the metadata management unit 2011 is the HMaster process of the HBase database; the stream processing compute unit is provided as a program library; and the data management unit performs the functions of the stream processing compute unit by loading the program library.
Further, the data management unit is, for example, the HRegionServer process of the HBase database. The HRegionServer process embeds the task manager process: the task manager process can be provided as a program library in the form of a jar package or an .so file that exposes a startup interface, and after loading the program library the HRegionServer process can realize the functions of the task manager process by invoking the startup interface.
Because, in the embodiments of the present invention, the HRegionServer process that realizes the functions of the task manager process can read the block data of the database 1022 locally, obtaining the block data is not affected by external network performance, and because the HRegionServer process accesses the database 1022 within its own process, the block data is obtained even faster, which effectively improves the efficiency of stream processing.
In other examples, the data management unit and the stream processing compute unit 1021 may both run on the operating system 1023, with the stream processing compute unit 1021 accessing the database 1022 through the interface provided by the data management unit. In these examples, although the database 1022 is not accessed directly inside the HRegionServer process, the stream processing compute unit 1021 still accesses the database 1022 locally and thus avoids the influence of external network performance.
Step 408: The stream processing compute unit 1011 sends the first processing result, obtained by executing the stream processing logic on the block data corresponding to block number 1, to the stream processing management unit 302.
Step 409: The stream processing compute unit 1021 sends the second processing result, obtained by executing the stream processing logic on the block data corresponding to block number 2, to the stream processing management unit 302.
In summary, because the embodiments of the present invention distribute stream processing compute units across the data storage nodes, and the stream processing management unit dispatches the stream processing task to the corresponding data storage nodes according to the path of the file to be processed, the stream processing compute unit on each data storage node reads the block data of the file to be processed directly from local storage and runs the stream processing logic on it. Since the compute units read the file to be processed locally, the technical problem that the stream processing speed is affected by a low network transfer speed between the stream processing system and the data storage nodes can be overcome.
Moreover, because the file to be processed is split into blocks of data on which different stream processing compute units execute the stream processing logic in parallel, the stream processing speed can be further increased and processing efficiency improved.
It is worth noting that, in alternative embodiments of the present invention, the stream processing system 90 may also be implemented on the Storm, Spark or Samza framework.
Refer next to Fig. 4, which is a schematic structural diagram of an apparatus of a stream processing management unit according to an embodiment of the present invention. As shown in Fig. 4, the stream processing management unit 302 includes:
a receiving module 601, configured to receive a stream processing task sent by a client, where the stream processing task includes stream processing logic and the path of a file to be processed in a distributed file system, the distributed file system includes a metadata management node and multiple data storage nodes, and each data storage node is provided with a stream processing compute unit;
a query module 602, configured to obtain, from the metadata management node, the block number of each block corresponding to the path and the network address of the data storage node where each block is located;
a sending module 603, configured to send the stream processing logic and the block number of each block to the stream processing compute unit of the data storage node where that block is located.
Optionally, the receiving module 601 is further configured to receive the processing results obtained by the stream processing compute units by executing the stream processing logic.
Optionally, the metadata management node records a first correspondence between the path of the file to be processed in the distributed file system and the block number of each block, and a second correspondence between the block number of each block and the network address of the data storage node where that block is located, and the query module 602 is specifically configured to:
obtain the block number of each block from the first correspondence according to the path of the file to be processed in the distributed file system;
obtain the network address of the data storage node where each block is located from the second correspondence according to the block number of that block.
Refer next to Fig. 5, which is a schematic structural diagram of a host according to an embodiment of the present invention. As shown in Fig. 5, the host 50 includes a memory 502, a processor 501 and a bus 503. The memory 502 and the processor 501 are connected to the bus 503, the memory 502 stores program instructions, and the processor 501 executes the program instructions to implement the functions of the stream processing management unit 302 in the above stream processing system.
Because the embodiments of the present invention distribute stream processing compute units across the data storage nodes, and the stream processing management unit dispatches the stream processing task to the corresponding data storage nodes according to the path of the file to be processed, the stream processing compute unit on each data storage node reads the block data of the file to be processed directly from local storage and runs the stream processing logic on it. Since the compute units read the file to be processed locally, the technical problem that the stream processing speed is affected by a low network transfer speed between the stream processing system and the data storage nodes can be overcome.
Moreover, because the file to be processed is split into blocks of data on which different stream processing compute units execute the stream processing logic in parallel, the stream processing speed can be further increased and processing efficiency improved.
It should be noted that the apparatus embodiments described above are merely illustrative. Units described as separate parts may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. In addition, in the accompanying drawings of the apparatus embodiments provided by the present invention, a connection between modules indicates that they have a communication connection, which may specifically be implemented as one or more communication buses or signal lines. A person of ordinary skill in the art can understand and implement this without creative effort.
Through the description of the above embodiments, a person skilled in the art can clearly understand that the present invention may be implemented by software plus the necessary common hardware, and certainly may also be implemented by dedicated hardware such as application-specific integrated circuits, dedicated CPUs, dedicated memories and dedicated components. In general, any function performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structure used to implement a given function can take many forms, such as analog circuits, digital circuits or dedicated circuits. For the present invention, however, a software implementation is in most cases the preferred embodiment. Based on such an understanding, the technical solutions of the present invention, or the part that contributes to the prior art, may essentially be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc, and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device or the like) to perform the methods described in the embodiments of the present invention.
A person skilled in the art can clearly understand that, for the specific working processes of the systems, devices and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
The foregoing descriptions are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that can readily be conceived by a person familiar with the technical field within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (12)

1. A stream processing method, wherein the method is applied to a stream processing system, the stream processing system includes a stream processing management unit and a stream processing compute unit, and the method comprises:
receiving, by the stream processing management unit, a stream processing task sent by a client, wherein the stream processing task includes stream processing logic and a path of a file to be processed in a distributed file system, the distributed file system includes a metadata management node and multiple data storage nodes, and each data storage node is provided with a stream processing compute unit;
obtaining, by the stream processing management unit from the metadata management node, a block number of each block corresponding to the path of the file to be processed and a network address of the data storage node where each block is located;
sending, by the stream processing management unit, the stream processing logic and the block number of each block to the stream processing compute unit of the data storage node where that block is located;
obtaining, by the stream processing compute unit from the data storage node where it is located, the block data corresponding to the received block number, and executing the stream processing logic on the block data corresponding to the received block number.
2. The method according to claim 1, wherein the data storage node is provided with a data management unit, the stream processing compute unit is provided as a program library, and the data management unit performs the functions of the stream processing compute unit by loading the program library.
3. The method according to claim 1 or 2, wherein the method further comprises:
sending, by the stream processing compute unit, a processing result obtained by executing the stream processing logic to the stream processing management unit.
4. The method according to any one of claims 1 to 3, wherein the metadata management node records a first correspondence between the path of the file to be processed in the distributed file system and the block number of each block, and the obtaining, by the stream processing management unit from the metadata management node, the block number of each block corresponding to the path and the network address of the data storage node where each block is located specifically comprises:
obtaining, by the stream processing management unit, the block number of each block from the first correspondence according to the path of the file to be processed in the distributed file system.
5. The method according to claim 4, wherein the metadata management node records a second correspondence between the block number of each block and the network address of the data storage node where each block is located, and the obtaining, by the stream processing management unit from the metadata management node, the block number of each block corresponding to the path and the network address of the data storage node where each block is located specifically comprises:
obtaining, by the stream processing management unit, the network address of the data storage node where each block is located from the second correspondence according to the block number of that block.
6. A stream processing system, comprising a stream processing management unit and a stream processing compute unit, wherein
the stream processing management unit is configured to receive a stream processing task sent by a client, wherein the stream processing task includes stream processing logic and a path of a file to be processed in a distributed file system, the distributed file system includes a metadata management node and multiple data storage nodes, and each data storage node is provided with a stream processing compute unit;
the stream processing management unit is further configured to obtain, from the metadata management node, a block number of each block corresponding to the path and a network address of the data storage node where each block is located;
the stream processing management unit is further configured to send the stream processing logic and the block number of each block to the stream processing compute unit of the data storage node where that block is located;
the stream processing compute unit is configured to obtain, from the data storage node where it is located, the block data corresponding to the received block number, and to execute the stream processing logic on the block data corresponding to the received block number.
7. The system according to claim 6, wherein the data storage node is provided with a data management unit, the stream processing compute unit is provided as a program library, and the data management unit performs the functions of the stream processing compute unit by loading the program library.
8. The system according to claim 6, wherein
the stream processing compute unit is further configured to send a processing result obtained by executing the stream processing logic to the stream processing management unit.
9. The system according to claim 6, wherein the metadata management node records a first correspondence between the path of the file to be processed in the distributed file system and the block number of each block, and the stream processing management unit is specifically configured to:
obtain the block number of each block from the first correspondence according to the path of the file to be processed in the distributed file system.
10. The system according to claim 9, wherein the metadata management node records a second correspondence between each block number and the network address of the data storage node where the block with that block number is located, and the stream processing management unit is specifically configured to:
obtain the network address of the data storage node where each block is located from the second correspondence according to the block number of that block.
11. A stream processing management unit, comprising:
a receiving module, configured to receive a stream processing task sent by a client, wherein the stream processing task includes stream processing logic and a path of a file to be processed in a distributed file system, the distributed file system includes a metadata management node and multiple data storage nodes, and each data storage node is provided with a stream processing compute unit;
a query module, configured to obtain, from the metadata management node, a block number of each block corresponding to the path and a network address of the data storage node where each block is located;
a sending module, configured to send the stream processing logic and the block number of each block to the stream processing compute unit of the data storage node where that block is located.
12. A host, comprising a memory, a processor and a bus, wherein the memory and the processor are connected to the bus, the memory stores program instructions, and the processor executes the program instructions so that the host performs the following steps:
receiving a stream processing task sent by a client, wherein the stream processing task includes stream processing logic and a path of a file to be processed in a distributed file system, the distributed file system includes a metadata management node and multiple data storage nodes, and each data storage node is provided with a stream processing compute unit;
obtaining, from the metadata management node, a block number of each block corresponding to the path and a network address of the data storage node where each block is located;
sending the stream processing logic and the block number of each block to the stream processing compute unit of the data storage node where that block is located.
CN201710233425.0A 2017-04-11 2017-04-11 Stream processing method and device Active CN108696559B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710233425.0A CN108696559B (en) 2017-04-11 2017-04-11 Stream processing method and device
PCT/CN2018/082641 WO2018188607A1 (en) 2017-04-11 2018-04-11 Stream processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710233425.0A CN108696559B (en) 2017-04-11 2017-04-11 Stream processing method and device

Publications (2)

Publication Number Publication Date
CN108696559A true CN108696559A (en) 2018-10-23
CN108696559B CN108696559B (en) 2021-08-20

Family

ID=63792265

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710233425.0A Active CN108696559B (en) 2017-04-11 2017-04-11 Stream processing method and device

Country Status (2)

Country Link
CN (1) CN108696559B (en)
WO (1) WO2018188607A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110046131A (en) * 2019-01-23 2019-07-23 阿里巴巴集团控股有限公司 The Stream Processing method, apparatus and distributed file system HDFS of data
CN111290744A (en) * 2020-01-22 2020-06-16 北京百度网讯科技有限公司 Stream computing job processing method, stream computing system and electronic device
CN111435938A (en) * 2019-01-14 2020-07-21 阿里巴巴集团控股有限公司 Data request processing method, device and equipment

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1886739A (en) * 2003-10-27 2006-12-27 特博数据实验室公司 Distributed memory type information processing system
CN101089819A (en) * 2006-06-13 2007-12-19 国际商业机器公司 Method for dynamic stationary flow processing system and upstream processing node
CN101741885A (en) * 2008-11-19 2010-06-16 珠海市西山居软件有限公司 Distributed system and method for processing task flow thereof
US20110313934A1 (en) * 2010-06-21 2011-12-22 Craig Ronald Van Roy System and Method for Configuring Workflow Templates
US8150889B1 (en) * 2008-08-28 2012-04-03 Amazon Technologies, Inc. Parallel processing framework
CN102456185A (en) * 2010-10-29 2012-05-16 金蝶软件(中国)有限公司 Distributed workflow processing method and distributed workflow engine system
CN102467411A (en) * 2010-11-19 2012-05-23 金蝶软件(中国)有限公司 Workflow processing and workflow agent method, device and system
CN102542367A (en) * 2010-12-10 2012-07-04 金蝶软件(中国)有限公司 Cloud computing network workflow processing method, device and system based on domain model
US20130086116A1 (en) * 2011-10-04 2013-04-04 International Business Machines Corporation Declarative specification of data integraton workflows for execution on parallel processing platforms
CN103309867A (en) * 2012-03-09 2013-09-18 句容智恒安全设备有限公司 Web data mining system on basis of Hadoop platform
US20130253977A1 (en) * 2012-03-23 2013-09-26 Commvault Systems, Inc. Automation of data storage activities
CN104063486A (en) * 2014-07-03 2014-09-24 四川中亚联邦科技有限公司 Big data distributed storage method and system
CN104536814A (en) * 2015-01-16 2015-04-22 北京京东尚科信息技术有限公司 Method and system for processing workflow
CN104657497A (en) * 2015-03-09 2015-05-27 国家电网公司 Mass electricity information concurrent computation system and method based on distributed computation
CN105468756A (en) * 2015-11-30 2016-04-06 浪潮集团有限公司 Design and realization method for mass data processing system
CN105608077A (en) * 2014-10-27 2016-05-25 青岛金讯网络工程有限公司 Big data distributed storage method and system
CN106155791A (en) * 2016-06-30 2016-11-23 电子科技大学 A kind of workflow task dispatching method under distributed environment
CN106462605A (en) * 2014-05-13 2017-02-22 云聚公司 Distributed secure data storage and transmission of streaming media content

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6415297B1 (en) * 1998-11-17 2002-07-02 International Business Machines Corporation Parallel database support for workflow management systems
US20090125553A1 (en) * 2007-11-14 2009-05-14 Microsoft Corporation Asynchronous processing and function shipping in ssis
CN106339415B (en) * 2016-08-12 2019-08-23 北京奇虎科技有限公司 Querying method, the apparatus and system of data

Patent Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1886739A (en) * 2003-10-27 2006-12-27 特博数据实验室公司 Distributed memory type information processing system
CN101089819A (en) * 2006-06-13 2007-12-19 国际商业机器公司 Method for dynamic stationary flow processing system and upstream processing node
US8150889B1 (en) * 2008-08-28 2012-04-03 Amazon Technologies, Inc. Parallel processing framework
CN101741885A (en) * 2008-11-19 2010-06-16 珠海市西山居软件有限公司 Distributed system and method for processing task flow thereof
US20110313934A1 (en) * 2010-06-21 2011-12-22 Craig Ronald Van Roy System and Method for Configuring Workflow Templates
CN102456185A (en) * 2010-10-29 2012-05-16 金蝶软件(中国)有限公司 Distributed workflow processing method and distributed workflow engine system
CN102467411A (en) * 2010-11-19 2012-05-23 金蝶软件(中国)有限公司 Workflow processing and workflow agent method, device and system
CN102542367A (en) * 2010-12-10 2012-07-04 金蝶软件(中国)有限公司 Cloud computing network workflow processing method, device and system based on domain model
US20130086116A1 (en) * 2011-10-04 2013-04-04 International Business Machines Corporation Declarative specification of data integraton workflows for execution on parallel processing platforms
CN103309867A (en) * 2012-03-09 2013-09-18 句容智恒安全设备有限公司 Web data mining system on basis of Hadoop platform
US20130253977A1 (en) * 2012-03-23 2013-09-26 Commvault Systems, Inc. Automation of data storage activities
CN106462605A (en) * 2014-05-13 2017-02-22 云聚公司 Distributed secure data storage and transmission of streaming media content
CN104063486A (en) * 2014-07-03 2014-09-24 四川中亚联邦科技有限公司 Big data distributed storage method and system
CN105608077A (en) * 2014-10-27 2016-05-25 青岛金讯网络工程有限公司 Big data distributed storage method and system
CN104536814A (en) * 2015-01-16 2015-04-22 北京京东尚科信息技术有限公司 Method and system for processing workflow
CN104657497A (en) * 2015-03-09 2015-05-27 国家电网公司 Mass electricity information concurrent computation system and method based on distributed computation
CN105468756A (en) * 2015-11-30 2016-04-06 浪潮集团有限公司 Design and realization method for mass data processing system
CN106155791A (en) * 2016-06-30 2016-11-23 电子科技大学 A kind of workflow task dispatching method under distributed environment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111435938A (en) * 2019-01-14 2020-07-21 阿里巴巴集团控股有限公司 Data request processing method, device and equipment
CN110046131A (en) * 2019-01-23 2019-07-23 阿里巴巴集团控股有限公司 The Stream Processing method, apparatus and distributed file system HDFS of data
CN111290744A (en) * 2020-01-22 2020-06-16 北京百度网讯科技有限公司 Stream computing job processing method, stream computing system and electronic device

Also Published As

Publication number Publication date
WO2018188607A1 (en) 2018-10-18
CN108696559B (en) 2021-08-20

Similar Documents

Publication Publication Date Title
JP6732798B2 (en) Automatic scaling of resource instance groups in a compute cluster
CN109416643B (en) Application program migration system
CN109643312B (en) Hosted query service
JP6974218B2 (en) Storage system and its operation method
CN109074377B (en) Managed function execution for real-time processing of data streams
CN109328335B (en) Intelligent configuration discovery techniques
JPWO2016121754A1 (en) System, virtualization control device, control method and program for virtualization control device
CN107015989A (en) Data processing method and device
US20120246157A1 (en) Method and system for dynamically tagging metrics data
CN108696559A (en) Method for stream processing and device
CN112448833A (en) Multi-management-domain communication method and device
KR101378348B1 (en) Basic prototype of hadoop cluster based on private cloud infrastructure
US10706073B1 (en) Partitioned batch processing for a usage analysis system
CN112199426B (en) Interface call management method, device, server and medium under micro-service architecture
US10348596B1 (en) Data integrity monitoring for a usage analysis system
CN109257256A (en) Apparatus monitoring method, device, computer equipment and storage medium
TW202315360A (en) Microservice allocation method, electronic equipment, and storage medium
CN114116908A (en) Data management method and device and electronic equipment
CN107347024A (en) A kind of method and apparatus for storing Operation Log
CN112631996A (en) Log searching method and device
US10606714B2 (en) Stopping central processing units for data collection based on event categories of events
Marian et al. Analysis of Different SaaS Architectures from a Trust Service Provider Perspective
CN111353766A (en) Service process processing system and method of distributed service system
JP2015064740A (en) Virtual machine provision system
CN114553492B (en) Cloud platform-based operation request processing method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220216

Address after: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province

Patentee after: Huawei Cloud Computing Technology Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Patentee before: HUAWEI TECHNOLOGIES Co.,Ltd.

TR01 Transfer of patent right