CN108696559A - Method for stream processing and device - Google Patents
- Publication number
- CN108696559A (application CN201710233425.0A)
- Authority
- CN
- China
- Prior art keywords
- stream processing
- block
- stream
- block number
- data storage node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/561—Adding application-functional data or data for application control, e.g. adding metadata
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/60—Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
Abstract
Embodiments of the present invention disclose a stream processing method and device. The method includes: a stream processing management unit receives a stream processing task sent by a client; the stream processing management unit obtains, from a metadata management node, the block numbers of the blocks corresponding to the path of a file to be processed and the network address of the data storage node where each block resides; the stream processing management unit sends the stream processing logic and the block number of each block to the stream processing computing unit on the data storage node where that block resides; each stream processing computing unit obtains, from the data storage node where it resides, the block data corresponding to the block numbers it received, and executes the stream processing logic on that block data. In this manner, the technical problem that the speed of stream processing is affected by a low network transfer speed between the stream processing system and the data storage nodes can be overcome.
Description
Technical field
The present invention relates to the field of information technology, and in particular to a stream processing method and device.
Background technology
A workflow (work flow) is an abstraction, summary, and description of how the individual pieces of business in a flow of work are organized together, and of the logical rules between them. The workflow concept originates from production organization and office automation; it was proposed for everyday work with fixed procedures, with the aim of decomposing work into well-defined flows or roles, executing those flows according to certain rules and procedures, and monitoring them, so as to improve working efficiency, better control processes, enhance service to customers, and effectively manage business flows. Workflow modelling represents a workflow in a computer with an appropriate model and computes over that representation. Through workflow modelling, a workflow can be managed by a workflow system.
The main function of a workflow management system is to define, execute, and manage workflows with the support of computer technology, and to coordinate, during workflow execution, the exchange of information between flows and between group members. A workflow management system usually consists of a workflow design tool and a workflow management tool: with the design tool, users design their own workflow definitions, and the workflow management tool is responsible for managing the execution of the flows. While the workflow system operates, a workflow instance contains one or more tasks (Task), each of which represents a certain piece of work that needs to be carried out.
Apache Storm is a typical prior-art stream processing system, composed in a Master-Slave architecture: Nimbus is the master process, and Supervisor is the slave process that runs tasks. The stream processing system Storm establishes a network connection with a distributed file system, and the distributed file system stores the data that Storm needs to process. The distributed file system includes a Master Server and Data Servers: the Master Server is the metadata management node and manages the distribution of the data blocks, while the Data Servers are the data storage nodes and store the block data. Storm and the data storage nodes are arranged on different servers.

In a Storm stream processing job, Storm first needs to obtain the data to be stream-processed from a Data Server. Specifically, the Data Server provides a data query interface; Storm sends input parameters to the data query interface over the network, obtains the data from the Data Server over the network, and then loads the obtained data into a Supervisor.

Since in the prior art the stream processing system must obtain data from the data storage nodes over the network, the speed at which data is obtained is limited by network performance, which limits the performance of the whole stream processing pipeline. When the network transfer speed between the stream processing system and the data storage nodes is low, the speed of stream processing is greatly affected.
Summary of the invention
To solve the problems of the prior art, embodiments of the present invention provide a stream processing method and device, which can overcome the technical problem that the speed of stream processing is affected by a low network transfer speed between the stream processing system and the data storage nodes.

In a first aspect, an embodiment of the present invention provides a stream processing method. The method is applied to a stream processing system that includes a stream processing management unit and stream processing computing units, and the method includes:

the stream processing management unit receives a stream processing task sent by a client, where the stream processing task includes stream processing logic and the path of a file to be processed in a distributed file system, the distributed file system includes a metadata management node and multiple data storage nodes, and each data storage node is provided with a stream processing computing unit;

the stream processing management unit obtains, from the metadata management node, the block numbers of the blocks corresponding to the path of the file to be processed and the network address of the data storage node where each block resides;

the stream processing management unit sends the stream processing logic and the block number of each block to the stream processing computing unit on the data storage node where that block resides;

each stream processing computing unit obtains, from the data storage node where it resides, the block data corresponding to the block numbers it received, and executes the stream processing logic on that block data.
In the embodiment of the present invention, the stream processing computing units are distributed across the data storage nodes, the stream processing management unit sends the stream processing task to the corresponding data storage nodes according to the path of the file to be processed, and the stream processing computing unit on each of those data storage nodes reads the block data of the file to be processed directly from local storage and runs the stream processing logic on it. Because the stream processing computing units read the file to be processed locally, the technical problem that the speed of stream processing is affected by a low network transfer speed between the stream processing system and the data storage nodes can be overcome.

Moreover, since the file to be processed is split into pieces of block data on which different stream processing computing units execute the stream processing logic in parallel, the stream processing speed can be increased further and processing efficiency improved.
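The dispatch performed by the stream processing management unit can be sketched as follows. This is a minimal illustration only; the function and parameter names (`dispatch_task`, `send`) are hypothetical, not part of the claimed system:

```python
# Minimal sketch of the management unit's dispatch step: group the blocks of
# the file to be processed by the node that stores them, then send the stream
# processing logic plus those block numbers to each node. Names are illustrative.

def dispatch_task(stream_logic, block_locations, send):
    """block_locations: list of (block_number, node_address) pairs.
    send(node_address, message): delivers one message to one node."""
    per_node = {}
    for block_number, node_address in block_locations:
        per_node.setdefault(node_address, []).append(block_number)
    for node_address, block_numbers in per_node.items():
        send(node_address, {"logic": stream_logic, "blocks": block_numbers})
    return per_node
```

Each data storage node thus receives only the block numbers it already holds, so its computing unit never needs to fetch block data over the network.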
In one implementation of the embodiment of the present invention, each data storage node is provided with a data management unit, the stream processing computing unit is provided as a program library, and the data management unit performs the functions of the stream processing computing unit by loading the program library.

Because the stream processing computing unit is set up in the data management unit as a program library, and the data management unit can read block data directly, the stream processing logic can be executed as soon as the block data has been read into the data management unit, which speeds up stream processing.
In another implementation of the embodiment of the present invention, the method further includes: the stream processing computing unit sends the processing result obtained by executing the stream processing logic to the stream processing management unit.
In another implementation of the embodiment of the present invention, the metadata management node records a first correspondence between the path of the file to be processed in the distributed file system and the block numbers of its blocks, and that the stream processing management unit obtains, from the metadata management node, the block numbers of the blocks corresponding to the path and the network address of the data storage node where each block resides specifically includes:

the stream processing management unit obtains the block number of each block from the first correspondence according to the path of the file to be processed in the distributed file system.
In another implementation of the embodiment of the present invention, the metadata management node records a second correspondence between the block number of each block and the network address of the data storage node where that block resides, and that the stream processing management unit obtains, from the metadata management node, the block numbers of the blocks corresponding to the path and the network address of the data storage node where each block resides specifically includes:

the stream processing management unit obtains the network address of the data storage node where each block resides from the second correspondence according to each block number.
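The two correspondences described above can be sketched as a pair of in-memory maps. This is a minimal illustration under the assumption of dictionary-backed storage; the class and method names (`MetadataManager`, `register`, `resolve`) are hypothetical:

```python
# Minimal sketch of the metadata management node's correspondences:
# first correspondence:  file path    -> block numbers of its blocks;
# second correspondence: block number -> network address of the storing node.
# All names are illustrative, not part of the claimed system.

class MetadataManager:
    def __init__(self):
        self.path_to_blocks = {}   # first correspondence
        self.block_to_node = {}    # second correspondence

    def register(self, path, blocks_on_nodes):
        """blocks_on_nodes: list of (block_number, node_address) pairs."""
        self.path_to_blocks[path] = [number for number, _ in blocks_on_nodes]
        for number, address in blocks_on_nodes:
            self.block_to_node[number] = address

    def resolve(self, path):
        """Query the first correspondence, then the second, as described above."""
        return [(number, self.block_to_node[number])
                for number in self.path_to_blocks[path]]
```

Resolving a path thus yields, for each block of the file, the block number together with the address of the data storage node that holds it.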
In a second aspect, an embodiment of the present invention provides a stream processing system, including a stream processing management unit and stream processing computing units, where:

the stream processing management unit is configured to receive a stream processing task sent by a client, where the stream processing task includes stream processing logic and the path of a file to be processed in a distributed file system, the distributed file system includes a metadata management node and multiple data storage nodes, and each data storage node is provided with a stream processing computing unit;

the stream processing management unit is further configured to obtain, from the metadata management node, the block numbers corresponding to the path and the network address of the data storage node where each block resides;

the stream processing management unit is further configured to send the stream processing logic and the block numbers corresponding to each network address to the stream processing computing unit of the corresponding data storage node;

each stream processing computing unit is configured to obtain, from the data storage node where it resides, the block data corresponding to the block numbers it received, and to execute the stream processing logic on that block data.
In one implementation of the embodiment of the present invention, each data storage node is provided with a data management unit, the stream processing computing unit is provided as a program library, and the data management unit performs the functions of the stream processing computing unit by loading the program library.

In another implementation of the embodiment of the present invention, the stream processing computing unit is further configured to send the processing result obtained by executing the stream processing logic to the stream processing management unit.
In another implementation of the embodiment of the present invention, the metadata management node records a first correspondence between the path of the file to be processed in the distributed file system and the block numbers of its blocks, and the stream processing management unit is specifically configured to obtain the block number of each block from the first correspondence according to the path of the file to be processed in the distributed file system.

In another implementation of the embodiment of the present invention, the metadata management node records a second correspondence between the block number of each block and the network address of the data storage node where that block resides, and the stream processing management unit is specifically configured to obtain the network address of the data storage node where each block resides from the second correspondence according to each block number.
In a third aspect, an embodiment of the present invention provides a stream processing management unit that performs the functions of the stream processing management unit in the above stream processing system.

In a fourth aspect, an embodiment of the present invention provides a host, including a memory, a processor, and a bus, where the memory and the processor are connected to the bus, the memory stores program instructions, and the processor executes the program instructions to realize the functions of the stream processing management unit in the above stream processing system.
Description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings required in the embodiments of the present invention are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic structural diagram of a stream processing system according to an embodiment of the present invention;
Fig. 2 is another schematic structural diagram of a stream processing system according to an embodiment of the present invention;
Fig. 3 is a data interaction diagram of a stream processing method according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the device structure of a stream processing system according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the device structure of a host according to an embodiment of the present invention.
Detailed description of the embodiments
Refer first to Fig. 1, which is a schematic diagram of the connections between a stream processing system according to an embodiment of the present invention, a distributed file system, and a client. As shown in Fig. 1, the stream processing system includes a stream processing management unit 302 and multiple stream processing computing units 1011, 1021, ... and 1031, and the distributed file system includes a metadata management node 201 and multiple data storage nodes 101, 102, ... and 103.

In the embodiment of the present invention, the client 301 is connected to the stream processing management unit 302, and the stream processing management unit 302 is connected to the metadata management node 201 and to each of the data storage nodes 101, 102, ..., 103.
The client 301 is configured to receive a stream processing job submitted by a user. In the embodiment of the present invention, when submitting a stream processing job the user indicates the path of the data to be processed in the distributed file system and specifies what processing is to be carried out on the data to be processed.

The path of the file to be processed in the distributed file system may, for example, be a URL (Uniform Resource Locator), which is the storage identifier of the distributed file system; through the URL, the block numbers of the blocks of the file to be processed can be found in the metadata management node 201.
The client 301 generates a stream processing task according to the stream processing job submitted by the user. The stream processing task includes the stream processing logic and the path of the data to be processed in the distributed file system, where the stream processing logic defines what processing is to be carried out on the data to be processed; for example, the stream processing logic may specify searching the data to be processed for anomalous events.

The client 301 sends the stream processing task to the stream processing management unit 302. The stream processing management unit 302 schedules according to the stream processing task, selecting stream processing computing units to obtain the file to be processed from the distributed file system and to process it with the stream processing logic.
For example, the stream processing system may be implemented on the Apache Flink framework: the client 301 is an Apache Flink client process, the stream processing management unit 302 is an Apache Flink job manager process, and the stream processing computing units are Apache Flink task manager processes.
The metadata management node 201 is provided with a metadata management unit 2011 and a database 2012. The metadata management unit 2011 provides an interface through which external devices can query the database 2012. The database 2012 records the first correspondence between the path of the file to be processed in the distributed file system and the block numbers of its blocks, and the second correspondence between the block number of each block and the network address of the data storage node where that block resides.

In the distributed storage system, the file to be processed is stored in fragments in the databases of the data storage nodes, where the fragments are distinct pieces of block data and each piece of block data corresponds to one block number. The metadata management node records the correspondence between the path of every file in the distributed storage system and the block numbers of its blocks, as well as which data storage node's database each block number is stored in.
The data storage node 101 is provided with a stream processing computing unit 1011 and a database 1012. The database 1012 records block data and the correspondence between block numbers and block data; the stream processing computing unit 1011 can access the database 1012 and obtain the block data corresponding to a block number from it.

In Fig. 1, the data storage nodes 102 and 103 have a structure similar to that of the data storage node 101, differing only in the block data recorded in their own databases, and are not described repeatedly here.
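The local read performed by a computing unit on its own node can be sketched as follows. This is a minimal illustration; the class and parameter names are hypothetical rather than from the embodiment:

```python
# Minimal sketch of a stream processing computing unit that reads block data
# from the database on its own node (a local read, not a network transfer)
# and runs the received stream processing logic on it. Names are illustrative.

class ComputeUnit:
    def __init__(self, local_db):
        # local_db: block number -> block data, held on the same data storage node
        self.local_db = local_db

    def run(self, block_number, stream_logic):
        block_data = self.local_db[block_number]  # read locally, no network hop
        return stream_logic(block_data)           # execute the stream processing logic
```

Because `local_db` lives on the same node as the computing unit, the only data crossing the network is the block number and the logic to run, not the block data itself.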
For example, the distributed file system may be implemented with Hadoop, and the databases 2012, 1012, 1022, ... and 1032 may be implemented with HBase (Hadoop Database); the metadata management unit 2011 may be the HMaster process of the HBase database.
In the embodiment of the present invention, the client 301 and the stream processing management unit 302 may be arranged on the same host and establish data connections over the network with the metadata management node 201 and with the data storage nodes 101, 102, ... 103. In some examples, the client 301 and the stream processing management unit 302 may also be arranged on different hosts; the embodiment of the present invention does not limit this.
For ease of understanding, refer to Fig. 2, which is another schematic structural diagram of a stream processing system according to an embodiment of the present invention. As shown in Fig. 2, the client 301 and the stream processing management unit 302 are arranged on a host 10. The host 10 further includes an operating system 303 and hardware 304; the hardware 304 carries the operation of the operating system 303 and includes a physical network card 3041. The client 301 and the stream processing management unit 302 each run as a process on the operating system 303 and access the network 50 through the physical network card 3041.
Further, the metadata management node 201 includes the database 2012, the metadata management unit 2011, an operating system 2013, and hardware 2014. The database 2012 and the metadata management unit 2011 each run as a process on the operating system 2013; the hardware 2014 carries the operation of the operating system 2013 and includes a physical network card 20141, which accesses the network 50. The metadata management unit 2011 provides an interface through which external devices can access the database 2012.
Further, the data storage node 101 includes the database 1012, the stream processing computing unit 1011, an operating system 1013, and hardware 1014. The database 1012 and the stream processing computing unit 1011 each run as a process on the operating system 1013; the hardware 1014 carries the operation of the operating system 1013 and includes a physical network card 10141, which accesses the network 50. In the embodiment of the present invention, the stream processing computing unit 1011 can access the database 1012.

The structure of the data storage nodes 102 and 103 is similar to that of the data storage node 101 and is not described repeatedly here.
For example, communication between the stream processing management unit 302 on the one hand and the client 301, the metadata management unit 2011, and each stream processing computing unit 1011, 1021, ... and 1031 on the other may be realized through RPC (Remote Procedure Call Protocol).
Based on the above architecture, an embodiment of the present invention provides a stream processing method. The stream processing management unit 302 receives a stream processing task sent by the client 301, where the stream processing task includes the stream processing logic and the path of a file to be processed in the distributed file system. The stream processing management unit 302 obtains, from the metadata management node 201, the block numbers of the blocks corresponding to the path and the network address of the data storage node where each block resides. The stream processing management unit 302 sends the stream processing logic and the block numbers corresponding to each network address to the stream processing computing unit of the corresponding data storage node. Each stream processing computing unit obtains, from the data storage node where it resides, the block data corresponding to the block numbers it received, and executes the stream processing logic on that block data.
In the embodiment of the present invention, the stream processing computing units are distributed across the data storage nodes, the stream processing management unit sends the stream processing task to the corresponding data storage nodes according to the path of the file to be processed, and the stream processing computing unit on each of those data storage nodes reads the block data of the file to be processed directly from local storage and runs the stream processing logic on it. Because the stream processing computing units read the file to be processed locally, the technical problem that the speed of stream processing is affected by a low network transfer speed between the stream processing system and the data storage nodes can be overcome.
For further clarity, refer now to Fig. 3, which is a data interaction diagram of a stream processing method according to an embodiment of the present invention. As shown in Fig. 3, the stream processing method includes the following steps:
Step 401: The stream processing management unit 302 receives a stream processing task sent by the client 301, where the stream processing task includes the stream processing logic and the path of a file to be processed in the distributed file system.

For example, the client 301 may be the client process in an Apache Flink system, and the stream processing management unit 302 may be the job manager process in an Apache Flink system.
Step 402: The stream processing management unit 302 sends a query request to the metadata management node 201, where the query request carries the path of the file to be processed in the distributed file system.

For example, the query request includes an input parameter and a query instruction: the stream processing management unit 302 takes the path of the file to be processed in the distributed file system as the input parameter and sends the input parameter and the query instruction to the database access interface provided by the metadata management unit 2011 of the metadata management node 201.
Step 403: The metadata management node 201 returns, according to the query request, the block numbers of the blocks corresponding to the path and the network address of the data storage node corresponding to each block to the stream processing management unit 302.

As described above, the database 2012 of the metadata management node 201 records the first correspondence between the path of the file to be processed in the distributed file system and the block numbers of its blocks, and the second correspondence between the block number of each block and the network address of the data storage node where that block resides. Therefore, the metadata management node 201 obtains the block number of each block from the first correspondence according to the path of the file to be processed in the distributed file system, and obtains the network address of the data storage node where each block resides from the second correspondence according to each block number.
Assume that the block numbers obtained by the stream processing management unit 302 are block number 1 and block number 2. Note that in practice there are many block numbers; for brevity, only two blocks are used as an illustration here. The stream processing management unit 302 looks up the network address A of the data storage node 101 according to block number 1, and the network address B of the data storage node 102 according to block number 2.
Step 404: The stream processing management unit 302 sends the stream processing logic and block number 1 to the stream processing computing unit 1011.

In this step, after looking up the network address A of the data storage node 101 according to block number 1, the stream processing management unit 302 sends the stream processing task and the block number 1 corresponding to network address A to the stream processing computing unit 1011 of the data storage node 101.

Step 405: The stream processing management unit 302 sends the stream processing logic and block number 2 to the stream processing computing unit 1021.

In this step, after looking up the network address B of the data storage node 102 according to block number 2, the stream processing management unit 302 sends the stream processing task and the block number 2 corresponding to network address B to the stream processing computing unit 1021 of the data storage node 102.
In steps 404 and 405, the stream processing computing unit 1011 may, for example, be one task manager process in an Apache Flink system, and the stream processing computing unit 1021 may be another task manager process in the Apache Flink system.
Step 406: The stream processing computing unit 1011 obtains, from the data storage node 101 where it resides, the block data corresponding to the received block number 1, and executes the stream processing logic on that block data.

In this step, the stream processing computing unit 1011 obtains, from the database 1012 of the data storage node 101 where it resides, the block data corresponding to the block number 1 received from the stream processing management unit 302, and executes the stream processing logic on that block data.
In some examples, the data storage node 101 is further provided with a data management unit, which accesses the database 1012 to manage the block data in the database 1012.
For example, distributed file system can be Hadoop, the database of Hadoop by Hbase database realizings,
Metadata management unit 2011 is the Hmaster processes of Hbase databases, and stream process computing unit is set as program library, data
Administrative unit executes the function of stream process computing unit by loading procedure library.
Further, the data management unit may be, for example, an HRegionServer process of the HBase database, into which a TaskManager process is embedded. The TaskManager process may be provided as a program library in the form of a jar package or an .so file that exposes a startup interface; after loading the program library, the HRegionServer process invokes the startup interface to realize the functions of the TaskManager process.
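By analogy with the jar/.so program library just described, the load-then-start pattern can be sketched in Python; the HBase and Flink specifics are not reproduced, and the module and interface names below are invented for illustration only.

```python
# Hypothetical analogy to the described design: a host process (standing in
# for HRegionServer) holds a dynamically loaded program library and calls
# its startup interface to activate the embedded computing-unit functions.
import types

# Stand-in for a dynamically loaded program library (a jar/.so in the patent).
library = types.ModuleType("task_manager_lib")
library.started = False

def start():                          # the library's "startup interface"
    library.started = True
    return "task manager functions available"

library.start = start

# The host process loads the library, then invokes its startup interface.
status = library.start()
```

In the patented design the equivalent step happens inside the JVM: the HRegionServer loads the jar and calls its startup entry point.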
In the embodiments of the present invention, the HRegionServer process that realizes the functions of the TaskManager process can read the block data of database 1012 locally, so obtaining the block data is not affected by external network performance. Moreover, because the HRegionServer process accesses database 1012 directly within the process, that is, reads the block data directly from memory, the block data is obtained faster, which effectively improves the efficiency of stream processing.
In other examples, the data management unit and stream processing computing unit 1011 may run concurrently in operating system 1013, with stream processing computing unit 1011 accessing database 1012 through an interface provided by the data management unit. In those examples, although database 1012 is not accessed directly within the HRegionServer process, stream processing computing unit 1011 still accesses database 1012 locally and thereby also avoids the influence of external network performance.
Step 407: Stream processing computing unit 1021 obtains, from the data storage node 102 where it is located, the block data corresponding to the received block number 2, and executes the stream processing logic on that block data.
Similarly to the previous step, in some examples data storage node 102 is provided with a data management unit, which accesses database 1022 to manage the block data. The distributed file system may be Hadoop, whose database is implemented by an HBase database; the metadata management unit 2011 is the HMaster process of the HBase database; the stream processing computing unit 1021 is provided as a program library; and the data management unit executes the functions of the stream processing computing unit by loading the program library.
Further, the data management unit may be, for example, an HRegionServer process of the HBase database, into which a TaskManager process is embedded. The TaskManager process may be provided as a program library in the form of a jar package or an .so file that exposes a startup interface; after loading the program library, the HRegionServer process invokes the startup interface to realize the functions of the TaskManager process.
In the embodiments of the present invention, the HRegionServer process that realizes the functions of the TaskManager process can read the block data of database 1022 locally, so obtaining the block data is not affected by external network performance. Moreover, because the HRegionServer process accesses database 1022 directly within the process, the block data is obtained faster, which effectively improves the efficiency of stream processing.
In other examples, the data management unit and stream processing computing unit 1021 may run concurrently in operating system 1023, with stream processing computing unit 1021 accessing database 1022 through an interface provided by the data management unit. In those examples, although database 1022 is not accessed directly within the HRegionServer process, stream processing computing unit 1021 still accesses database 1022 locally and thereby also avoids the influence of external network performance.
Step 408: Stream processing computing unit 1011 sends the first processing result, obtained by executing the stream processing logic on the block data corresponding to block number 1, to the stream processing management unit 302.
Step 409: Stream processing computing unit 1021 sends the second processing result, obtained by executing the stream processing logic on the block data corresponding to block number 2, to the stream processing management unit 302.
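Steps 408 and 409 amount to gathering the per-block partial results at the management unit; a minimal stand-in follows (the class, method names, and result values are invented for illustration).

```python
# Hypothetical sketch of steps 408-409: each computing unit sends its partial
# processing result back to the management unit, which collects them in the
# order received.

class ManagementUnit:
    def __init__(self):
        self.results = []

    def receive_result(self, block_no, result):
        self.results.append((block_no, result))

mgr = ManagementUnit()
mgr.receive_result(1, "first processing result")    # from computing unit 1011
mgr.receive_result(2, "second processing result")   # from computing unit 1021
```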
In summary, in the embodiments of the present invention, the stream processing computing units are distributed across the data storage nodes, and the stream processing management unit sends the stream processing task to the corresponding data storage nodes according to the path of the file to be processed. The stream processing computing unit on each corresponding data storage node directly reads the block data of the file to be processed locally and runs the stream processing logic on that block data. Because each stream processing computing unit reads the file to be processed locally, this overcomes the technical problem that the speed of stream processing is limited by the relatively low network transfer speed between the stream processing system and the data storage nodes.
Moreover, because the file to be processed is split into block data on which the stream processing logic is executed in parallel by different stream processing computing units, the stream processing speed can be further increased and the processing efficiency improved.
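The parallelism summarized above, splitting the file into blocks and running the stream logic on each block concurrently, can be sketched as a toy illustration (real HDFS block sizes and the Flink runtime are omitted; the logic here merely counts characters):

```python
# Hypothetical sketch: a file's contents are split into blocks, each block is
# processed in parallel by an independent worker, and the partial results are
# collected in block order.
from concurrent.futures import ThreadPoolExecutor

def split_into_blocks(data, block_size):
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def stream_logic(block):              # toy logic: count characters per block
    return len(block)

blocks = split_into_blocks("abcdefgh", block_size=3)
with ThreadPoolExecutor() as pool:
    partial_results = list(pool.map(stream_logic, blocks))
```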
It is worth noting that, in alternative embodiments of the present invention, the stream processing system 90 may also be implemented based on the Storm, Spark, or Samza framework.
Referring now to Fig. 4, Fig. 4 is a schematic structural diagram of a stream processing management unit according to an embodiment of the present invention. As shown in Fig. 4, the stream processing management unit 302 includes:
a receiving module 601, configured to receive a stream processing task sent by a client, where the stream processing task includes stream processing logic and a path of a file to be processed in a distributed file system, the distributed file system includes a metadata management node and multiple data storage nodes, and each data storage node is provided with a stream processing computing unit;
an enquiry module 602, configured to obtain, from the metadata management node, the block number of each block corresponding to the path and the network address of the data storage node where each block is located; and
a sending module 603, configured to send the stream processing logic and the block number of each block to the stream processing computing unit of the data storage node where that block is located.
Optionally, the receiving module 601 is further configured to receive the processing result obtained by the stream processing computing unit executing the stream processing logic.
Optionally, the metadata management node records a first correspondence between the path of the file to be processed in the distributed file system and the block number of each block, and a second correspondence between the block number of each block and the network address of the data storage node where that block is located. The enquiry module 602 is specifically configured to:
obtain the block number of each block from the first correspondence according to the path of the file to be processed in the distributed file system; and
obtain the network address of the data storage node where each block is located from the second correspondence according to the block number of that block.
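The enquiry module's two lookups, path to block numbers (first correspondence) and block number to node address (second correspondence), can be sketched with plain mappings; the sample path and addresses below are invented.

```python
# Hypothetical sketch of the enquiry module's two-step metadata query:
#   first correspondence:  file path    -> list of block numbers
#   second correspondence: block number -> network address of hosting node

first_correspondence = {"/data/pending.txt": [1, 2]}
second_correspondence = {1: "addr-A", 2: "addr-B"}

def query_blocks_and_addresses(path):
    block_numbers = first_correspondence[path]
    addresses = {n: second_correspondence[n] for n in block_numbers}
    return block_numbers, addresses

blocks, addrs = query_blocks_and_addresses("/data/pending.txt")
```

In HDFS terms these correspondences play roughly the role of the NameNode's block map, though the patent describes them generically.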
Referring now to Fig. 5, Fig. 5 is a schematic structural diagram of a host according to an embodiment of the present invention. As shown in Fig. 5, the host 50 includes a memory 502, a processor 501, and a bus 503; the memory 502 and the processor 501 are connected to the bus 503. The memory 502 stores program instructions, and the processor 501 executes the program instructions to realize the functions of the stream processing management unit 302 in the stream processing system described above.
In the embodiments of the present invention, the stream processing computing units are distributed across the data storage nodes, and the stream processing management unit sends the stream processing task to the corresponding data storage nodes according to the path of the file to be processed. The stream processing computing unit on each corresponding data storage node directly reads the block data of the file to be processed locally and runs the stream processing logic on that block data. Because each stream processing computing unit reads the file to be processed locally, this overcomes the technical problem that the speed of stream processing is limited by the relatively low network transfer speed between the stream processing system and the data storage nodes.
Moreover, because the file to be processed is split into block data on which the stream processing logic is executed in parallel by different stream processing computing units, the stream processing speed can be further increased and the processing efficiency improved.
It should be noted that the device embodiments described above are merely illustrative. Units described as separate parts may or may not be physically separated, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the processes may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. In addition, in the accompanying drawings of the device embodiments provided by the present invention, a connection relationship between processes indicates that a communication connection exists between them, which may be specifically implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
From the description of the foregoing embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software plus the necessary general-purpose hardware, and certainly may also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, and dedicated components. In general, any function completed by a computer program can easily be realized by corresponding hardware, and the specific hardware structure used to realize the same function may take many forms, such as an analog circuit, a digital circuit, or a dedicated circuit. For the present invention, however, a software program implementation is the preferred embodiment in most cases. Based on this understanding, the technical solutions of the present invention, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.
It will be clearly understood by those skilled in the art that, for the specific working processes of the systems, devices, or units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not described herein again.
The foregoing descriptions are merely specific embodiments, but the protection scope of the present invention is not limited thereto. Any change or replacement readily conceivable by a person familiar with the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (12)
1. A stream processing method, characterized in that the method is applied to a stream processing system, the stream processing system including a stream processing management unit and a stream processing computing unit, and the method includes:
receiving, by the stream processing management unit, a stream processing task sent by a client, where the stream processing task includes stream processing logic and a path of a file to be processed in a distributed file system, the distributed file system includes a metadata management node and multiple data storage nodes, and each data storage node is provided with a stream processing computing unit;
obtaining, by the stream processing management unit from the metadata management node, the block number of each block corresponding to the path of the file to be processed and the network address of the data storage node where each block is located;
sending, by the stream processing management unit, the stream processing logic and the block number of each block to the stream processing computing unit of the data storage node where that block is located; and
obtaining, by the stream processing computing unit from the data storage node where it is located, the block data corresponding to the received block number, and executing the stream processing logic on the block data corresponding to the received block number.
2. The method according to claim 1, characterized in that the data storage node is provided with a data management unit, the stream processing computing unit is provided as a program library, and the data management unit executes the functions of the stream processing computing unit by loading the program library.
3. The method according to claim 1 or 2, characterized in that the method further includes:
sending, by the stream processing computing unit, the processing result obtained by executing the stream processing logic to the stream processing management unit.
4. The method according to any one of claims 1 to 3, characterized in that the metadata management node records a first correspondence between the path of the file to be processed in the distributed file system and the block number of each block, and the obtaining, by the stream processing management unit from the metadata management node, of the block number of each block corresponding to the path and the network address of the data storage node where each block is located specifically includes:
obtaining, by the stream processing management unit, the block number of each block from the first correspondence according to the path of the file to be processed in the distributed file system.
5. The method according to claim 4, characterized in that the metadata management node records a second correspondence between the block number of each block and the network address of the data storage node where that block is located, and the obtaining, by the stream processing management unit from the metadata management node, of the block number of each block corresponding to the path and the network address of the data storage node where each block is located specifically includes:
obtaining, by the stream processing management unit, the network address of the data storage node where each block is located from the second correspondence according to the block number of that block.
6. A stream processing system, characterized by including a stream processing management unit and a stream processing computing unit, where:
the stream processing management unit is configured to receive a stream processing task sent by a client, where the stream processing task includes stream processing logic and a path of a file to be processed in a distributed file system, the distributed file system includes a metadata management node and multiple data storage nodes, and each data storage node is provided with a stream processing computing unit;
the stream processing management unit is further configured to obtain, from the metadata management node, the block number of each block corresponding to the path and the network address of the data storage node where each block is located;
the stream processing management unit is further configured to send the stream processing logic and the block number of each block to the stream processing computing unit of the data storage node where that block is located; and
the stream processing computing unit is configured to obtain, from the data storage node where it is located, the block data corresponding to the received block number, and execute the stream processing logic on the block data corresponding to the received block number.
7. The system according to claim 6, characterized in that the data storage node is provided with a data management unit, the stream processing computing unit is provided as a program library, and the data management unit executes the functions of the stream processing computing unit by loading the program library.
8. The system according to claim 6, characterized in that:
the stream processing computing unit is further configured to send the processing result obtained by executing the stream processing logic to the stream processing management unit.
9. The system according to claim 6, characterized in that the metadata management node records a first correspondence between the path of the file to be processed in the distributed file system and the block number of each block, and the stream processing management unit is specifically configured to:
obtain the block number of each block from the first correspondence according to the path of the file to be processed in the distributed file system.
10. The system according to claim 9, characterized in that the metadata management node records a second correspondence between the block number of each block and the network address of the data storage node where that block is located, and the stream processing management unit is specifically configured to:
obtain the network address of the data storage node where each block is located from the second correspondence according to the block number of that block.
11. A stream processing management unit, characterized by including:
a receiving module, configured to receive a stream processing task sent by a client, where the stream processing task includes stream processing logic and a path of a file to be processed in a distributed file system, the distributed file system includes a metadata management node and multiple data storage nodes, and each data storage node is provided with a stream processing computing unit;
an enquiry module, configured to obtain, from the metadata management node, the block number of each block corresponding to the path and the network address of the data storage node where each block is located; and
a sending module, configured to send the stream processing logic and the block number of each block to the stream processing computing unit of the data storage node where that block is located.
12. A host, characterized by including a memory, a processor, and a bus, where the memory and the processor are connected to the bus, the memory stores program instructions, and the processor executes the program instructions to cause the host to execute the following steps:
receiving a stream processing task sent by a client, where the stream processing task includes stream processing logic and a path of a file to be processed in a distributed file system, the distributed file system includes a metadata management node and multiple data storage nodes, and each data storage node is provided with a stream processing computing unit;
obtaining, from the metadata management node, the block number of each block corresponding to the path and the network address of the data storage node where each block is located; and
sending the stream processing logic and the block number of each block to the stream processing computing unit of the data storage node where that block is located.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710233425.0A CN108696559B (en) | 2017-04-11 | 2017-04-11 | Stream processing method and device |
PCT/CN2018/082641 WO2018188607A1 (en) | 2017-04-11 | 2018-04-11 | Stream processing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710233425.0A CN108696559B (en) | 2017-04-11 | 2017-04-11 | Stream processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108696559A true CN108696559A (en) | 2018-10-23 |
CN108696559B CN108696559B (en) | 2021-08-20 |
Family
ID=63792265
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710233425.0A Active CN108696559B (en) | 2017-04-11 | 2017-04-11 | Stream processing method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108696559B (en) |
WO (1) | WO2018188607A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110046131A (en) * | 2019-01-23 | 2019-07-23 | 阿里巴巴集团控股有限公司 | The Stream Processing method, apparatus and distributed file system HDFS of data |
CN111290744A (en) * | 2020-01-22 | 2020-06-16 | 北京百度网讯科技有限公司 | Stream computing job processing method, stream computing system and electronic device |
CN111435938A (en) * | 2019-01-14 | 2020-07-21 | 阿里巴巴集团控股有限公司 | Data request processing method, device and equipment |
Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1886739A (en) * | 2003-10-27 | 2006-12-27 | 特博数据实验室公司 | Distributed memory type information processing system |
CN101089819A (en) * | 2006-06-13 | 2007-12-19 | 国际商业机器公司 | Method for dynamic stationary flow processing system and upstream processing node |
CN101741885A (en) * | 2008-11-19 | 2010-06-16 | 珠海市西山居软件有限公司 | Distributed system and method for processing task flow thereof |
US20110313934A1 (en) * | 2010-06-21 | 2011-12-22 | Craig Ronald Van Roy | System and Method for Configuring Workflow Templates |
US8150889B1 (en) * | 2008-08-28 | 2012-04-03 | Amazon Technologies, Inc. | Parallel processing framework |
CN102456185A (en) * | 2010-10-29 | 2012-05-16 | 金蝶软件(中国)有限公司 | Distributed workflow processing method and distributed workflow engine system |
CN102467411A (en) * | 2010-11-19 | 2012-05-23 | 金蝶软件(中国)有限公司 | Workflow processing and workflow agent method, device and system |
CN102542367A (en) * | 2010-12-10 | 2012-07-04 | 金蝶软件(中国)有限公司 | Cloud computing network workflow processing method, device and system based on domain model |
US20130086116A1 (en) * | 2011-10-04 | 2013-04-04 | International Business Machines Corporation | Declarative specification of data integraton workflows for execution on parallel processing platforms |
CN103309867A (en) * | 2012-03-09 | 2013-09-18 | 句容智恒安全设备有限公司 | Web data mining system on basis of Hadoop platform |
US20130253977A1 (en) * | 2012-03-23 | 2013-09-26 | Commvault Systems, Inc. | Automation of data storage activities |
CN104063486A (en) * | 2014-07-03 | 2014-09-24 | 四川中亚联邦科技有限公司 | Big data distributed storage method and system |
CN104536814A (en) * | 2015-01-16 | 2015-04-22 | 北京京东尚科信息技术有限公司 | Method and system for processing workflow |
CN104657497A (en) * | 2015-03-09 | 2015-05-27 | 国家电网公司 | Mass electricity information concurrent computation system and method based on distributed computation |
CN105468756A (en) * | 2015-11-30 | 2016-04-06 | 浪潮集团有限公司 | Design and realization method for mass data processing system |
CN105608077A (en) * | 2014-10-27 | 2016-05-25 | 青岛金讯网络工程有限公司 | Big data distributed storage method and system |
CN106155791A (en) * | 2016-06-30 | 2016-11-23 | 电子科技大学 | A kind of workflow task dispatching method under distributed environment |
CN106462605A (en) * | 2014-05-13 | 2017-02-22 | 云聚公司 | Distributed secure data storage and transmission of streaming media content |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6415297B1 (en) * | 1998-11-17 | 2002-07-02 | International Business Machines Corporation | Parallel database support for workflow management systems |
US20090125553A1 (en) * | 2007-11-14 | 2009-05-14 | Microsoft Corporation | Asynchronous processing and function shipping in ssis |
CN106339415B (en) * | 2016-08-12 | 2019-08-23 | 北京奇虎科技有限公司 | Querying method, the apparatus and system of data |
-
2017
- 2017-04-11 CN CN201710233425.0A patent/CN108696559B/en active Active
-
2018
- 2018-04-11 WO PCT/CN2018/082641 patent/WO2018188607A1/en active Application Filing
Patent Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1886739A (en) * | 2003-10-27 | 2006-12-27 | 特博数据实验室公司 | Distributed memory type information processing system |
CN101089819A (en) * | 2006-06-13 | 2007-12-19 | 国际商业机器公司 | Method for dynamic stationary flow processing system and upstream processing node |
US8150889B1 (en) * | 2008-08-28 | 2012-04-03 | Amazon Technologies, Inc. | Parallel processing framework |
CN101741885A (en) * | 2008-11-19 | 2010-06-16 | 珠海市西山居软件有限公司 | Distributed system and method for processing task flow thereof |
US20110313934A1 (en) * | 2010-06-21 | 2011-12-22 | Craig Ronald Van Roy | System and Method for Configuring Workflow Templates |
CN102456185A (en) * | 2010-10-29 | 2012-05-16 | 金蝶软件(中国)有限公司 | Distributed workflow processing method and distributed workflow engine system |
CN102467411A (en) * | 2010-11-19 | 2012-05-23 | 金蝶软件(中国)有限公司 | Workflow processing and workflow agent method, device and system |
CN102542367A (en) * | 2010-12-10 | 2012-07-04 | 金蝶软件(中国)有限公司 | Cloud computing network workflow processing method, device and system based on domain model |
US20130086116A1 (en) * | 2011-10-04 | 2013-04-04 | International Business Machines Corporation | Declarative specification of data integraton workflows for execution on parallel processing platforms |
CN103309867A (en) * | 2012-03-09 | 2013-09-18 | 句容智恒安全设备有限公司 | Web data mining system on basis of Hadoop platform |
US20130253977A1 (en) * | 2012-03-23 | 2013-09-26 | Commvault Systems, Inc. | Automation of data storage activities |
CN106462605A (en) * | 2014-05-13 | 2017-02-22 | 云聚公司 | Distributed secure data storage and transmission of streaming media content |
CN104063486A (en) * | 2014-07-03 | 2014-09-24 | 四川中亚联邦科技有限公司 | Big data distributed storage method and system |
CN105608077A (en) * | 2014-10-27 | 2016-05-25 | 青岛金讯网络工程有限公司 | Big data distributed storage method and system |
CN104536814A (en) * | 2015-01-16 | 2015-04-22 | 北京京东尚科信息技术有限公司 | Method and system for processing workflow |
CN104657497A (en) * | 2015-03-09 | 2015-05-27 | 国家电网公司 | Mass electricity information concurrent computation system and method based on distributed computation |
CN105468756A (en) * | 2015-11-30 | 2016-04-06 | 浪潮集团有限公司 | Design and realization method for mass data processing system |
CN106155791A (en) * | 2016-06-30 | 2016-11-23 | 电子科技大学 | A kind of workflow task dispatching method under distributed environment |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111435938A (en) * | 2019-01-14 | 2020-07-21 | 阿里巴巴集团控股有限公司 | Data request processing method, device and equipment |
CN110046131A (en) * | 2019-01-23 | 2019-07-23 | 阿里巴巴集团控股有限公司 | The Stream Processing method, apparatus and distributed file system HDFS of data |
CN111290744A (en) * | 2020-01-22 | 2020-06-16 | 北京百度网讯科技有限公司 | Stream computing job processing method, stream computing system and electronic device |
Also Published As
Publication number | Publication date |
---|---|
WO2018188607A1 (en) | 2018-10-18 |
CN108696559B (en) | 2021-08-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6732798B2 (en) | Automatic scaling of resource instance groups in a compute cluster | |
CN109416643B (en) | Application program migration system | |
CN109643312B (en) | Hosted query service | |
JP6974218B2 (en) | Storage system and its operation method | |
CN109074377B (en) | Managed function execution for real-time processing of data streams | |
CN109328335B (en) | Intelligent configuration discovery techniques | |
JPWO2016121754A1 (en) | System, virtualization control device, control method and program for virtualization control device | |
CN107015989A (en) | Data processing method and device | |
US20120246157A1 (en) | Method and system for dynamically tagging metrics data | |
CN108696559A (en) | Method for stream processing and device | |
CN112448833A (en) | Multi-management-domain communication method and device | |
KR101378348B1 (en) | Basic prototype of hadoop cluster based on private cloud infrastructure | |
US10706073B1 (en) | Partitioned batch processing for a usage analysis system | |
CN112199426B (en) | Interface call management method, device, server and medium under micro-service architecture | |
US10348596B1 (en) | Data integrity monitoring for a usage analysis system | |
CN109257256A (en) | Apparatus monitoring method, device, computer equipment and storage medium | |
TW202315360A (en) | Microservice allocation method, electronic equipment, and storage medium | |
CN114116908A (en) | Data management method and device and electronic equipment | |
CN107347024A (en) | A kind of method and apparatus for storing Operation Log | |
CN112631996A (en) | Log searching method and device | |
US10606714B2 (en) | Stopping central processing units for data collection based on event categories of events | |
Marian et al. | Analysis of Different SaaS Architectures from a Trust Service Provider Perspective | |
CN111353766A (en) | Service process processing system and method of distributed service system | |
JP2015064740A (en) | Virtual machine provision system | |
CN114553492B (en) | Cloud platform-based operation request processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right |
Effective date of registration: 20220216 Address after: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province Patentee after: Huawei Cloud Computing Technology Co.,Ltd. Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen Patentee before: HUAWEI TECHNOLOGIES Co.,Ltd. |
|
TR01 | Transfer of patent right |