CN105574082A - Storm based stream processing method and system - Google Patents

Publication number
CN105574082A
Authority
CN
China
Prior art keywords
data
topological
parameter
module
configuration file
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510896623.6A
Other languages
Chinese (zh)
Inventor
谢莹莹
郭庆
惠润海
宋怀明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dawning Information Industry Beijing Co Ltd
Original Assignee
Dawning Information Industry Beijing Co Ltd
Application filed by Dawning Information Industry Beijing Co Ltd
Priority to CN201510896623.6A
Publication of CN105574082A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a Storm-based stream processing method and system, and belongs to the technical field of data processing. The method comprises: extracting data; obtaining a configuration file and parsing the topology parameters in it, where the topology parameters are used to build a topology job; building the topology job according to the topology parameters and the data; and performing Storm stream processing on the topology job. The system comprises a general data access module for extracting the data, a configuration file parsing module for obtaining the configuration file and parsing the topology parameters in it, and a Storm stream processing module for building the topology job according to the topology parameters and the data and submitting the topology job to a Storm cluster for stream processing. With this system, the data stored by a user and the configuration file provided by the user are obtained automatically and the generated topology job is submitted to the Storm cluster for stream processing, so that the whole stream processing framework is built from general-purpose components assembled by job configuration, which simplifies and accelerates the development of stream processing jobs.

Description

Storm-based stream processing method and system
Technical field
The present invention relates to the technical field of data processing, and in particular to a Storm-based stream processing method and system.
Background
Data mining integrates, analyzes, and summarizes massive data to make full use of the value it contains and to uncover information that is useful for business management, business improvement, the capture of business opportunities, and so on. With the arrival of the big data era, the volume of data to be mined has grown accordingly. Stream processing technology can analyze big data in real time and therefore offers high time value.
Because developing stream computations is complex, Storm provides interfaces for Spout, Bolt, Topology, and data serialization, which make it easier for data mining personnel to develop stream processing systems on top of Storm.
However, Storm is difficult to get started with. For data service personnel who only want to use Storm to process their own business data and do not want to understand Storm in depth, Storm development becomes the main obstacle to using stream processing technology.
Summary of the invention
To solve the above problem, embodiments of the present invention propose a Storm-based stream processing method and system.
In one aspect, an embodiment of the present invention provides a Storm-based stream processing method, the method comprising:
Extracting data;
Obtaining a configuration file and parsing the topology parameters in the configuration file, the topology parameters being used to build a topology job;
Building the topology job according to the topology parameters and the data;
Performing Storm stream processing on the topology job.
Optionally, before the extracting of data, the method further comprises:
Obtaining the parameters of each first data source, and encapsulating each first data source according to the parameters;
The extracting of data comprises:
Obtaining the second data source in which the user stores data, the second data source being one of the first data sources;
Extracting the data according to the encapsulated second data source.
Optionally, the second data source is a Hadoop Distributed File System (HDFS) data source.
Optionally, after the extracting of data, the method further comprises:
Storing the data in a storage medium, so as to encapsulate and persist the data.
Optionally, the storing of the data in a storage medium comprises:
Storing the data in HBase and/or Redis.
Optionally, the obtaining of a configuration file and parsing of the topology parameters in the configuration file comprises:
Obtaining the configuration file provided by the user, parsing the component default parameters and the job-private parameters in the configuration file, and forming the topology parameters from the component default parameters and the job-private parameters.
Optionally, the building of the topology job according to the topology parameters and the data comprises:
Parsing user-defined business logic;
Building the topology job according to the topology parameters, the data, and the user-defined business logic.
In another aspect, an embodiment of the present invention provides a Storm-based stream processing system, the system comprising: a general data access module, a configuration file parsing module, and a Storm stream processing module;
The general data access module is used to extract data;
The configuration file parsing module is used to obtain a configuration file and parse the topology parameters in the configuration file, the topology parameters being used to build a topology job;
The Storm stream processing module is used to build the topology job according to the topology parameters obtained by the configuration file parsing module and the data extracted by the general data access module, and to submit the topology job to a Storm cluster for stream processing.
Optionally, the general data access module is also used to obtain the parameters of each first data source and encapsulate each first data source according to the parameters; to obtain the second data source in which the user stores data, the second data source being one of the first data sources; and to extract the data according to the encapsulated second data source.
Optionally, the general data access module comprises: a Kafka-Spout general-purpose component and an HDFS-Spout general-purpose component;
The HDFS-Spout general-purpose component is used to extract the data according to the encapsulated HDFS data source when the second data source in which the user stores data is a Hadoop Distributed File System (HDFS) data source.
Optionally, the HDFS-Spout general-purpose component is also used to implement resumable transmission through ZooKeeper;
And/or to implement retransmission on failure through ZooKeeper.
Optionally, the system further comprises:
A general data storage module, for storing the data in a storage medium, so as to encapsulate and persist the data.
Optionally, the general data storage module comprises: an HBase-Bolt general-purpose component and/or a Redis-Bolt general-purpose component;
The HBase-Bolt general-purpose component is used to store data in HBase;
The Redis-Bolt general-purpose component is used to store data in Redis.
Optionally, the configuration file parsing module is specifically used to obtain the configuration file provided by the user, parse the component default parameters and the job-private parameters in the configuration file, and form the topology parameters from the component default parameters and the job-private parameters.
Optionally, the Storm stream processing module comprises: a topology job assembly component, a job submission component, and a Storm cluster;
The topology job assembly component is used to build the topology job according to the topology parameters and the data;
The job submission component is used to submit the topology job built by the topology job assembly component to the Storm cluster;
The Storm cluster is used to distribute the topology job submitted by the job submission component to processing nodes for stream processing.
Optionally, the Storm stream processing module further comprises: a custom business parsing component;
The custom business parsing component is used to parse user-defined business logic;
The topology job assembly component is used to build the topology job according to the topology parameters, the data, and the user-defined business logic obtained by the custom business parsing component.
The beneficial effects are as follows:
The user only needs to store data and provide a configuration file. The Storm-based stream processing system provided by this embodiment, which consists of the general data access module, the general data storage module, the configuration file parsing module, and the Storm stream processing module, automatically generates a topology job from the data and the configuration file and submits it to the Storm cluster for stream processing. This lowers the technical difficulty of stream processing with Storm, builds the whole stream processing framework from general-purpose components assembled by job configuration, and simplifies and accelerates the development of stream processing jobs.
Brief description of the drawings
Specific embodiments of the present invention are described below with reference to the accompanying drawings, in which:
Fig. 1 shows a flow chart of a first Storm-based stream processing method provided in an embodiment of the present invention;
Fig. 2 shows a structural diagram of a first Storm-based stream processing system provided in an embodiment of the present invention;
Fig. 3 shows a structural diagram of a second Storm-based stream processing system provided in another embodiment of the present invention;
Fig. 4 shows a processing flow diagram of an HDFS-Spout general-purpose component provided in another embodiment of the present invention;
Fig. 5 shows a structural diagram of a third Storm-based stream processing system provided in another embodiment of the present invention;
Fig. 6 shows a structural diagram of a fourth Storm-based stream processing system provided in another embodiment of the present invention;
Fig. 7 shows a processing flow diagram of an HBase-Bolt general-purpose component provided in another embodiment of the present invention;
Fig. 8 shows a processing flow diagram of a Redis-Bolt general-purpose component provided in another embodiment of the present invention;
Fig. 9 shows a structural diagram of a fifth Storm-based stream processing system provided in another embodiment of the present invention;
Fig. 10 shows a structural diagram of a sixth Storm-based stream processing system provided in another embodiment of the present invention.
Embodiments
To make the technical solutions and advantages of the present invention clearer, exemplary embodiments of the present invention are described in more detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not an exhaustive list of all embodiments. Where no conflict arises, the embodiments in this description and the features in the embodiments may be combined with each other.
Storm provides interfaces for Spout, Bolt, Topology, and data serialization, which make it easier for data mining personnel to develop stream processing systems on top of Storm. However, Storm is difficult to get started with, and for data service personnel who only want to use Storm to process their own business data and do not want to understand Storm in depth, Storm development becomes the main obstacle to using stream processing technology. The present invention therefore proposes a Storm-based stream processing method. The method is applied to a system, namely the Storm-based stream processing system described in any of the embodiments shown in Fig. 2 to Fig. 9 below. With this system, the user only needs to store data and provide a configuration file; the system automatically generates a topology job from the data and the configuration file and submits it to the Storm cluster for stream processing, which lowers the technical difficulty of stream processing with Storm.
With reference to the above implementation environment and the embodiment shown in Fig. 1, this embodiment provides a Storm-based stream processing method. The method flow provided by this embodiment is as follows:
101: Extract data;
102: Obtain a configuration file and parse the topology parameters in the configuration file, the topology parameters being used to build a topology job;
Optionally, obtaining the configuration file and parsing the topology parameters in it comprises:
Obtaining the configuration file provided by the user, parsing the component default parameters and the job-private parameters in the configuration file, and forming the topology parameters from the component default parameters and the job-private parameters.
103: Build the topology job according to the topology parameters and the data;
Optionally, building the topology job according to the topology parameters and the data comprises:
Parsing user-defined business logic;
Building the topology job according to the topology parameters, the data, and the user-defined business logic.
104: Perform Storm stream processing on the topology job.
Optionally, before the data is extracted, the method further comprises:
Obtaining the parameters of each first data source, and encapsulating each first data source according to the parameters;
Extracting the data comprises:
Obtaining the second data source in which the user stores data, the second data source being one of the first data sources;
Extracting the data according to the encapsulated second data source.
Optionally, the second data source is an HDFS (Hadoop Distributed File System) data source.
Optionally, after the data is extracted, the method further comprises:
Storing the data in a storage medium, so as to encapsulate and persist the data.
Optionally, storing the data in a storage medium comprises:
Storing the data in HBase and/or Redis.
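As an illustration only, steps 101 to 104 can be sketched as four small functions chained together. Everything named here is hypothetical: the function names, the `topology.` key prefix, and the local `process` callback that stands in for a Storm cluster are assumptions of this sketch, not part of the described system.

```python
from typing import Callable, Dict, Iterable, List


def extract_data(source: Iterable[str]) -> List[str]:
    """Step 101: read non-empty records from an already opened data source."""
    return [line.strip() for line in source if line.strip()]


def parse_topology_params(config: Dict[str, str]) -> Dict[str, str]:
    """Step 102: keep the entries that describe the topology.

    The 'topology.' key prefix is a hypothetical convention for this sketch.
    """
    return {key: value for key, value in config.items() if key.startswith("topology.")}


def build_topology_job(params: Dict[str, str], records: List[str]) -> dict:
    """Step 103: combine the parameters and the data into one job description."""
    return {"name": params.get("topology.name", "unnamed"), "params": params, "records": records}


def run_stream_processing(job: dict, process: Callable[[str], str]) -> List[str]:
    """Step 104: local stand-in for handing the job to a Storm cluster."""
    return [process(record) for record in job["records"]]


def run_pipeline(source: Iterable[str], config: Dict[str, str],
                 process: Callable[[str], str]) -> List[str]:
    """Run steps 101 to 104 in order."""
    records = extract_data(source)
    params = parse_topology_params(config)
    job = build_topology_job(params, records)
    return run_stream_processing(job, process)
```

The point of the sketch is the ordering: the user supplies only the data source and the configuration, and the job is derived from those two inputs.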
The beneficial effects are as follows:
The user only needs to store data and provide a configuration file. The Storm-based stream processing system provided by this embodiment, which consists of the general data access module, the general data storage module, the configuration file parsing module, and the Storm stream processing module, automatically generates a topology job from the data and the configuration file and submits it to the Storm cluster for stream processing. This lowers the technical difficulty of stream processing with Storm, builds the whole stream processing framework from general-purpose components assembled by job configuration, and simplifies and accelerates the development of stream processing jobs.
With reference to the above implementation environment and the embodiment shown in Fig. 2, this embodiment provides a Storm-based stream processing system:
The system comprises: a general data access module 201, a configuration file parsing module 202, and a Storm stream processing module 203;
● General data access module 201, for extracting data;
In practice, before extracting the data, the general data access module 201 is also used to obtain the parameters of each first data source and to encapsulate each first data source according to those parameters;
For example, it obtains the access parameters of each first data source and encapsulates access to each first data source according to those parameters.
It should be noted that obtaining the parameters of each first data source and encapsulating each first data source is not performed every time. It is performed only the first time the system provided by this embodiment carries out stream processing, or when a new data source appears, so that all data sources are encapsulated; this simplifies the data access flow when the system carries out stream processing. This embodiment does not specifically limit the trigger condition for performing this process.
Specifically, the general data access module 201 obtains the second data source in which the user stores data, this second data source being one of the first data sources, and extracts data according to the encapsulated second data source.
In addition, the user may store data in a data source in the form of files. For example, all data is classified by some criterion, data of the same class is stored in the same file, and all files are stored in the same or different data sources. The general data access module 201 can obtain the second data source in which the user stores data, read the files according to the encapsulated second data source, and extract the data from the files.
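The "encapsulate once, then select one source" behaviour described above might look like the following minimal sketch. The `DataSourceRegistry` class, the `memory_source_factory` helper, and the in-memory file store are all hypothetical; a real deployment would wrap HDFS or Kafka clients instead.

```python
from typing import Callable, Dict, Iterator, List

Reader = Callable[[str], Iterator[str]]


class DataSourceRegistry:
    """Holds one encapsulated reader per first data source (hypothetical helper)."""

    def __init__(self) -> None:
        self._readers: Dict[str, Reader] = {}

    def encapsulate(self, name: str, params: dict,
                    factory: Callable[[dict], Reader]) -> None:
        # Encapsulation happens only once per source; later calls are no-ops.
        if name not in self._readers:
            self._readers[name] = factory(params)

    def extract(self, name: str, location: str) -> List[str]:
        # 'name' identifies the second data source, which must already be encapsulated.
        if name not in self._readers:
            raise KeyError(f"data source {name!r} has not been encapsulated")
        return list(self._readers[name](location))


def memory_source_factory(params: dict) -> Reader:
    """Builds a reader over an in-memory file store, standing in for a real source."""
    files = params["files"]

    def read(location: str) -> Iterator[str]:
        for line in files[location].splitlines():
            yield line

    return read
```

Keeping the encapsulation step idempotent mirrors the note that it only runs on first use or when a new data source appears.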
For ease of description, unless otherwise specified, this embodiment is described taking the case where the user stores data in the form of files as an example.
As shown in Fig. 3, the general data access module 201 comprises: a Kafka-Spout general-purpose component 2011 and an HDFS-Spout general-purpose component 2012;
The Kafka-Spout general-purpose component 2011 has the same function as the existing Kafka-Spout component and is not described in detail here.
The HDFS-Spout general-purpose component 2012 is used to extract data according to the encapsulated HDFS data source when the second data source in which the user stores data is a Hadoop Distributed File System (HDFS) data source.
Specifically, in nextTuple, a file is taken from the file queue, the compression format of the file is identified from the file suffix, the corresponding input stream is then opened, the data records are separated by a given delimiter string, and the records are read and emitted.
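The nextTuple logic above can be approximated locally as follows. This is a sketch under stated assumptions: it reads from the local file system rather than HDFS, it recognizes only the `.gz` and `.bz2` suffixes (the patent does not list the supported formats), and it loads each file fully into memory for brevity.

```python
import bz2
import gzip
from collections import deque
from typing import Deque, List

# Suffix-to-opener table; the set of supported suffixes is an assumption.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open}


def open_by_suffix(path: str):
    """Pick the input stream that matches the file's compression suffix."""
    for suffix, opener in OPENERS.items():
        if path.endswith(suffix):
            return opener(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


def read_records(path: str, delimiter: str = "\n") -> List[str]:
    """Split the file content into records on the given delimiter string."""
    with open_by_suffix(path) as stream:
        text = stream.read()
    return [record for record in text.split(delimiter) if record]


def next_tuple(file_queue: Deque[str], delimiter: str = "\n") -> List[str]:
    """Take one file from the queue and return its records; empty queue yields []."""
    if not file_queue:
        return []
    return read_records(file_queue.popleft(), delimiter)
```

A streaming reader would be preferable for large files; whole-file reads are used here only to keep the example short.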
In addition, data transmission between modules may be interrupted, for example by a network outage or a manual interruption. To avoid retransmitting data after an interruption and the waste of resources this causes, the HDFS-Spout general-purpose component 2012 also implements resumable transmission (resuming from the breakpoint) through ZooKeeper.
The implementation of resumable transmission is described in detail below, taking the case where all files to be transmitted form a file directory as an example.
1) In open, the Spout is initialized: the history of sending a given file directory is obtained from ZooKeeper (that is, the history of the general data storage module 204 reading this file directory), and a queue of pending files is generated.
2) Check whether there are new files under the HDFS storage location, and record each file's information <filePath, position, ended> in ZooKeeper.
Here, filePath is the storage location of the file, position is the start position of the transmission, and ended is the end position of the transmission.
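To make the <filePath, position, ended> record concrete, the sketch below resumes a transfer from the last saved position. It makes several assumptions: an in-memory `ProgressStore` stands in for ZooKeeper, positions are counted in records rather than bytes, and the `fail_at` argument exists only to simulate an interruption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class FileProgress:
    file_path: str  # storage location of the file
    position: int   # position at which transmission starts or resumes
    ended: int      # position at which transmission ends


class ProgressStore:
    """In-memory stand-in for the records kept in ZooKeeper."""

    def __init__(self) -> None:
        self._records: Dict[str, FileProgress] = {}

    def load(self, file_path: str) -> Optional[FileProgress]:
        return self._records.get(file_path)

    def save(self, progress: FileProgress) -> None:
        self._records[progress.file_path] = progress


def send_with_resume(file_path: str, lines: List[str], store: ProgressStore,
                     emit: Callable[[str], None],
                     fail_at: Optional[int] = None) -> None:
    """Emit records starting at the saved position, saving progress after each one."""
    progress = store.load(file_path) or FileProgress(file_path, 0, len(lines))
    for index in range(progress.position, progress.ended):
        if fail_at is not None and index == fail_at:
            raise ConnectionError("simulated interruption")
        emit(lines[index])
        progress.position = index + 1
        store.save(progress)
```

After an interruption, calling `send_with_resume` again with the same store continues from the saved position instead of resending the whole file.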
In addition, data transmission may fail in practice, for example because of interference during transmission. To avoid incomplete data caused by failed transmission, the HDFS-Spout general-purpose component 2012 is also used to implement retransmission on failure through ZooKeeper when sending the extracted data to the general data storage module 204.
The implementation of retransmission on failure is described in detail below: after each file has been read completely, the input stream is closed and the file's information in ZooKeeper is updated. At the same time, a timed task is started that checks at regular intervals whether new files have been produced and, if so, adds them to the file queue. The detailed flow is shown in Fig. 4.
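The timed check for new files might be sketched as below. The `FileQueueScanner` class and `run_periodically` helper are hypothetical, the directory listing is injected as a plain callable instead of an HDFS client, and a simple loop replaces a real scheduler.

```python
import time
from collections import deque
from typing import Callable, Deque, Iterable, Set


class FileQueueScanner:
    """Adds files that have not been seen before to the pending file queue."""

    def __init__(self, list_files: Callable[[], Iterable[str]]) -> None:
        self._list_files = list_files
        self._seen: Set[str] = set()
        self.queue: Deque[str] = deque()

    def scan_once(self) -> int:
        """Enqueue newly appeared files and return how many were added."""
        added = 0
        for path in sorted(self._list_files()):
            if path not in self._seen:
                self._seen.add(path)
                self.queue.append(path)
                added += 1
        return added


def run_periodically(scanner: FileQueueScanner, interval_seconds: float, rounds: int,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Run a fixed number of scans with a pause in between; returns total files added."""
    total = 0
    for _ in range(rounds):
        total += scanner.scan_once()
        sleep(interval_seconds)
    return total
```

Injecting `sleep` keeps the loop testable without waiting; a production spout would more likely use a background timer thread.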
The HDFS-Spout general-purpose component 2012 extracts data of type Text from an HDFS path so that it flows into the computing units of the Storm cluster. It implements access to the files under a specified directory, relies on coordination through ZooKeeper to ensure that all records of a file are sent into the Storm stream, supports resumable transmission, and supports retransmission on failure of data across the whole tuple tree.
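Storm notifies a spout of the outcome of each tuple tree through its ack and fail callbacks. One common way to build retransmission on top of those callbacks, shown here purely as an assumption about how such a component could work, is to keep every emitted record in a pending map until it is acknowledged. The `ReplayBuffer` class is hypothetical.

```python
from typing import Callable, Dict


class ReplayBuffer:
    """Keeps emitted records until they are acknowledged, and re-emits them on failure."""

    def __init__(self, emit: Callable[[int, str], None]) -> None:
        self._emit = emit
        self._pending: Dict[int, str] = {}
        self._next_id = 0

    def send(self, record: str) -> int:
        """Emit a record with a fresh message id and remember it as pending."""
        message_id = self._next_id
        self._next_id += 1
        self._pending[message_id] = record
        self._emit(message_id, record)
        return message_id

    def ack(self, message_id: int) -> None:
        """The tuple tree completed: forget the record."""
        self._pending.pop(message_id, None)

    def fail(self, message_id: int) -> None:
        """The tuple tree failed: emit the same record again under the same id."""
        record = self._pending.get(message_id)
        if record is not None:
            self._emit(message_id, record)

    def pending_count(self) -> int:
        return len(self._pending)
```

Records that were already acknowledged are ignored by `fail`, so a late failure notification cannot cause a duplicate resend.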
In addition, to persist the data, the data needs to be encapsulated onto different storage media. Optionally, referring to Fig. 5, the system provided by this embodiment further comprises:
● General data storage module 204, for storing the data extracted by the general data access module 201 in a storage medium, so as to encapsulate and persist the data;
The storage medium includes, but is not limited to, a database.
The general data storage module 204 comprises an HBase-Bolt general-purpose component 2041 and/or a Redis-Bolt general-purpose component 2042. Fig. 6 shows the case where the general data storage module 204 comprises both the HBase-Bolt general-purpose component 2041 and the Redis-Bolt general-purpose component 2042;
1) HBase-Bolt general-purpose component 2041, for storing data in HBase;
The HBase-Bolt general-purpose component 2041 provides the function of writing data to HBase. Each Bolt writes data to only one data table and detects whether the HBase storage medium has raised an exception; if so, the data is passed safely to the next Bolt for subsequent processing.
The input data must be convertible into a Put; the Bolt/Spout preceding this Bolt needs to assemble the data into a format that can be converted into a Put.
Specifically, the main flow of writing data to HBase has three parts: assigning the parameters; initializing the resources; and receiving Tuples and writing them to HBase.
Parameter initialization takes place when the Topology is generated. The parameter variables must be of primitive types, and the Storm stream processing module 203 serializes these parameters and transfers them to each worker node.
The HBase resources are initialized on each node, that is, in the prepare method; the actual processing of each Tuple takes place in the execute method. The detailed flow is shown in Fig. 7.
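The three-part flow (parameter assignment, resource initialization in prepare, writes in execute) can be illustrated with a bolt-like class. This is not the HBase client API: `FakeTable` is an in-memory stand-in, and the `HBaseWriterBolt` name, its `connect` argument, and the tuple layout `(row_key, columns)` are assumptions of the sketch.

```python
from typing import Callable, Dict, Optional, Tuple

Row = Tuple[str, Dict[str, str]]


class FakeTable:
    """In-memory stand-in for one HBase table; 'fail' simulates a storage outage."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail
        self.rows: Dict[str, Dict[str, str]] = {}

    def put(self, row_key: str, columns: Dict[str, str]) -> None:
        if self.fail:
            raise IOError("storage unavailable")
        self.rows[row_key] = dict(columns)


class HBaseWriterBolt:
    def __init__(self, table_name: str) -> None:
        # Part 1: parameter assignment. Only a primitive value is stored,
        # so the object stays trivially serializable before it reaches a worker.
        self.table_name = table_name
        self.table: Optional[FakeTable] = None

    def prepare(self, connect: Callable[[str], FakeTable]) -> None:
        # Part 2: resource initialization, performed on each worker node.
        self.table = connect(self.table_name)

    def execute(self, tup: Row, emit: Callable[[Row], None]) -> bool:
        # Part 3: write the tuple; on a storage exception, hand it to the next bolt.
        row_key, columns = tup
        try:
            self.table.put(row_key, columns)
            return True
        except IOError:
            emit(tup)
            return False
```

Separating construction from `prepare` reflects why the parameters must be primitive: connections cannot be serialized and shipped to worker nodes, so they are created after the bolt arrives.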
2) Redis-Bolt general-purpose component 2042, for storing data in Redis.
The Redis-Bolt general-purpose component 2042 provides the function of writing data of basic types to Redis. The value types of this data include: string, hash, set, sorted set, and list.
In a specific implementation, this includes but is not limited to the following: the input Tuple has the form (key, value, type of the value, function to use), and the main supported functions are:
Specifically, the main flow of writing basic-type data to Redis has three parts: assigning the parameters; initializing the Redis resources; and receiving Tuples, calling the corresponding function, and writing to Redis.
Parameter initialization takes place when the Topology is generated. The parameter variables must be of primitive types, and the Storm platform serializes these parameters and transfers them to each worker node. The parameters include, but are not limited to: redis.dbID, redis.master.ip, redis.master.port, redis.maxActive, redis.maxIdle, redis.maxWait, redis.testOnBorrow.
The Redis resources are initialized in prepare, which includes generating a Redis connection pool from the parameters and generating the client connection object.
The main processing logic for writing basic-type data to Redis runs in the execute method: it first receives the tuple and obtains the data record, the data type, and the operation function to use; it then calls the corresponding operation function to write the data record to Redis. The detailed flow is shown in Fig. 8.
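The dispatch step in execute can be sketched as a lookup from the function name carried in the tuple to a client method. The function names used here (`set`, `hset`, `sadd`, `zadd`, `rpush`) are standard Redis commands chosen for illustration; they are an assumption and not the list of functions the component actually supports. `FakeRedis` is an in-memory stand-in, not a real client library.

```python
from typing import Any, Dict, Tuple


class FakeRedis:
    """In-memory stand-in for a Redis client; method names mirror Redis commands."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}

    def set(self, key: str, value: str) -> None:
        self.data[key] = value

    def hset(self, key: str, value: Dict[str, str]) -> None:
        self.data.setdefault(key, {}).update(value)

    def sadd(self, key: str, value: set) -> None:
        self.data.setdefault(key, set()).update(value)

    def zadd(self, key: str, value: Dict[str, float]) -> None:
        self.data.setdefault(key, {}).update(value)

    def rpush(self, key: str, value: list) -> None:
        self.data.setdefault(key, []).extend(value)


# Which value type each function accepts (an assumption of this sketch).
FUNCTION_TYPES = {"set": "string", "hset": "hash", "sadd": "set",
                  "zadd": "zset", "rpush": "list"}


def execute(client: FakeRedis, tup: Tuple[str, Any, str, str]) -> None:
    """Unpack (key, value, value type, function) and call the matching function."""
    key, value, value_type, function = tup
    if FUNCTION_TYPES.get(function) != value_type:
        raise ValueError(f"function {function!r} does not accept type {value_type!r}")
    getattr(client, function)(key, value)
```

Validating the declared type against the function before dispatch catches malformed tuples early instead of writing the wrong structure under a key.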
● Configuration file parsing module 202, for obtaining a configuration file and parsing the topology parameters in the configuration file, the topology parameters being used to build a topology job;
The configuration file parsing module 202 is specifically used to obtain the configuration file provided by the user, parse the component default parameters and the job-private parameters in the configuration file, and form the topology parameters from the component default parameters and the job-private parameters.
The function of the configuration file parsing module 202 is to parse the configuration file written by the user (for example, topology-x.xml) and generate the information carrier of the topology, that is, the topology parameters.
Parsing by the configuration file parsing module 202 has two main steps: parsing the component default parameters, and parsing the job-private parameters.
The job-private parameters, also called topology-job-specific parameters, can override the shared parameters.
Parsing the component default parameters generates the default configuration of Topology-storm, which includes but is not limited to: kafka-spout, text-hdfs-spout, avro-hdfs-spout, text-hdfs-bolt, avro-hdfs-bolt, hbase-bolt, redis-bolt, custom-bolt. These configurations contain only component information and no structural information.
When the job-private parameters are parsed, they override the component default parameters, producing a parameter object that can be used to build the topology job, that is, the topology parameters. The topology parameters are used when the topology is assembled.
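The two-step parse (component defaults first, then job-private overrides) can be sketched with the standard library XML parser. The layout of topology-x.xml is not given in this description, so the `<defaults>`, `<job>`, `<component>`, and `<param>` element names below are a hypothetical schema invented for the example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

Params = Dict[str, Dict[str, str]]


def _read_components(section: Optional[ET.Element]) -> Params:
    """Collect {component name: {parameter name: value}} from one section."""
    result: Params = {}
    if section is None:
        return result
    for component in section.findall("component"):
        params = {p.get("name"): p.get("value") for p in component.findall("param")}
        result[component.get("name")] = params
    return result


def parse_topology_config(xml_text: str) -> Params:
    """Parse component defaults, then let job-private parameters override them."""
    root = ET.fromstring(xml_text)
    defaults = _read_components(root.find("defaults"))  # step 1: component defaults
    private = _read_components(root.find("job"))        # step 2: job-private parameters
    merged: Params = {name: dict(params) for name, params in defaults.items()}
    for name, params in private.items():
        merged.setdefault(name, {}).update(params)
    return merged
```

For example, a default `redis.master.port` of 6379 for `redis-bolt` is replaced when the job section sets the same parameter, while defaults the job does not mention are kept unchanged.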
● Storm stream processing module 203, for building the topology job according to the topology parameters obtained by the configuration file parsing module 202 and the data extracted by the general data access module 201, and submitting the topology job to the Storm cluster for stream processing.
Referring to Fig. 9, the Storm stream processing module 203 comprises: a topology job assembly component 2031, a job submission component 2032, and a Storm cluster 2033;
The topology job assembly component 2031 is used to build the topology job according to the topology parameters obtained by the configuration file parsing module 202 and the data extracted by the general data access module 201;
The job submission component 2032 is used to submit the topology job built by the topology job assembly component 2031 to the Storm cluster 2033;
The Storm cluster 2033 is used to distribute the topology job submitted by the job submission component 2032 to processing nodes for stream processing.
In addition, to further lower the barrier to use and improve the generality of the system provided by this embodiment, the system can process user-defined business logic in addition to the business logic set in the configuration file. User-defined business logic can be implemented by the custom business parsing component 2034 that is also included in the Storm stream processing module 203, as shown in Fig. 10.
The custom business parsing component 2034 is used to parse user-defined business logic;
The topology job assembly component 2031 is used to build the topology job according to the topology parameters obtained by the configuration file parsing module 202, the data extracted by the general data access module 201, and the user-defined business logic obtained by the custom business parsing component 2034.
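The division of labour between assembly, submission, and the cluster can be sketched as follows. All names are hypothetical, the placement of the custom bolt directly after the spout is an assumption, and the round-robin placement in `StubCluster` is only a stand-in: it does not reflect how the Storm scheduler actually assigns work.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class TopologyJob:
    name: str
    components: List[str]  # spout first, then bolts in processing order


def assemble_job(params: Dict[str, Any], custom_bolt: Optional[str] = None) -> TopologyJob:
    """Build the job from the topology parameters, optionally adding custom logic."""
    components = [str(params["spout"])]
    if custom_bolt is not None:
        # Assumption: user-defined logic runs right after the spout.
        components.append(custom_bolt)
    components.extend(str(bolt) for bolt in params.get("bolts", []))
    return TopologyJob(str(params["name"]), components)


class StubCluster:
    """Local stand-in for a Storm cluster that spreads components over nodes."""

    def __init__(self, nodes: List[str]) -> None:
        if not nodes:
            raise ValueError("at least one processing node is required")
        self.nodes = list(nodes)

    def submit(self, job: TopologyJob) -> Dict[str, List[str]]:
        """Assign each component to a node in round-robin order."""
        assignments: Dict[str, List[str]] = {node: [] for node in self.nodes}
        for index, component in enumerate(job.components):
            assignments[self.nodes[index % len(self.nodes)]].append(component)
        return assignments
```

Keeping assembly separate from submission means the same job description can be inspected or tested locally before it is handed to a cluster.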
Beneficial effects:
The user only needs to store data and provide a configuration file. The Storm-based stream processing system provided by this embodiment, which consists of the general data access module, the general data storage module, the configuration file parsing module, and the Storm stream processing module, automatically generates a topology job from the data and the configuration file and submits it to the Storm cluster for stream processing. This lowers the technical difficulty of stream processing with Storm, builds the whole stream processing framework from general-purpose components assembled by job configuration, and simplifies and accelerates the development of stream processing jobs.
In addition, the system provided by this embodiment also offers resumable transmission and/or retransmission on failure during data transmission, which effectively improves data transmission quality and reduces the resources consumed by data transmission.
In above-described embodiment, existing Functional Unit device blocks all can be adopted to implement.Such as, processing module can adopt existing data processing components and parts, at least, the location-server adopted just possesses realize this Functional Unit device in existing location technology; As for receiver module, be then the components and parts that equipment that any one possesses signal transfer functions all possesses; Meanwhile, what the calculating of A, n parameter, intensity adjustment etc. that processing module is carried out adopted is all existing technological means, and those skilled in the art design and develop can realize through accordingly.
For convenience of description, each several part of the above device is divided into various module or unit to describe respectively with function.Certainly, the function of each module or unit can be realized in same or multiple software or hardware when implementing of the present invention.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM and optical storage) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.

Claims (16)

1. A Storm-based stream processing method, characterized in that the method comprises:
extracting data;
obtaining a configuration file and parsing the topology parameters in the configuration file, the topology parameters being used to build a topology job;
building the topology job according to the topology parameters and the data;
performing Storm stream processing on the topology job.
2. The method according to claim 1, characterized in that, before the extracting of data, the method further comprises:
obtaining the parameters of each first data source, and encapsulating each first data source according to the parameters;
and the extracting of data comprises:
obtaining the second data source in which the user stores the data, the second data source being one of the first data sources;
extracting the data from the encapsulated second data source.
3. The method according to claim 2, characterized in that the second data source is a Hadoop Distributed File System (HDFS) data source.
4. The method according to any one of claims 1 to 3, characterized in that, after the extracting of data, the method further comprises:
storing the data in a storage medium, so as to encapsulate and persist the data.
5. The method according to claim 4, characterized in that the storing of the data in a storage medium comprises:
storing the data in HBase and/or Redis.
6. The method according to claim 1, characterized in that the obtaining of a configuration file and parsing of the topology parameters in the configuration file comprises:
obtaining the configuration file provided by the user, parsing the component default parameters and the job-private parameters in the configuration file, and forming the topology parameters from the component default parameters and the job-private parameters.
7. The method according to claim 1, characterized in that the building of the topology job according to the topology parameters and the data comprises:
parsing user-defined business logic;
building the topology job according to the topology parameters, the data and the user-defined business logic.
8. A Storm-based stream processing system, characterized in that the system comprises: a general data access module, a configuration file parsing module and a Storm stream processing module;
the general data access module is configured to extract data;
the configuration file parsing module is configured to obtain a configuration file and parse the topology parameters in the configuration file, the topology parameters being used to build a topology job;
the Storm stream processing module is configured to build the topology job according to the topology parameters obtained by the configuration file parsing module and the data extracted by the general data access module, and to submit the topology job to a Storm cluster for stream processing.
9. The system according to claim 8, characterized in that the general data access module is further configured to: obtain the parameters of each first data source, and encapsulate each first data source according to the parameters; obtain the second data source in which the user stores the data, the second data source being one of the first data sources; and extract the data from the encapsulated second data source.
10. The system according to claim 9, characterized in that the general data access module comprises: a Kafka-Spout generic component and an HDFS-Spout generic component;
the HDFS-Spout generic component is configured to extract the data from the encapsulated HDFS data source when the second data source in which the user stores the data is a Hadoop Distributed File System (HDFS) data source.
11. The system according to claim 10, characterized in that the HDFS-Spout generic component is further configured to implement resumable transfer through ZooKeeper;
and/or to implement retransmission on failure through ZooKeeper.
12. The system according to any one of claims 8 to 11, the system further comprising:
a general data storage module, configured to store the data in a storage medium, so as to encapsulate and persist the data.
13. The system according to claim 12, characterized in that the general data storage module comprises: an HBase-Bolt generic component and/or a Redis-Bolt generic component;
the HBase-Bolt generic component is configured to store data in HBase;
the Redis-Bolt generic component is configured to store data in Redis.
14. The system according to claim 8, characterized in that the configuration file parsing module is specifically configured to obtain the configuration file provided by the user, parse the component default parameters and the job-private parameters in the configuration file, and form the topology parameters from the component default parameters and the job-private parameters.
15. The system according to claim 8, characterized in that the Storm stream processing module comprises: a topology job assembly component, a job submission component and a Storm cluster;
the topology job assembly component is configured to build the topology job according to the topology parameters and the data;
the job submission component is configured to submit the topology job built by the topology job assembly component to the Storm cluster;
the Storm cluster is configured to distribute the topology job submitted by the job submission component to processing nodes for stream processing.
16. The system according to claim 15, characterized in that the Storm stream processing module further comprises: a user-defined business logic parsing component;
the user-defined business logic parsing component is configured to parse user-defined business logic;
the topology job assembly component is configured to build the topology job according to the topology parameters, the data and the user-defined business logic obtained by the user-defined business logic parsing component.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510896623.6A CN105574082A (en) 2015-12-08 2015-12-08 Storm based stream processing method and system

Publications (1)

Publication Number Publication Date
CN105574082A true CN105574082A (en) 2016-05-11

Family

ID=55884213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510896623.6A Pending CN105574082A (en) 2015-12-08 2015-12-08 Storm based stream processing method and system

Country Status (1)

Country Link
CN (1) CN105574082A (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202503A (en) * 2016-07-19 2016-12-07 北京百分点信息科技有限公司 Data processing method and device
CN106250571A (en) * 2016-10-11 2016-12-21 北京集奥聚合科技有限公司 The method and system that a kind of ETL data process
CN106649119A (en) * 2016-12-28 2017-05-10 深圳市华傲数据技术有限公司 Stream computing engine testing method and device
CN107506482A (en) * 2017-06-26 2017-12-22 湖南星汉数智科技有限公司 A kind of large-scale data processing unit and method based on Stream Processing framework
CN107678852A (en) * 2017-10-26 2018-02-09 携程旅游网络技术(上海)有限公司 Method, system, equipment and the storage medium calculated in real time based on flow data
CN107885881A (en) * 2017-11-29 2018-04-06 顺丰科技有限公司 Business datum real-time report, acquisition methods, device, equipment and its storage medium
CN107944293A (en) * 2017-11-20 2018-04-20 上海携程商务有限公司 Fictitious assets guard method, system, equipment and storage medium
CN107958049A (en) * 2017-11-28 2018-04-24 航天科工智慧产业发展有限公司 A kind of quality of data checking and administration system
CN108156009A (en) * 2016-12-06 2018-06-12 北京金山云网络技术有限公司 A kind of service calling method and device
CN108255628A (en) * 2016-12-29 2018-07-06 北京国双科技有限公司 A kind of data processing method and device
CN108287854A (en) * 2017-01-10 2018-07-17 网宿科技股份有限公司 The method and system of data persistence in a kind of stream calculation
CN108494600A (en) * 2018-03-30 2018-09-04 努比亚技术有限公司 Topology creates the method, apparatus and storage medium of management and control
CN109359109A (en) * 2018-08-23 2019-02-19 阿里巴巴集团控股有限公司 A kind of data processing method and system calculated based on distributed stream
CN109726004A (en) * 2017-10-27 2019-05-07 中移(苏州)软件技术有限公司 A kind of data processing method and device
CN110019369A (en) * 2017-12-31 2019-07-16 中国移动通信集团福建有限公司 Method, apparatus, equipment and the medium of shared data stream process topology
CN111522637A (en) * 2020-04-14 2020-08-11 重庆邮电大学 Storm task scheduling method based on cost benefit
CN111597058A (en) * 2020-04-17 2020-08-28 微梦创科网络科技(中国)有限公司 Data stream processing method and system
CN112363774A (en) * 2020-11-06 2021-02-12 苏宁云计算有限公司 Storm real-time task configuration method and device
CN114116065A (en) * 2021-11-29 2022-03-01 中电金信软件有限公司 Method and device for acquiring topological graph data object and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080256549A1 (en) * 2007-04-10 2008-10-16 International Business Machines Corporation System and Method of Planning for Cooperative Information Processing
CN104050261A (en) * 2014-06-16 2014-09-17 深圳先进技术研究院 Stormed-based variable logic general data processing system and method
CN104052804A (en) * 2014-06-09 2014-09-17 深圳先进技术研究院 Method, device and cluster for sharing data streams between different task topologies

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
周小勇 等: "基于数据流的实时网络流量分析系统设计与实现", 《计算机应用研究》 *
孙朝华: "基于Storm的数据分析系统设计与实现", 《中国优秀硕士学位论文全文数据库信息科技辑》 *

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20160511