CN105574082A - Storm based stream processing method and system - Google Patents
- Publication number: CN105574082A
- Application number: CN201510896623.6A
- Authority
- CN
- China
- Prior art keywords
- data
- topological
- parameter
- module
- configuration file
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24568—Data stream processing; Continuous queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a Storm based stream processing method and system, and belongs to the technical field of data processing. The method comprises the steps of extracting data; obtaining a configuration file and analyzing a topological parameter in the configuration file, wherein the topological parameter is used for constructing a topological job; according to the topological parameter and the data, constructing the topological job; and performing Storm stream processing on the topological job. The system comprises a universal data access module used for extracting the data, a configuration file analysis module used for obtaining the configuration file and analyzing the topological parameter in the configuration file, and a Storm stream processing module used for constructing the topological job according to the topological parameter and the data and submitting the topological job to a Storm cluster for stream processing. According to the Storm based stream processing system, the data stored by a user and the configuration file provided by the user can be automatically obtained and the generated topological job is submitted to the Storm cluster for stream processing, so that general purpose modules and job configuration assembly in a whole stream processing framework can be realized and the stream processing job development process can be simplified and accelerated.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a Storm-based stream processing method and system.
Background
By integrating, analyzing, and computing statistics over massive data, data mining makes full use of the value contained in the data and uncovers information useful for business management, business improvement, capturing business opportunities, and the like. With the arrival of the big data era, the volume of data to be mined has grown accordingly. Stream processing technology can analyze big data in real time and therefore has strong time value.
Because developing stream computations is complex, Storm provides interfaces for Spout, Bolt, Topology, and data serialization, which make it easier for data mining personnel to develop Storm-based stream processing systems.
However, Storm is hard to get started with. For data service personnel who only want to use Storm to process their own business data and do not want to understand Storm in depth, the effort of Storm development becomes the main obstacle to using stream processing technology.
Summary of the invention
To solve the above problem, embodiments of the present invention propose a Storm-based stream processing method and system.
In one aspect, an embodiment of the present invention provides a Storm-based stream processing method, the method comprising:
Extracting data;
Obtaining a configuration file and parsing the topological parameter in the configuration file, the topological parameter being used to build a topological job;
Building the topological job according to the topological parameter and the data;
Performing Storm stream processing on the topological job.
Optionally, before extracting the data, the method further comprises:
Obtaining the parameters of each first data source, and encapsulating each first data source according to the parameters;
Extracting the data comprises:
Determining the second data source in which the user stores the data, the second data source being one of the first data sources;
Extracting the data through the encapsulated second data source.
Optionally, the second data source is a Hadoop Distributed File System (HDFS) data source.
Optionally, after extracting the data, the method further comprises:
Storing the data in a storage medium, so that persistence of the data is encapsulated.
Optionally, storing the data in a storage medium comprises:
Storing the data in HBase and/or Redis.
Optionally, obtaining the configuration file and parsing the topological parameter in the configuration file comprises:
Obtaining the configuration file provided by the user, parsing the component default parameters and the job-private parameters in the configuration file, and forming the topological parameter from the component default parameters and the job-private parameters.
Optionally, building the topological job according to the topological parameter and the data comprises:
Parsing the user-defined business;
Building the topological job according to the topological parameter, the data, and the user-defined business.
In another aspect, an embodiment of the present invention provides a Storm-based stream processing system, the system comprising: a general data access module, a configuration file parsing module, and a Storm stream processing module;
The general data access module is configured to extract data;
The configuration file parsing module is configured to obtain a configuration file and parse the topological parameter in the configuration file, the topological parameter being used to build a topological job;
The Storm stream processing module is configured to build the topological job according to the topological parameter obtained by the configuration file parsing module and the data extracted by the general data access module, and to submit the topological job to a Storm cluster for stream processing.
Optionally, the general data access module is further configured to: obtain the parameters of each first data source and encapsulate each first data source according to the parameters; determine the second data source in which the user stores the data, the second data source being one of the first data sources; and extract the data through the encapsulated second data source.
Optionally, the general data access module comprises: a Kafka-Spout general-purpose component and an HDFS-Spout general-purpose component;
The HDFS-Spout general-purpose component is configured to extract the data through the encapsulated HDFS data source when the second data source in which the user stores the data is a Hadoop Distributed File System (HDFS) data source.
Optionally, the HDFS-Spout general-purpose component is further configured to implement resumable transfer (resuming from a breakpoint) through ZooKeeper;
And/or to implement retransmission on failure through ZooKeeper.
Optionally, the system further comprises:
A general data storage module, configured to store the data in a storage medium, so that persistence of the data is encapsulated.
Optionally, the general data storage module comprises: an HBase-Bolt general-purpose component and/or a Redis-Bolt general-purpose component;
The HBase-Bolt general-purpose component is configured to store data in HBase;
The Redis-Bolt general-purpose component is configured to store data in Redis.
Optionally, the configuration file parsing module is specifically configured to obtain the configuration file provided by the user, parse the component default parameters and the job-private parameters in the configuration file, and form the topological parameter from the component default parameters and the job-private parameters.
Optionally, the Storm stream processing module comprises: a topological job assembly component, a job submission component, and a Storm cluster;
The topological job assembly component is configured to build the topological job according to the topological parameter and the data;
The job submission component is configured to submit the topological job built by the topological job assembly component to the Storm cluster;
The Storm cluster is configured to distribute the topological job submitted by the job submission component to processing nodes for stream processing.
Optionally, the Storm stream processing module further comprises: a custom business parsing component;
The custom business parsing component is configured to parse the user-defined business;
The topological job assembly component is configured to build the topological job according to the topological parameter, the data, and the user-defined business obtained by the custom business parsing component.
The beneficial effects are as follows:
The user only needs to store the data and provide a configuration file. The Storm-based stream processing system provided in this embodiment, composed of the general data access module, the general data storage module, the configuration file parsing module, and the Storm stream processing module, can automatically generate a topological job from the data and the configuration file and submit it to the Storm cluster for stream processing. This lowers the technical difficulty of using Storm for stream processing, makes the components of the whole stream processing framework general-purpose and lets jobs be assembled by configuration, and simplifies and accelerates the development of stream processing jobs.
Brief description of the drawings
Specific embodiments of the present invention are described below with reference to the accompanying drawings, in which:
Fig. 1 shows a flowchart of a first Storm-based stream processing method provided in an embodiment of the present invention;
Fig. 2 shows a structural diagram of a first Storm-based stream processing system provided in an embodiment of the present invention;
Fig. 3 shows a structural diagram of a second Storm-based stream processing system provided in another embodiment of the present invention;
Fig. 4 shows a processing flow diagram of an HDFS-Spout general-purpose component provided in another embodiment of the present invention;
Fig. 5 shows a structural diagram of a third Storm-based stream processing system provided in another embodiment of the present invention;
Fig. 6 shows a structural diagram of a fourth Storm-based stream processing system provided in another embodiment of the present invention;
Fig. 7 shows a processing flow diagram of an HBase-Bolt general-purpose component provided in another embodiment of the present invention;
Fig. 8 shows a processing flow diagram of a Redis-Bolt general-purpose component provided in another embodiment of the present invention;
Fig. 9 shows a structural diagram of a fifth Storm-based stream processing system provided in another embodiment of the present invention;
Fig. 10 shows a structural diagram of a sixth Storm-based stream processing system provided in another embodiment of the present invention.
Detailed description
To make the technical solutions and advantages of the present invention clearer, exemplary embodiments of the present invention are described in more detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not an exhaustive list of all embodiments. Where no conflict arises, the embodiments in this description and the features in the embodiments may be combined with one another.
Storm provides interfaces for Spout, Bolt, Topology, and data serialization, which make it easier for data mining personnel to develop Storm-based stream processing systems. However, Storm is hard to get started with: for data service personnel who only want to use Storm to process their own business data and do not want to understand Storm in depth, the effort of Storm development becomes the main obstacle to using stream processing technology. The present invention therefore proposes a Storm-based stream processing method. The method is applied to a system, namely the Storm-based stream processing system described in the embodiment shown in any of Fig. 2 to Fig. 9 below. With this system, the user only needs to store the data and provide a configuration file; the system automatically generates a topological job from the data and the configuration file and submits it to the Storm cluster for stream processing, which lowers the technical difficulty of using Storm for stream processing.
In combination with the above implementation environment, and referring to the embodiment shown in Fig. 1, this embodiment provides a Storm-based stream processing method. The flow of the method provided in this embodiment is as follows:
101: Extract data;
102: Obtain a configuration file and parse the topological parameter in the configuration file, the topological parameter being used to build a topological job;
Optionally, obtaining the configuration file and parsing the topological parameter in the configuration file comprises:
Obtaining the configuration file provided by the user, parsing the component default parameters and the job-private parameters in the configuration file, and forming the topological parameter from the component default parameters and the job-private parameters.
103: Build the topological job according to the topological parameter and the data;
Optionally, building the topological job according to the topological parameter and the data comprises:
Parsing the user-defined business;
Building the topological job according to the topological parameter, the data, and the user-defined business.
104: Perform Storm stream processing on the topological job.
Optionally, before extracting the data, the method further comprises:
Obtaining the parameters of each first data source, and encapsulating each first data source according to the parameters;
Extracting the data comprises:
Determining the second data source in which the user stores the data, the second data source being one of the first data sources;
Extracting the data through the encapsulated second data source.
Optionally, the second data source is an HDFS (Hadoop Distributed File System) data source.
Optionally, after extracting the data, the method further comprises:
Storing the data in a storage medium, so that persistence of the data is encapsulated.
Optionally, storing the data in a storage medium comprises:
Storing the data in HBase and/or Redis.
The beneficial effects are as follows:
The user only needs to store the data and provide a configuration file. The Storm-based stream processing system provided in this embodiment, composed of the general data access module, the general data storage module, the configuration file parsing module, and the Storm stream processing module, can automatically generate a topological job from the data and the configuration file and submit it to the Storm cluster for stream processing. This lowers the technical difficulty of using Storm for stream processing, makes the components of the whole stream processing framework general-purpose and lets jobs be assembled by configuration, and simplifies and accelerates the development of stream processing jobs.
In combination with the above implementation environment, and referring to the embodiment shown in Fig. 2, this embodiment provides a Storm-based stream processing system:
The system comprises: a general data access module 201, a configuration file parsing module 202, and a Storm stream processing module 203;
● General data access module 201, configured to extract data;
In practical applications, before extracting the data, the general data access module 201 is also used to obtain the parameters of each first data source and to encapsulate each first data source according to the parameters;
For example, it obtains the access parameters of each first data source and encapsulates access to each first data source according to those parameters.
It should be noted that obtaining the parameters of each first data source and encapsulating each first data source according to the parameters need not be performed every time. The process is performed only the first time the system provided in this embodiment carries out stream processing, or when a new data source appears, so that all data sources are encapsulated; this simplifies the data access flow when the system carries out stream processing. This embodiment does not specifically limit the trigger condition for performing this process.
Specifically, the general data access module 201 determines the second data source in which the user stores the data, the second data source being one of the first data sources, and extracts the data through the encapsulated second data source.
In addition, the user can store data in a data source in the form of files. For example, all data is classified by some criterion, data of the same class is stored in the same file, and all files are stored in the same data source or in different data sources. The general data access module 201 can determine the second data source in which the user stores the data, read the files through the encapsulated second data source, and extract the data from the files.
For convenience of description, unless otherwise stated, this embodiment takes the case where the user stores data in the form of files as the example.
As shown in Fig. 3, the general data access module 201 comprises: a Kafka-Spout general-purpose component 2011 and an HDFS-Spout general-purpose component 2012;
The Kafka-Spout general-purpose component 2011 has the same function as the existing Kafka-Spout component and is not described in detail here.
The HDFS-Spout general-purpose component 2012 is configured to extract data through the encapsulated HDFS data source when the second data source in which the user stores the data is a Hadoop Distributed File System (HDFS) data source.
Specifically, in nextTuple, a file is taken from the file queue, the compression format of the file is identified from its suffix, the corresponding input stream is then opened, the data records are split on a given delimiter string, and the records are read and emitted.
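The nextTuple step described above can be sketched as follows. This is a minimal illustration on local files rather than HDFS, and it does not use the Storm API; the names `OPENERS`, `open_by_suffix`, `read_records`, and `next_tuple`, as well as the set of supported suffixes, are assumptions made for the sketch and do not come from the patent.

```python
import bz2
import gzip
import os

# Suffix-to-opener table. The supported suffixes are an assumption of this sketch.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open}


def open_by_suffix(path):
    """Open a file as a binary input stream, choosing the codec from the suffix."""
    suffix = os.path.splitext(path)[1].lower()
    opener = OPENERS.get(suffix, open)  # unknown suffix: treat as uncompressed
    return opener(path, "rb")


def read_records(path, delimiter="\n", encoding="utf-8"):
    """Yield the records of one file, split on the given delimiter string."""
    with open_by_suffix(path) as stream:
        text = stream.read().decode(encoding)
    for record in text.split(delimiter):
        if record:  # skip the empty piece left by a trailing delimiter
            yield record


def next_tuple(file_queue, emit, delimiter="\n"):
    """Take one file from the queue and emit each of its records.

    Returns False when the queue is empty, True otherwise.
    """
    if not file_queue:
        return False
    path = file_queue.pop(0)
    for record in read_records(path, delimiter):
        emit(record)
    return True
```

Reading the whole file before splitting keeps the sketch short; a real component would presumably stream records so that large files do not have to fit in memory.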
In addition, data transfer between modules may be interrupted, for example by a network outage or a manual interruption. To avoid the waste of resources caused by resending data after an interruption, the HDFS-Spout general-purpose component 2012 also implements resumable transfer (resuming from a breakpoint) through ZooKeeper.
Taking as an example the case where all files to be transferred form a file directory, the implementation of resumable transfer is described in detail below.
1) In open, the Spout is initialized: the history record of sending a given file directory is obtained from ZooKeeper (that is, the history record of the general data storage module 204 reading this file directory), and a queue of pending files is generated.
2) The HDFS storage location is checked for newly added files, and the information of each such file, <filePath, position, ended>, is recorded in ZooKeeper.
Here, filePath is the storage location of the file, position is the start position of the transfer, and ended is the end position of the transfer.
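How a <filePath, position, ended> record allows a transfer to resume can be sketched as follows. An in-memory dictionary stands in for ZooKeeper, and `CheckpointStore`, `transfer`, and `max_records` are hypothetical names introduced for illustration. Leaving `ended` unset until the file has been read completely is also an assumption of this sketch, not something the patent states.

```python
class CheckpointStore:
    """In-memory stand-in for the records kept in ZooKeeper (hypothetical helper)."""

    def __init__(self):
        self._records = {}

    def load(self, file_path):
        # A file that was never seen starts at offset 0 and has no end position yet.
        return self._records.get(file_path, {"position": 0, "ended": None})

    def save(self, file_path, position, ended):
        self._records[file_path] = {"position": position, "ended": ended}


def transfer(file_path, stream, store, emit, max_records=None):
    """Emit newline-terminated records from a binary stream, resuming from the
    saved position. Returns True once the whole file has been sent.

    max_records simulates an interruption after that many records.
    """
    record = store.load(file_path)
    stream.seek(record["position"])  # skip everything already sent
    sent = 0
    while max_records is None or sent < max_records:
        line = stream.readline()
        if not line:
            end = stream.tell()
            store.save(file_path, end, end)  # mark the file as finished
            return True
        emit(line.rstrip(b"\n"))
        sent += 1
        store.save(file_path, stream.tell(), None)  # checkpoint after each record
    return False
```

Checkpointing after every record keeps the example simple; checkpointing in batches would trade a small amount of possible re-sending for fewer writes to the coordination service.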
In addition, data transfer may fail in practice, for example because of interference during transfer. To avoid incomplete data caused by failed transfers, the HDFS-Spout general-purpose component 2012 also implements retransmission on failure through ZooKeeper when sending the extracted data to the general data storage module 204.
The implementation of retransmission on failure is described in detail below: after each file has been read completely, the input stream is closed and the file's information in ZooKeeper is updated. At the same time, a timed task is started that checks at regular intervals whether new files have been produced and, if so, adds them to the file queue. The specific flow is shown in Fig. 4.
The HDFS-Spout general-purpose component 2012 extracts Text-type data from an HDFS path so that the data flows into the computing units of the Storm cluster. It implements access to the files under a specified directory, uses the coordination mechanism of ZooKeeper to ensure that all records of a file are sent onto the Storm stream, supports resumable transfer, and supports retransmission on failure for data within the whole tuple tree.
In addition, to persist data, the data needs to be encapsulated onto different storage media. Optionally, referring to Fig. 5, the system provided in this embodiment further comprises:
● General data storage module 204, configured to store the data extracted by the general data access module 201 in a storage medium, so that persistence of the data is encapsulated;
The storage medium includes but is not limited to a database.
The general data storage module 204 comprises: an HBase-Bolt general-purpose component 2041 and/or a Redis-Bolt general-purpose component 2042. Fig. 6 shows the case where the general data storage module 204 comprises both the HBase-Bolt general-purpose component 2041 and the Redis-Bolt general-purpose component 2042;
1) HBase-Bolt general-purpose component 2041, configured to store data in HBase;
The HBase-Bolt general-purpose component 2041 provides the function of writing data to HBase. Each Bolt writes data to a single data table only and detects whether the HBase storage medium has raised an exception; if it has, the data is passed safely to the next Bolt for subsequent processing.
The input data must be convertible into a Put; the Bolt/Spout preceding this Bolt needs to assemble the data into a format that can be converted into a Put.
Specifically, the main flow of writing data to HBase has three parts: parameter assignment; resource initialization; receiving a Tuple and writing it to HBase.
Parameter initialization takes place in the generation phase of the Topology. The parameter variables must be of basic types, and the Storm stream processing module 203 serializes these parameters and transfers them to each worker node.
HBase resource initialization is performed on each node, that is, in the prepare method; the actual processing of a Tuple takes place in the execute method. The specific flow is shown in Fig. 7.
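The three parts of this flow can be sketched as a small class. The sketch uses neither the Storm API nor an HBase client: `FakeTable` is an in-memory stand-in, and the names `HBaseBoltSketch`, `connect`, and `emit` are assumptions made for illustration only.

```python
class FakeTable:
    """In-memory stand-in for an HBase table (hypothetical helper)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def put(self, row_key, columns):
        self.rows.setdefault(row_key, {}).update(columns)


class HBaseBoltSketch:
    def __init__(self, table_name):
        # Part 1: parameter assignment. Only basic-type values are stored, so
        # the object can be serialized and shipped to the worker nodes.
        self.table_name = table_name
        self.table = None

    def prepare(self, connect):
        # Part 2: resource initialization, run once on each worker node.
        self.table = connect(self.table_name)

    def execute(self, tup, emit):
        # Part 3: receive a tuple and write it to the single target table.
        row_key, columns = tup
        try:
            self.table.put(row_key, columns)
        except Exception:
            # On a storage error, hand the data to the next bolt unchanged.
            emit(tup)
```

Keeping the table handle out of the constructor reflects the split described above: connections generally cannot be serialized, so they are created in prepare on the node where the bolt runs.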
2) Redis-Bolt general-purpose component 2042, configured to store data in Redis.
The Redis-Bolt general-purpose component 2042 provides the function of writing basic-type data to Redis. The value types of the data include: string, hash, set, sorted set, and list.
In a specific implementation, which includes but is not limited to the following, the input Tuple has the form (key, value, type of value, function to use), and the main supported functions are:
Specifically, the main flow of writing basic-type data to Redis has three parts: parameter assignment; Redis resource initialization; receiving a Tuple, calling the corresponding function, and writing to Redis.
Parameter initialization takes place in the generation phase of the Topology. The parameter variables must be of basic types, and the Storm platform serializes these parameters and transfers them to each worker node. The parameters include but are not limited to: redis.dbID, redis.master.ip, redis.master.port, redis.maxActive, redis.maxIdle, redis.maxWait, redis.testOnBorrow.
Redis resource initialization is implemented in prepare and includes generating the Redis connection pool from the parameters and generating the client connection object.
The main processing logic for writing basic-type data to Redis is carried out in the execute method: the tuple is first received, and the data record, the data type, and the operation function to use are obtained; the corresponding operation function is then called to write the data record to Redis. The specific flow is shown in Fig. 8.
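The dispatch performed in execute can be sketched as follows. The text does not reproduce the list of supported functions, so the function names used here (borrowed from the Redis commands SET, HSET, SADD, ZADD, and RPUSH) and their pairing with value types are assumptions. `FakeRedis` is an in-memory stand-in whose simplified method signatures do not match those of any real client library.

```python
class FakeRedis:
    """In-memory stand-in for a Redis client (hypothetical helper)."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def hset(self, key, mapping):
        self.data.setdefault(key, {}).update(mapping)

    def sadd(self, key, members):
        self.data.setdefault(key, set()).update(members)

    def zadd(self, key, scores):
        self.data.setdefault(key, {}).update(scores)

    def rpush(self, key, items):
        self.data.setdefault(key, []).extend(items)


# Value type -> functions accepted for that type (assumed pairing).
ALLOWED = {
    "string": {"set"},
    "hash": {"hset"},
    "set": {"sadd"},
    "zset": {"zadd"},
    "list": {"rpush"},
}


def execute(client, tup):
    """Unpack (key, value, value type, function) and call the named function."""
    key, value, value_type, function = tup
    if function not in ALLOWED.get(value_type, set()):
        raise ValueError(f"{function!r} is not valid for type {value_type!r}")
    getattr(client, function)(key, value)
```

Validating the function against the declared value type before dispatching means a malformed tuple fails with a clear error instead of silently writing data under the wrong type.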
● Configuration file parsing module 202, configured to obtain a configuration file and parse the topological parameter in the configuration file, the topological parameter being used to build a topological job;
The configuration file parsing module 202 is specifically configured to obtain the configuration file provided by the user, parse the component default parameters and the job-private parameters in the configuration file, and form the topological parameter from the component default parameters and the job-private parameters.
The function of the configuration file parsing module 202 is to parse the configuration file written by the user (for example, topology-x.xml) and generate the information carrier of the topology, namely the topological parameter.
The parsing performed by the configuration file parsing module 202 has two main steps: parsing the component default parameters, and parsing the job-private parameters.
The job-private parameters, also called the parameters specific to the topology job, can override the common parameters.
The default configurations of Topology-storm generated by parsing the component default parameters include but are not limited to: kafka-spout, text-hdfs-spout, avro-hdfs-spout, text-hdfs-bolt, avro-hdfs-bolt, hbase-bolt, redis-bolt, custom-bolt. These configurations contain only component information, not structural information.
When the job-private parameters are parsed, they must override the component default parameters, producing a parameter object that can be used to build the topological job, namely the topological parameter. The topological parameter is used when the topology is assembled.
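The two-step parse, in which job-private parameters override component defaults, can be sketched as follows. The patent does not show the layout of topology-x.xml, so the element and attribute names used here (`defaults`, `job`, `component`, `param`, `name`, `value`) are hypothetical, as are the function names `parse_params` and `build_topology_params`.

```python
import xml.etree.ElementTree as ET


def parse_params(element):
    """Collect {component name: {parameter name: value}} from <component> children."""
    result = {}
    for component in element.findall("component"):
        params = {p.get("name"): p.get("value") for p in component.findall("param")}
        result[component.get("name")] = params
    return result


def build_topology_params(xml_text):
    """Parse component defaults, then job-private parameters, and merge them.

    Only the components named by the job are kept. For each of them, a
    job-private value replaces the default with the same parameter name.
    """
    root = ET.fromstring(xml_text)
    defaults = parse_params(root.find("defaults"))   # step 1: component defaults
    private = parse_params(root.find("job"))         # step 2: job-private parameters
    merged = {}
    for name, overrides in private.items():
        merged[name] = {**defaults.get(name, {}), **overrides}
    return merged
```

With a default of `redis.maxActive=8` for `redis-bolt` and a job-private value of `32`, the merged topological parameter carries `32` and keeps every other default of that component.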
● Storm stream processing module 203, configured to build the topological job according to the topological parameter obtained by the configuration file parsing module 202 and the data extracted by the general data access module 201, and to submit the topological job to the Storm cluster for stream processing.
Referring to Fig. 9, the Storm stream processing module 203 comprises: a topological job assembly component 2031, a job submission component 2032, and a Storm cluster 2033;
The topological job assembly component 2031 is configured to build the topological job according to the topological parameter obtained by the configuration file parsing module 202 and the data extracted by the general data access module 201;
The job submission component 2032 is configured to submit the topological job built by the topological job assembly component 2031 to the Storm cluster 2033;
The Storm cluster 2033 is configured to distribute the topological job submitted by the job submission component 2032 to processing nodes for stream processing.
In addition, to further lower the barrier to use and improve the generality of the system provided in this embodiment, the system can process user-defined business in addition to the business set in the configuration file. User-defined business can be implemented by the custom business parsing component 2034 that the Storm stream processing module 203 also comprises, as shown in Fig. 10.
The custom business parsing component 2034 is configured to parse the user-defined business;
The topological job assembly component 2031 is configured to build the topological job according to the topological parameter obtained by the configuration file parsing module 202, the data extracted by the general data access module 201, and the user-defined business obtained by the custom business parsing component 2034.
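The division of work between assembly, submission, and the cluster can be sketched as follows. This is a toy in-process model and not the Storm API: `TopologyJob`, `build_job`, `InProcessCluster`, and `submit_job` are hypothetical names, and the user-defined business is modelled as a plain function applied to each record.

```python
from dataclasses import dataclass, field


@dataclass
class TopologyJob:
    """Toy description of a job: a name, its input records, and processing stages."""
    name: str
    source: list
    stages: list = field(default_factory=list)


def build_job(topology_params, data, custom_business=None):
    """Assembly: combine the parsed parameters, the extracted data, and the
    optional user-defined business into one job description."""
    job = TopologyJob(name=topology_params["name"], source=list(data))
    if custom_business is not None:
        job.stages.append(custom_business)
    return job


class InProcessCluster:
    """Stand-in for a cluster: runs every record through the stages locally."""

    def run(self, job):
        results = []
        for record in job.source:
            for stage in job.stages:
                record = stage(record)
            results.append(record)
        return results


def submit_job(job, cluster):
    """Submission: hand the assembled job to the cluster for processing."""
    return cluster.run(job)
```

For example, `submit_job(build_job({"name": "demo"}, ["a", "b"], str.upper), InProcessCluster())` runs both records through the user-defined stage.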
The beneficial effects are as follows:
The user only needs to store the data and provide a configuration file. The Storm-based stream processing system provided in this embodiment, composed of the general data access module, the general data storage module, the configuration file parsing module, and the Storm stream processing module, can automatically generate a topological job from the data and the configuration file and submit it to the Storm cluster for stream processing. This lowers the technical difficulty of using Storm for stream processing, makes the components of the whole stream processing framework general-purpose and lets jobs be assembled by configuration, and simplifies and accelerates the development of stream processing jobs.
In addition, the system provided in this embodiment also provides resumable transfer and/or retransmission on failure during data transfer, which effectively improves data transfer quality and reduces the resources consumed by data transfer.
In the above embodiments, existing functional components and modules can be used for implementation. For example, the processing module can use existing data processing components; at the least, the location servers used in existing positioning technology already have components that implement this function. The receiving module is a component that any device with a signal transmission function has. Likewise, the calculation of the A and n parameters, intensity adjustment, and the like carried out by the processing module all use existing technical means, which those skilled in the art can implement through corresponding design and development.
For convenience of description, the parts of the device described above are divided by function into various modules or units and described separately. Of course, when the present invention is implemented, the functions of the modules or units can be realized in one or more pieces of software or hardware.
Those skilled in the art should understand that embodiments of the present invention can be provided as a method, a system, or a computer program product. Therefore, the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction means that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Claims (16)
1. A stream processing method based on Storm, characterized in that the method comprises:
extracting data;
obtaining a configuration file and parsing a topological parameter in the configuration file, the topological parameter being used for building a topological job;
building the topological job according to the topological parameter and the data; and
performing Storm stream processing on the topological job.
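The four steps of claim 1 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the JSON configuration layout, the `TopologyJob` class, and all function names are hypothetical, and `submit` merely processes records locally where a real system would hand the job to a Storm cluster.

```python
import json
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class TopologyJob:
    """Hypothetical stand-in for a Storm topology definition."""
    name: str
    parallelism: int
    records: List[str] = field(default_factory=list)


def extract_data(source: Iterable[str]) -> List[str]:
    # Step 1: extract data (here: strip lines and drop blank ones).
    return [line.strip() for line in source if line.strip()]


def parse_topology_params(config_text: str) -> dict:
    # Step 2: parse the topology parameters out of the configuration file.
    # The JSON layout with a "topology" key is an assumption of this sketch.
    return json.loads(config_text)["topology"]


def build_job(params: dict, data: List[str]) -> TopologyJob:
    # Step 3: build the job from the parsed parameters and the data.
    return TopologyJob(params["name"], int(params.get("parallelism", 1)), data)


def submit(job: TopologyJob) -> List[str]:
    # Step 4: placeholder for stream processing; runs locally in this sketch.
    return [record.upper() for record in job.records]


config = '{"topology": {"name": "demo", "parallelism": 2}}'
job = build_job(parse_topology_params(config), extract_data(["a", "", "b "]))
result = submit(job)
```

Keeping each step as a separate function mirrors the claim's structure: only `parse_topology_params` depends on the configuration format, so the other steps stay unchanged if that format differs.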
2. The method according to claim 1, characterized in that, before the extracting of data, the method further comprises:
obtaining parameters of each first data source, and encapsulating each first data source according to the parameters;
wherein the extracting of data comprises:
obtaining a second data source in which a user stores data, the second data source being one of the first data sources; and
extracting the data according to the encapsulated second data source.
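One way to picture claim 2 is a registry that encapsulates every supported ("first") data source up front and then reads from whichever one the user's data lives in (the "second" data source). The sketch below assumes this registry design; `DataAccess`, `kafka_source`, and `hdfs_source` are hypothetical names, and the readers return in-memory records instead of contacting Kafka or HDFS.

```python
from typing import Callable, Dict, Iterator, List

# A factory takes connection parameters and returns a reader over records.
SourceFactory = Callable[[dict], Iterator[str]]


def kafka_source(params: dict) -> Iterator[str]:
    # Hypothetical stand-in: a real reader would consume a Kafka topic.
    return iter(params.get("messages", []))


def hdfs_source(params: dict) -> Iterator[str]:
    # Hypothetical stand-in: a real reader would open files under an HDFS path.
    return iter(params.get("lines", []))


class DataAccess:
    def __init__(self) -> None:
        self._readers: Dict[str, Callable[[], Iterator[str]]] = {}

    def encapsulate(self, name: str, factory: SourceFactory, params: dict) -> None:
        # Bind the parameters now; defer the actual read until extraction.
        self._readers[name] = lambda: factory(params)

    def extract(self, user_source: str) -> List[str]:
        # The user's data source must be one of the encapsulated sources.
        if user_source not in self._readers:
            raise KeyError(f"data source not encapsulated: {user_source}")
        return list(self._readers[user_source]())


access = DataAccess()
access.encapsulate("kafka", kafka_source, {"messages": ["k1"]})
access.encapsulate("hdfs", hdfs_source, {"lines": ["h1", "h2"]})
data = access.extract("hdfs")
```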
3. The method according to claim 2, characterized in that the second data source is a Hadoop Distributed File System (HDFS) data source.
4. The method according to any one of claims 1 to 3, characterized in that, after the extracting of data, the method further comprises:
storing the data in a storage medium, so as to encapsulate and persist the data.
5. The method according to claim 4, characterized in that the storing of the data in a storage medium comprises:
storing the data in HBase and/or Redis.
6. The method according to claim 1, characterized in that the obtaining of a configuration file and parsing of a topological parameter in the configuration file comprises:
obtaining a configuration file provided by a user, parsing component default parameters and job-private parameters in the configuration file, and forming the topological parameter from the component default parameters and the job-private parameters.
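Claim 6 does not say how the two kinds of parameters are combined. A plausible reading, sketched below as an assumption, is that job-private values override component defaults on a per-component basis. The function name, component names, and parameter keys are all hypothetical.

```python
from copy import deepcopy


def form_topology_params(component_defaults: dict, job_private: dict) -> dict:
    """Overlay job-private parameters on component defaults, per component."""
    # Copy first so the shared defaults are never mutated by one job.
    merged = deepcopy(component_defaults)
    for component, overrides in job_private.items():
        merged.setdefault(component, {}).update(overrides)
    return merged


defaults = {
    "kafka_spout": {"parallelism": 1, "topic": "events"},
    "hbase_bolt": {"parallelism": 2},
}
private = {"kafka_spout": {"parallelism": 4}}
topology_params = form_topology_params(defaults, private)
```

Under this reading, a job only has to state what differs from the defaults, which is consistent with the goal of assembling jobs from configuration.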
7. The method according to claim 1, characterized in that the building of the topological job according to the topological parameter and the data comprises:
parsing user-defined business logic; and
building the topological job according to the topological parameter, the data, and the user-defined business logic.
8. A stream processing system based on Storm, characterized in that the system comprises: a general data access module, a configuration file parsing module, and a Storm stream processing module;
the general data access module is configured to extract data;
the configuration file parsing module is configured to obtain a configuration file and parse a topological parameter in the configuration file, the topological parameter being used for building a topological job; and
the Storm stream processing module is configured to build the topological job according to the topological parameter obtained by the configuration file parsing module and the data extracted by the general data access module, and to submit the topological job to a Storm cluster for stream processing.
9. The system according to claim 8, characterized in that the general data access module is further configured to: obtain parameters of each first data source, and encapsulate each first data source according to the parameters; obtain a second data source in which a user stores data, the second data source being one of the first data sources; and extract the data according to the encapsulated second data source.
10. The system according to claim 9, characterized in that the general data access module comprises: a Kafka-Spout general component and an HDFS-Spout general component;
the HDFS-Spout general component is configured to, when the obtained second data source in which the user stores data is a Hadoop Distributed File System (HDFS) data source, extract the data according to the encapsulated HDFS data source.
11. The system according to claim 10, characterized in that the HDFS-Spout general component is further configured to implement resumption of transfer from a breakpoint via ZooKeeper;
and/or to implement retransmission on failure via ZooKeeper.
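The resume and retransmission behaviour of claim 11 can be sketched with committed offsets. The example assumes offsets are kept per file and committed only after an acknowledgement. `CheckpointStore` is an in-memory stand-in for a ZooKeeper node, and all class and method names are hypothetical rather than taken from the patent or from any ZooKeeper client.

```python
from typing import Dict, List


class CheckpointStore:
    """In-memory stand-in for a ZooKeeper node holding per-file offsets."""

    def __init__(self) -> None:
        self._offsets: Dict[str, int] = {}

    def load(self, path: str) -> int:
        return self._offsets.get(path, 0)

    def save(self, path: str, offset: int) -> None:
        self._offsets[path] = offset


class ResumableReader:
    def __init__(self, path: str, lines: List[str], store: CheckpointStore) -> None:
        self.path = path
        self.lines = lines
        self.store = store

    def next_batch(self, limit: int) -> List[str]:
        # Always start at the last committed offset, so a batch that was
        # never acknowledged is delivered again (retransmission on failure).
        start = self.store.load(self.path)
        return self.lines[start:start + limit]

    def ack(self, count: int) -> None:
        # Advance the committed offset once downstream processing succeeded.
        self.store.save(self.path, self.store.load(self.path) + count)


store = CheckpointStore()
lines = ["l0", "l1", "l2", "l3"]
reader = ResumableReader("/data/part-0", lines, store)
first = reader.next_batch(2)   # delivered but not acknowledged
retry = reader.next_batch(2)   # the same lines are delivered again
reader.ack(len(retry))
restarted = ResumableReader("/data/part-0", lines, store)  # simulated restart
resumed = restarted.next_batch(2)
```

Because the offset lives outside the reader, a restarted reader picks up where the previous one stopped; the trade-off is at-least-once delivery, since a crash between processing and `ack` repeats a batch.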
12. The system according to any one of claims 8 to 11, wherein the system further comprises:
a general data storage module, configured to store the data in a storage medium, so as to encapsulate and persist the data.
13. The system according to claim 12, characterized in that the general data storage module comprises: an HBase-Bolt general component and/or a Redis-Bolt general component;
the HBase-Bolt general component is configured to store data in HBase;
the Redis-Bolt general component is configured to store data in Redis.
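The "HBase-Bolt and/or Redis-Bolt" arrangement of claim 13 resembles a set of interchangeable sinks behind one interface. The sketch below assumes that design; both sink classes are in-memory stand-ins that do not talk to HBase or Redis, and the `cf:value` column name is purely illustrative.

```python
from typing import Dict, List, Protocol, Tuple


class StorageSink(Protocol):
    def store(self, key: str, value: str) -> None: ...


class InMemoryHBaseSink:
    """Stand-in for an HBase-Bolt: rows keyed by row key, one column each."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, str]] = {}

    def store(self, key: str, value: str) -> None:
        self.rows.setdefault(key, {})["cf:value"] = value


class InMemoryRedisSink:
    """Stand-in for a Redis-Bolt: flat key/value pairs."""

    def __init__(self) -> None:
        self.kv: Dict[str, str] = {}

    def store(self, key: str, value: str) -> None:
        self.kv[key] = value


def persist(records: List[Tuple[str, str]], sinks: List[StorageSink]) -> None:
    # "HBase and/or Redis": write every record to each configured sink.
    for key, value in records:
        for sink in sinks:
            sink.store(key, value)


hbase, redis = InMemoryHBaseSink(), InMemoryRedisSink()
persist([("u1", "click"), ("u2", "view")], [hbase, redis])
```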
14. The system according to claim 8, characterized in that the configuration file parsing module is specifically configured to obtain a configuration file provided by a user, parse component default parameters and job-private parameters in the configuration file, and form the topological parameter from the component default parameters and the job-private parameters.
15. The system according to claim 8, characterized in that the Storm stream processing module comprises: a topological job assembly component, a job submission component, and a Storm cluster;
the topological job assembly component is configured to build the topological job according to the topological parameter and the data;
the job submission component is configured to submit the topological job built by the topological job assembly component to the Storm cluster;
the Storm cluster is configured to distribute the topological job submitted by the job submission component to processing nodes for stream processing.
16. The system according to claim 15, characterized in that the Storm stream processing module further comprises: a user-defined business logic parsing component;
the user-defined business logic parsing component is configured to parse user-defined business logic;
the topological job assembly component is configured to build the topological job according to the topological parameter, the data, and the user-defined business logic obtained by the user-defined business logic parsing component.
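Claims 7 and 16 add user-defined business logic as a third input to job assembly. A minimal way to model this, assuming the logic is a function applied to each record, is sketched below. `assemble_topology`, `keep_errors`, and the `batch_size` parameter are hypothetical, and the assembled job runs in-process instead of on a Storm cluster.

```python
from typing import Callable, Iterable, List, Optional

# A user-defined function maps one record to an output record,
# or returns None to drop the record.
BusinessFn = Callable[[str], Optional[str]]


def assemble_topology(params: dict, data: Iterable[str],
                      business: BusinessFn) -> Callable[[], List[str]]:
    """Return a runnable job that wires data -> business logic -> output."""
    batch_size = int(params.get("batch_size", 100))

    def run() -> List[str]:
        out: List[str] = []
        for index, record in enumerate(data):
            if index >= batch_size:
                break  # process at most batch_size input records
            result = business(record)
            if result is not None:
                out.append(result)
        return out

    return run


def keep_errors(record: str) -> Optional[str]:
    # Example user-defined logic: keep only lines that mention ERROR.
    return record.lower() if "ERROR" in record else None


job = assemble_topology({"batch_size": 3},
                        ["ERROR disk", "INFO ok", "ERROR net", "ERROR late"],
                        keep_errors)
output = job()
```

The assembly function never inspects what the business function does, so the generic components stay reusable across jobs while only the plugged-in logic changes.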
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510896623.6A CN105574082A (en) | 2015-12-08 | 2015-12-08 | Storm based stream processing method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105574082A true CN105574082A (en) | 2016-05-11 |
Family
ID=55884213
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510896623.6A Pending CN105574082A (en) | 2015-12-08 | 2015-12-08 | Storm based stream processing method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105574082A (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106202503A (en) * | 2016-07-19 | 2016-12-07 | 北京百分点信息科技有限公司 | Data processing method and device |
CN106250571A (en) * | 2016-10-11 | 2016-12-21 | 北京集奥聚合科技有限公司 | The method and system that a kind of ETL data process |
CN106649119A (en) * | 2016-12-28 | 2017-05-10 | 深圳市华傲数据技术有限公司 | Stream computing engine testing method and device |
CN107506482A (en) * | 2017-06-26 | 2017-12-22 | 湖南星汉数智科技有限公司 | A kind of large-scale data processing unit and method based on Stream Processing framework |
CN107678852A (en) * | 2017-10-26 | 2018-02-09 | 携程旅游网络技术(上海)有限公司 | Method, system, equipment and the storage medium calculated in real time based on flow data |
CN107885881A (en) * | 2017-11-29 | 2018-04-06 | 顺丰科技有限公司 | Business datum real-time report, acquisition methods, device, equipment and its storage medium |
CN107944293A (en) * | 2017-11-20 | 2018-04-20 | 上海携程商务有限公司 | Fictitious assets guard method, system, equipment and storage medium |
CN107958049A (en) * | 2017-11-28 | 2018-04-24 | 航天科工智慧产业发展有限公司 | A kind of quality of data checking and administration system |
CN108156009A (en) * | 2016-12-06 | 2018-06-12 | 北京金山云网络技术有限公司 | A kind of service calling method and device |
CN108255628A (en) * | 2016-12-29 | 2018-07-06 | 北京国双科技有限公司 | A kind of data processing method and device |
CN108287854A (en) * | 2017-01-10 | 2018-07-17 | 网宿科技股份有限公司 | The method and system of data persistence in a kind of stream calculation |
CN108494600A (en) * | 2018-03-30 | 2018-09-04 | 努比亚技术有限公司 | Topology creates the method, apparatus and storage medium of management and control |
CN109359109A (en) * | 2018-08-23 | 2019-02-19 | 阿里巴巴集团控股有限公司 | A kind of data processing method and system calculated based on distributed stream |
CN109726004A (en) * | 2017-10-27 | 2019-05-07 | 中移(苏州)软件技术有限公司 | A kind of data processing method and device |
CN110019369A (en) * | 2017-12-31 | 2019-07-16 | 中国移动通信集团福建有限公司 | Method, apparatus, equipment and the medium of shared data stream process topology |
CN111522637A (en) * | 2020-04-14 | 2020-08-11 | 重庆邮电大学 | Storm task scheduling method based on cost benefit |
CN111597058A (en) * | 2020-04-17 | 2020-08-28 | 微梦创科网络科技(中国)有限公司 | Data stream processing method and system |
CN112363774A (en) * | 2020-11-06 | 2021-02-12 | 苏宁云计算有限公司 | Storm real-time task configuration method and device |
CN114116065A (en) * | 2021-11-29 | 2022-03-01 | 中电金信软件有限公司 | Method and device for acquiring topological graph data object and electronic equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080256549A1 (en) * | 2007-04-10 | 2008-10-16 | International Business Machines Corporation | System and Method of Planning for Cooperative Information Processing |
CN104050261A (en) * | 2014-06-16 | 2014-09-17 | 深圳先进技术研究院 | Stormed-based variable logic general data processing system and method |
CN104052804A (en) * | 2014-06-09 | 2014-09-17 | 深圳先进技术研究院 | Method, device and cluster for sharing data streams between different task topologies |
Non-Patent Citations (2)
Title |
---|
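周小勇 et al.: "Design and Implementation of a Real-Time Network Traffic Analysis System Based on Data Streams" (基于数据流的实时网络流量分析系统设计与实现), Application Research of Computers (《计算机应用研究》) * |
孙朝华: "Design and Implementation of a Storm-Based Data Analysis System" (基于Storm的数据分析系统设计与实现), China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库信息科技辑》) * |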
Cited By (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106202503B (en) * | 2016-07-19 | 2019-08-16 | 北京百分点信息科技有限公司 | Data processing method and device |
CN106202503A (en) * | 2016-07-19 | 2016-12-07 | 北京百分点信息科技有限公司 | Data processing method and device |
CN106250571A (en) * | 2016-10-11 | 2016-12-21 | 北京集奥聚合科技有限公司 | The method and system that a kind of ETL data process |
CN108156009B (en) * | 2016-12-06 | 2021-03-26 | 北京金山云网络技术有限公司 | Service calling method and device |
CN108156009A (en) * | 2016-12-06 | 2018-06-12 | 北京金山云网络技术有限公司 | A kind of service calling method and device |
CN106649119A (en) * | 2016-12-28 | 2017-05-10 | 深圳市华傲数据技术有限公司 | Stream computing engine testing method and device |
CN106649119B (en) * | 2016-12-28 | 2019-09-20 | 深圳市华傲数据技术有限公司 | The test method and device of stream calculation engine |
CN108255628A (en) * | 2016-12-29 | 2018-07-06 | 北京国双科技有限公司 | A kind of data processing method and device |
CN108287854B (en) * | 2017-01-10 | 2021-06-22 | 网宿科技股份有限公司 | Method and system for data persistence in stream calculation |
CN108287854A (en) * | 2017-01-10 | 2018-07-17 | 网宿科技股份有限公司 | The method and system of data persistence in a kind of stream calculation |
CN107506482A (en) * | 2017-06-26 | 2017-12-22 | 湖南星汉数智科技有限公司 | A kind of large-scale data processing unit and method based on Stream Processing framework |
CN107678852A (en) * | 2017-10-26 | 2018-02-09 | 携程旅游网络技术(上海)有限公司 | Method, system, equipment and the storage medium calculated in real time based on flow data |
CN107678852B (en) * | 2017-10-26 | 2021-06-22 | 携程旅游网络技术(上海)有限公司 | Method, system, equipment and storage medium based on stream data real-time calculation |
CN109726004A (en) * | 2017-10-27 | 2019-05-07 | 中移(苏州)软件技术有限公司 | A kind of data processing method and device |
CN109726004B (en) * | 2017-10-27 | 2021-12-03 | 中移(苏州)软件技术有限公司 | Data processing method and device |
CN107944293A (en) * | 2017-11-20 | 2018-04-20 | 上海携程商务有限公司 | Fictitious assets guard method, system, equipment and storage medium |
CN107944293B (en) * | 2017-11-20 | 2019-09-24 | 上海携程商务有限公司 | Fictitious assets guard method, system, equipment and storage medium |
CN107958049A (en) * | 2017-11-28 | 2018-04-24 | 航天科工智慧产业发展有限公司 | A kind of quality of data checking and administration system |
CN107885881A (en) * | 2017-11-29 | 2018-04-06 | 顺丰科技有限公司 | Business datum real-time report, acquisition methods, device, equipment and its storage medium |
CN110019369A (en) * | 2017-12-31 | 2019-07-16 | 中国移动通信集团福建有限公司 | Method, apparatus, equipment and the medium of shared data stream process topology |
CN110019369B (en) * | 2017-12-31 | 2022-06-07 | 中国移动通信集团福建有限公司 | Method, apparatus, device and medium for sharing data stream processing topology |
CN108494600A (en) * | 2018-03-30 | 2018-09-04 | 努比亚技术有限公司 | Topology creates the method, apparatus and storage medium of management and control |
CN108494600B (en) * | 2018-03-30 | 2022-12-23 | 大唐丘北风电有限责任公司 | Topology creation control method, device and storage medium |
CN109359109A (en) * | 2018-08-23 | 2019-02-19 | 阿里巴巴集团控股有限公司 | A kind of data processing method and system calculated based on distributed stream |
CN109359109B (en) * | 2018-08-23 | 2022-05-27 | 创新先进技术有限公司 | Data processing method and system based on distributed stream computing |
CN111522637A (en) * | 2020-04-14 | 2020-08-11 | 重庆邮电大学 | Storm task scheduling method based on cost benefit |
CN111522637B (en) * | 2020-04-14 | 2024-03-29 | 深圳市凌晨知识产权运营有限公司 | Method for scheduling storm task based on cost effectiveness |
CN111597058A (en) * | 2020-04-17 | 2020-08-28 | 微梦创科网络科技(中国)有限公司 | Data stream processing method and system |
CN111597058B (en) * | 2020-04-17 | 2023-10-17 | 微梦创科网络科技(中国)有限公司 | Data stream processing method and system |
CN112363774A (en) * | 2020-11-06 | 2021-02-12 | 苏宁云计算有限公司 | Storm real-time task configuration method and device |
CN114116065A (en) * | 2021-11-29 | 2022-03-01 | 中电金信软件有限公司 | Method and device for acquiring topological graph data object and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105574082A (en) | Storm based stream processing method and system | |
EP3564829B1 (en) | A modified representational state transfer (rest) application programming interface (api) including a customized graphql framework | |
US10353913B2 (en) | Automating extract, transform, and load job testing | |
CN110134674B (en) | Currency credit big data monitoring and analyzing system | |
CN110716744A (en) | Data stream processing method, system and computer readable storage medium | |
US9992269B1 (en) | Distributed complex event processing | |
CN105653425A (en) | Complicated event processing engine based monitoring system | |
CN103618762A (en) | System and method for enterprise service bus state pretreatment based on AOP | |
US20210382775A1 (en) | Systems and methods for classifying and predicting the cause of information technology incidents using machine learning | |
US11461288B2 (en) | Systems and methods for database management system (DBMS) discovery | |
CN104239508A (en) | Data query method and data query device | |
EP3567804B1 (en) | Advanced insights explorer | |
CN113836237A (en) | Method and device for auditing data operation of database | |
CN116755799A (en) | Service arrangement system and method | |
CN108282347A (en) | A kind of server data online management method and system | |
CN110209722A (en) | A kind of data-interface for data exchange | |
CN111526028A (en) | Data processing method, device and equipment | |
WO2021227636A1 (en) | Microservice processing method and apparatus, storage medium, and electronic device | |
Tisbeni et al. | A Big Data Platform for heterogeneous data collection and analysis in large-scale data centres | |
US10824432B2 (en) | Systems and methods for providing multiple console sessions that enable line-by-line execution of scripts on a server application | |
CN115348325B (en) | Multichannel real-time transmission priority management and control method and system | |
CN104199979A (en) | Modeled data source management system and method thereof | |
CN117873814A (en) | Full-link data monitoring method, system, storage medium and electronic equipment | |
CN116069316A (en) | PaaS platform | |
CN103220184B (en) | Application system data are unified gathers synchro system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20160511 |