CN108255913A - Real-time streaming data processing method and device - Google Patents
- Publication number: CN108255913A
- Application number: CN201710773501.7A
- Authority
- CN
- China
- Prior art keywords
- control commands
- sql
- class
- spark
- class sql
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
- G06F16/245—Query processing
- G06F16/2452—Query translation
- G06F16/2455—Query execution
- G06F16/24568—Data stream processing; Continuous queries
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application provides a real-time streaming data processing method and device. The method may include: receiving an externally input SQL-like (structured-query-language-style) control command; parsing the SQL-like control command; if the parsed command is a streaming control command, processing subsequently received stream data according to the command; and if the parsed command is a native Spark SQL control command, processing the external structured data specified by the command according to the command. With the provided method, a user can process real-time streaming data through concise SQL-like control commands.
Description
Technical field
This application relates to the field of computer communications, and in particular to a real-time streaming data processing method and device.
Background art
With the arrival of the big data era, the analysis of real-time data provides guidance for many aspects of users' business processing. The business value of real-time data decays rapidly over time, so processing real-time data promptly after it is generated is essential.

To meet this demand, platforms for processing real-time streaming data have emerged and have greatly increased the speed of real-time data processing. When using such a platform for business computation, the user calls the APIs (Application Programming Interfaces) that the platform provides. Although a platform may offer a rich set of APIs, each API has its own specification, and the user must master the usage rules of each API before calling them to write a streaming-computing application. For users, these platforms are therefore difficult to use, and how to lower that difficulty has been a continuing question in the industry.
Summary of the invention
In view of this, the application provides a real-time stream processing method and device, enabling a user to process real-time streaming data through concise SQL-like control commands.
Specifically, the application is achieved by the following technical solution:
According to a first aspect of the application, a stream data processing method is provided. The method is applied to the Spark SQL component of a Spark platform and includes:

receiving an externally input SQL-like control command;

parsing the SQL-like control command;

if the parsed SQL-like control command is a streaming control command, processing subsequently received stream data according to the command;

if the parsed SQL-like control command is a native Spark SQL control command, processing the external structured data specified by the command according to the command.
Optionally, after the SQL-like control command is parsed, the method further includes: determining the type of the parsed SQL-like control command based on the key fields obtained from parsing it.
Optionally, determining the type of the parsed SQL-like control command based on the key fields obtained from parsing it includes: if the key fields match a first preset specific field, determining that the parsed command is the streaming control command; if the key fields match a second preset specific field, determining that the parsed command is the native Spark SQL control command.
Optionally, the SQL-like control command carries a data-processing keyword, and processing subsequently received stream data according to the command includes processing the stream data according to the carried data-processing keyword. The processing of the stream data includes one or more of: Join processing, Map processing, Reduce processing, and user-defined processing.
Optionally, the externally input SQL-like control commands are received and the stream data is processed in one or more of the following ways:

an application programming interface (API) mode;

a command-line interface (CLI) mode;

a Java Database Connectivity (JDBC) mode.
According to a second aspect of the application, a stream data processing device is provided. The device is applied to the Spark SQL component of a Spark platform and includes:

a parsing unit, configured to receive an externally input SQL-like control command and to parse the command;

a Stream SQL unit, configured to process subsequently received stream data according to the parsed SQL-like control command if that command is a streaming control command;

a Spark SQL unit, configured to process the external structured data specified by the parsed SQL-like control command according to that command if it is a native Spark SQL control command.
Optionally, the parsing unit is further configured to determine the type of the parsed SQL-like control command based on the key fields obtained from parsing it.

Optionally, the parsing unit is specifically configured to determine that the parsed SQL-like control command is the streaming control command if the obtained key fields match a first preset specific field, and that it is the native Spark SQL control command if the obtained key fields match a second preset specific field.
Optionally, the SQL-like control command carries a data-processing keyword, and the Stream SQL unit is specifically configured to process subsequently received stream data according to the data-processing keyword carried by the command. The data-processing keyword indicates one or more of: Join processing, Map processing, Reduce processing, and user-defined processing.
Optionally, the Spark SQL component receives the externally input SQL-like control commands and processes the stream data in one or more of the following ways:

an API mode;

a CLI mode;

a JDBC mode.
The application provides a method for processing real-time streaming data. The Spark SQL component of the Spark platform is extended with a Stream SQL unit, so that the component can receive externally input SQL-like control commands and parse them. If the parsed command is a streaming control command, the Spark SQL component processes subsequently received stream data according to the command. If the parsed command is a native Spark SQL control command, the component processes the external structured data specified by the command according to the command.
On the one hand, because the user only needs concise SQL-like control commands to stream-process the real-time data received by the Spark platform, the learning threshold of the platform is lowered, its use is greatly facilitated and simplified, and its promotion and application are advanced.
On the other hand, the Spark SQL component automatically determines the type of the user's SQL-like control command. When the command is a native Spark SQL command, the component processes the external structured data specified by the command; when the command is a streaming SQL command, the component stream-processes real-time streaming data. The user can therefore handle both external structured data and real-time streaming data through the single interaction mode the Spark SQL component provides.
In a third aspect, the user can input SQL-like control commands through any of the three interaction modes provided by the Spark SQL component (the CLI, easily called APIs, and JDBC) to process real-time streaming data, which broadens the ways in which the user can interact with real-time streaming data.
Description of the drawings
Fig. 1 is a schematic diagram of a Spark platform according to an exemplary embodiment of the application;

Fig. 2 is a schematic diagram of a Spark SQL component according to an exemplary embodiment of the application;

Fig. 3 is a schematic diagram of real-time streaming data task processing according to an exemplary embodiment of the application;

Fig. 4 is a flowchart of a real-time streaming data processing method according to an exemplary embodiment of the application;

Fig. 5 is a block diagram of a real-time streaming data processing device according to an exemplary embodiment of the application;

Fig. 6 is a hardware architecture diagram of the device shown in Fig. 5.
Detailed description of the embodiments
Exemplary embodiments will be described in detail here, with examples illustrated in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the application; rather, they are merely examples of devices and methods consistent with some aspects of the application as detailed in the appended claims.
The terms used in this application are for the purpose of describing particular embodiments only and are not intended to limit the application. The singular forms "a", "said", and "the" used in this application and the appended claims are also intended to include plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, and so on may be used in this application to describe various information, the information should not be limited by these terms; the terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the application, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
The Spark platform is a cluster computing platform designed to be fast and general-purpose. Its advantages, such as fast execution, good ease of use, versatility, and high fault tolerance, have made it widely used in fields such as big data computation over real-time data.
As the business value of real-time data grows, many users use the Spark platform when performing big data analysis on real-time data. For example, a user may use the platform for streaming computation of some business metric, such as counting user visits.
When using the Spark platform to write a streaming-computing application, the user needs to call the platform's APIs to do so.
On the one hand, although the Spark platform provides a rich set of APIs, each API has its own usage rules, and the user must master them before calling the APIs to write a streaming-computing application. The learning threshold is high: only a user who is fairly familiar with the Spark platform, stream computing, and even the underlying technologies can write an efficient streaming-computing application. This greatly limits the promotion and application of the Spark platform.
On the other hand, the user can use the Spark platform to write streaming applications only through its APIs, so the ways of using the platform are overly limited.
In addition, after writing a streaming-computing application, the user needs to compile and package it into a jar and then submit the application. Because going from writing a streaming-computing application to releasing it requires this complex and cumbersome process, the usage efficiency of the Spark platform is greatly reduced.
In view of this, the application provides a method for processing real-time streaming data. The Spark SQL component of the Spark platform is extended with a Stream SQL unit, so that the component can receive externally input SQL-like control commands and parse them. If the parsed command is a streaming control command, the Spark SQL component processes subsequently received stream data according to the command. If the parsed command is a native Spark SQL control command, the component processes the external structured data specified by the command according to the command.
On the one hand, because the user only needs concise SQL-like control commands to stream-process the real-time data received by the Spark platform, the learning threshold of the platform is lowered, its use is greatly facilitated and simplified, and its promotion and application are advanced.
On the other hand, the Spark SQL component automatically determines the type of the user's SQL-like control command. When the command is a native Spark SQL command, the component processes the external structured data specified by the command; when the command is a streaming SQL command, the component stream-processes real-time streaming data. The user can therefore handle both external structured data and real-time streaming data through the single interaction mode the Spark SQL component provides.
In a third aspect, the user can input SQL-like control commands through the CLI (Command-Line Interface) provided by the Spark SQL component, through easily called APIs, or through JDBC (Java Database Connectivity), to process real-time streaming data, which broadens the ways in which the user can interact with real-time streaming data.
Referring to Fig. 1, a schematic diagram of a Spark platform according to an exemplary embodiment of the application: a Spark platform generally includes a Spark SQL component, a Spark Streaming component, an MLbase/MLlib (machine learning) component, and a GraphX (graph) component.
The Spark SQL component is a framework component provided by the Spark platform for operating on structured data. Through the Spark SQL component, the user can use SQL statements to query or read external structured data (such as data in JSON, Hive, Parquet, etc.). In addition, the user can connect third-party business-intelligence software to Spark SQL through JDBC to run queries.
The Spark Streaming component is a high-throughput, highly fault-tolerant stream processing system for real-time streaming data. It can perform complex operations such as Map, Reduce, and Join on data from multiple sources (such as Kafka, Flume, Twitter) and save the results to an external file system.
The MLbase component focuses on machine learning in the Spark platform; it contains common machine learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, and so on.
The GraphX component provides APIs for graphs and graph-parallel computation in the Spark platform.
The method for processing real-time streaming data provided by this application extends the functions of the Spark SQL component: for example, it extends the functions of the parsing unit in the Spark SQL component and adds a Stream SQL unit, thereby adding the capability to process real-time streaming data. As a result, the user can use SQL-like statements not only to process the external structured data the platform originally supported, but also to process received real-time streaming data.
The Spark SQL component provided by the application is described in detail below.
Referring to Fig. 2, a schematic diagram of a Spark SQL component according to an exemplary embodiment of the application: the Spark SQL component provided in the embodiments of the application may include a parsing unit, a Spark SQL unit, and a Stream SQL unit.
The parsing unit parses an SQL-like control command after receiving it from the user, and determines the command's type from the key fields obtained by parsing. If the command is a native Spark SQL control command, it is sent to the Spark SQL unit. If the command is a streaming control command, it is sent to the Stream SQL unit.
The Spark SQL unit is the core unit in the Spark SQL component for operating on structured data. The user can use SQL statements to query or read data from external structured data sources (such as JSON, Hive, Parquet). Data querying with SQL statements is supported not only inside Spark programs; third-party business-intelligence software can also connect to Spark SQL through JDBC to run queries. The Stream SQL unit processes real-time streaming data received from data sources. The difference from the Spark SQL unit is that the Spark SQL unit processes external structured data, whereas the Stream SQL unit processes stream data received in real time. Processing real-time streaming data here can be understood as performing stream-style business computation on the real-time streaming data according to the user's business requirements; this is only an illustration of such processing, not a specific limitation of it.
More specifically, as shown in Fig. 3, which is a schematic diagram of real-time streaming data task processing according to an exemplary embodiment of the application, the Stream SQL unit further includes a receiving module, a Stream task-processing module, and a sending module.
The receiving module is responsible for receiving the data stream from a data source, which can be one of multiple sources such as Kafka (a high-throughput distributed publish-subscribe messaging system) or Flume (a highly available, highly reliable, distributed system provided by Cloudera for massive log collection, aggregation, and transmission). The receiving module can be the receiving module in the Spark Streaming component of the Spark platform, or a newly developed receiving module; no specific limitation is placed on it here.
The Stream task-processing module performs task processing on the received real-time streaming data according to the data-processing keywords in the user's SQL-like control command.
The sending module writes the data processed by the Stream task-processing module into an external storage tool. The sending module can interact directly with the storage tool: it converts the result of real-time streaming data task processing into the format expected by the connected storage tool, and then stores it there.
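Purely as an editorial illustration (not part of the patent's disclosure), the three-module flow above can be sketched as follows. All function names are hypothetical, a simple Map-style projection stands in for the task processing, and tab-separated text lines stand in for the storage tool's format:

```python
def receiving_module(source):
    """Receiving module: yield records from a data source
    (an in-memory list stands in for Kafka/Flume here)."""
    yield from source

def stream_task_module(records):
    """Stream task-processing module: apply a Map-style operation,
    projecting and upper-casing the name field of each record."""
    return [{"NAME": rec["name"].upper()} for rec in records]

def sending_module(rows):
    """Sending module: convert results to the format assumed for the
    external storage tool (here, tab-separated text lines)."""
    return ["\t".join(str(v) for v in row.values()) for row in rows]

# Wire the three modules together on two sample records.
lines = sending_module(stream_task_module(receiving_module(
    [{"name": "ann"}, {"name": "bob"}])))
```

Running the pipeline on the two sample records yields one formatted line per record.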
The method for processing real-time streaming data proposed by the application is described in detail below. Referring to Fig. 4, a flowchart of a real-time streaming data processing method according to an exemplary embodiment of the application: the method can be applied to the Spark SQL component of a Spark platform and may include the following steps.
Step 401: receive an externally input SQL-like control command;

Step 402: parse the SQL-like control command;

Step 403: if the parsed SQL-like control command is a streaming control command, process subsequently received stream data according to the command;

Step 404: if the parsed SQL-like control command is a native Spark SQL control command, process the external structured data specified by the command according to the command.
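The four steps can be condensed into a small dispatch sketch. This is an assumption-laden illustration (the patent does not specify the parser at this level of detail); it classifies by whether a STREAM keyword appears at the head of the command:

```python
def process_command(command: str) -> str:
    """Steps 401-404 in miniature: parse the SQL-like command (step 402)
    and dispatch on its type (steps 403/404). The string results stand
    in for the actual processing paths."""
    tokens = command.strip().upper().split()
    if "STREAM" in tokens[:2]:                 # streaming control command
        return "process subsequently received stream data"
    return "process specified external structured data"  # native Spark SQL
```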
The stream data above can be understood as data received in real time from a data source. So-called stream data is data that a source generates continuously, such as log files generated in real time by a user's applications, online shopping data, and so on. Stream data is usually also called real-time streaming data.
The SQL-like control commands carry data-processing keywords. By type, these fall into two classes: data-processing keywords for real-time streaming data and data-processing keywords for external structured data. When the SQL-like control command is a streaming control command, the data-processing keywords it carries can be those for real-time streaming data, and the processing they indicate can include Join processing, Map processing, Reduce processing, and some user-defined processing modes. The SQL statement can of course also include common computing tasks, such as summing and filtering. When the SQL-like control command is a native Spark SQL control command, the data-processing operations indicated by its keywords can include the operations native Spark SQL can perform.
In the embodiments of the application, the user can input the above SQL-like control commands through the same interaction modes as the Spark SQL component. For example, the user can input SQL-like control commands to the Spark SQL component through the API, CLI, and JDBC modes.
It should be noted that this API is entirely different from the API provided by the Spark Streaming component of the Spark platform: the Spark Streaming API requires the user to understand its interfaces in detail before writing complex programs to call them, whereas in this application the user can input ordinary, concise SQL-like control commands through the API to process real-time streaming data.
In the embodiments of the application, the Spark SQL component receives the SQL-like control command input by the user and parses it. After parsing, some key fields are obtained.
In an optional implementation, the Spark SQL component is preset with some specific fields related to streaming control commands, as first specific fields, for example fields such as STREAM. The Spark SQL component is also preset with some specific fields related to native Spark SQL control commands, as second specific fields, for example fields such as table. It should be noted that the first specific fields can be understood as a preset set of specific fields related to streaming control commands, and the second specific fields as a preset set of specific fields related to native Spark SQL control commands.
Based on the key fields obtained by parsing the SQL-like control command, the Spark SQL component can determine the type of the parsed command. If the key fields match the first specific fields, the component determines that the parsed SQL-like control command is a streaming control command; if the key fields match the second specific fields, the component determines that it is a native Spark SQL control command.
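The matching rule can be sketched as set intersection against the two preset field sets. The sets below are illustrative guesses only; the patent names only STREAM and table as examples of first and second specific fields:

```python
# Hypothetical preset field sets (illustrative, not the patent's configuration).
FIRST_SPECIFIC_FIELDS = {"STREAM", "SOURCES", "SINKS"}    # streaming-related
SECOND_SPECIFIC_FIELDS = {"TABLE", "SHOW"}                # native Spark SQL

def classify(key_fields):
    """Return the command type implied by the parsed key fields."""
    fields = {f.upper() for f in key_fields}
    if fields & FIRST_SPECIFIC_FIELDS:
        return "streaming control command"
    if fields & SECOND_SPECIFIC_FIELDS:
        return "native Spark SQL control command"
    return "unknown"
```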
For example, suppose the SQL-like control command input by the user is:

"CREATE TABLE tableName (NAME STRING, AGE INT);
SHOW TABLES;"

where CREATE TABLE is the command to create a table, tableName is the table name, (NAME STRING, AGE INT) defines the data types, and SHOW TABLES is the command to display tables.
After parsing this control command, the Spark SQL component obtains key fields such as CREATE TABLE and SHOW TABLES. Finding that these key fields match the second specific fields (fields such as table and show tables), the Spark SQL component determines that the parsed SQL-like control command is a native Spark SQL control command.
As another example, suppose the SQL-like control command input by the user creates and starts a Streaming stream. In that command, CREATE STREAM is the command to create the Streaming stream; StreamTable is the name of the created stream; (NAME STRING, AGE INT) defines the data types; TBLPROPERTIES stores configuration information as key-value pairs; SOURCES specifies socket as the data source; HOSTNAME is the IP address for the socket; SINKS specifies hive as the data storage format; INSERT is the command to start the created stream; and SELECT is the query command.
After parsing this control command, the Spark SQL component obtains key fields such as CREATE STREAM and StreamTable. Finding that these key fields match the first specific fields (fields such as STREAM), the Spark SQL component determines that the parsed SQL-like control command is a streaming control command.
Of course, other methods can also be used to determine the command type of the SQL-like control command; the method described here is only an example and not a specific limitation.
In the embodiments of the application, an SQL-like control command usually carries data-processing keywords. After parsing the user's SQL-like control command, the Spark SQL component can obtain the data-processing keywords the command carries and process the real-time streaming data according to these keywords.
The data-processing keywords can include CREATE (creation), INSERT (start), JOIN (Join processing), GROUP BY (Reduce processing), and so on.
For example, suppose the SQL-like control command input by the user is:

"CREATE TABLE tableName (NAME STRING, AGE INT);
SHOW TABLES;"

where CREATE TABLE is the command to create a table, tableName is the table name, and (NAME STRING, AGE INT) defines the data types.
After receiving this SQL-like control command, the Spark SQL component parses it and obtains the data-processing keywords the command carries, such as CREATE and SHOW TABLES; the processing operations corresponding to the command are creating a table and displaying the created table.
As another example, suppose the SQL-like control command input by the user creates and starts a Streaming stream, where CREATE STREAM is the command to create the Streaming stream; W is the name of the created stream; (NAME STRING, AGE INT) defines the data types; TBLPROPERTIES stores configuration information as key-value pairs; SOURCES specifies socket as the data source; HOSTNAME is the IP address for the socket; SINKS specifies hive as the data storage format; INSERT is the command to start the created stream; and GROUP indicates that the received real-time streaming data is to be grouped.
After receiving this class SQL control command, the Spark SQL component can parse it. Through parsing, the component obtains the data processing keywords carried in the command, such as CREAT, insert, and group by, and based on these keywords determines that the processing operations corresponding to the command are creating a Streaming stream, starting the stream, and performing grouped statistics on the received real-time streaming data.
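The original command text for this example does not survive in the source; the string below is a hypothetical reconstruction based only on the field-by-field explanation above (CREATE STREAM w, TBLPROPERTIES, SOURCES socket, HOSTNAME, SINKS hive, insert, group by), with the hostname value invented for illustration, shown here to make the keyword extraction concrete:

```python
# Hypothetical reconstruction of the example command; every literal value
# (hostname in particular) is an assumption, not taken from the patent.
COMMAND = """
CREATE STREAM w (NAME STRING, AGE INT)
TBLPROPERTIES ("SOURCES"="socket", "HOSTNAME"="127.0.0.1", "SINKS"="hive");
insert into stream w select age, count(*) from w group by age;
"""

def extract_keywords(command: str):
    """Collect the data processing keywords carried by the command text."""
    lowered = command.lower()
    return [kw for kw in ("create stream", "insert", "group by") if kw in lowered]
```

Extracting the keywords from such a command yields exactly the create/start/group trio that drives the processing described above.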
In the embodiments of the present application, when the class SQL control command is a streaming control command, the Spark SQL component can process the received real-time streaming data according to the obtained data processing keywords.
Taking the above example of grouping the real-time data stream, after parsing the class SQL control command, the Spark SQL component can start the processing described above. In this example, the component performs the grouped-statistics operation on the received real-time data stream according to the data processing keywords, and then stores the processing result, together with the intermediate data generated during processing (such as the batches generated while the real-time streaming data is processed), in the storage tool specified by the class SQL control command, such as Hive, for the user to query.
In the embodiments of the present application, when the class SQL control command is a native Spark SQL control command, the external structured data specified by the command is processed based on the obtained data processing keywords, and the processing result is stored for the user to query.
In addition, in the embodiments of the present application, the processing of the stream data may include Join processing, Map processing, Reduce processing, and user-defined processing.
For a Join task, the following example may be given. Assume the user inputs a corresponding class SQL control command. On parsing it, the Spark SQL component can obtain data processing keywords such as CREAT STREAM w1, CREATE STREAM w2, insert, and join. From these keywords, the component determines that the processing to be performed is: create two Streaming streams named w1 and w2, perform a full join of w1 and w2, and select from the result set the records whose ages are identical.
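The Join processing of this example can be sketched as follows. Streams are modelled as plain lists of dicts purely for illustration; the real component operates on Spark streaming batches, and all names here are assumptions:

```python
# Hypothetical sketch of the Join task: fully join the records of two
# streams w1 and w2 and keep the pairs whose ages are identical.
def full_join_on_age(w1, w2):
    """Cross-join two record sets, selecting pairs with equal age."""
    return [
        (a, b)
        for a in w1
        for b in w2
        if a["age"] == b["age"]
    ]
```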
For a Map task, the following example may be given. Assume the user inputs a corresponding class SQL control command. After parsing it, the Spark SQL component can obtain the data processing keywords carried in the command, such as CREATE STREAM w, insert, and age+100. From these keywords, the component determines that the processing to be performed is: create a Streaming stream named w, and add 100 to the age in the real-time stream data corresponding to w.
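The age+100 Map operation of this example can be sketched as below, with a list standing in for the real-time stream; the function name and record layout are assumptions:

```python
# Hypothetical sketch of the Map task: add a constant (100 in the
# example above) to the age field of every record in the stream w.
def map_add_age(records, delta=100):
    """Apply the age+delta Map operation to each record."""
    return [dict(r, age=r["age"] + delta) for r in records]
```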
For a Reduce task, the following example may be given. Assume the user inputs a corresponding class SQL control command. On parsing it, the Spark SQL component can obtain the data processing keywords carried in the command, such as CREATE STREAM w, insert, and group by. From these keywords, the component determines that the processing to be performed includes: create a Streaming stream named w, and perform a grouped-statistics operation on the real-time streaming data corresponding to w according to the value of age.
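The grouped-statistics (Reduce) operation of this example can be sketched with a simple dict-based reduction standing in for Spark's grouped aggregation; the exact statistic (a count per age group) is an assumption:

```python
# Hypothetical sketch of the Reduce task: group the records of stream w
# by the value of age and compute a per-group count.
from collections import Counter

def group_count_by_age(records):
    """Group records by age and return per-age counts."""
    return dict(Counter(r["age"] for r in records))
```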
In addition, the Spark platform provided by the present application also supports user-defined function operations.
For example, suppose the user wants to judge from the age whether a person is young. The user can write a user-defined function in advance and package it. Then, when using this user-defined function to process real-time streaming data, the user can input the following class SQL control commands:
add jar youngPeople.jar;
create temporary function isYoung as'com.spark.YoungPeople';
insert into stream console select name,isYoung(age)from w;
Assume the received real-time streaming data consists of users' names and ages. After parsing, the Spark platform can obtain data processing keywords such as create temporary function, insert, select name, and isYoung (age). From these keywords, the Spark SQL component determines that the processing to be performed includes applying the YoungPeople function to the real-time data stream corresponding to the created stream w, that is, judging from each user's age whether the user is young.
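The user-defined function itself is not reproduced in the source. The sketch below shows what an isYoung-style function could look like; the age threshold is an invented assumption, and the real com.spark.YoungPeople class referenced by the commands above would be written in Java/Scala and packaged into youngPeople.jar:

```python
# Hypothetical sketch of the isYoung user-defined function: judge from
# the age whether a user counts as a young person. The threshold below
# is an assumption; the patent does not state one.
YOUNG_AGE_LIMIT = 28

def is_young(age: int) -> bool:
    """Return True if the given age falls within the assumed young range."""
    return age <= YOUNG_AGE_LIMIT
```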
The present application provides a method for processing real-time streaming data. The Spark SQL component in the Spark platform is extended with a Stream SQL unit, so that the component can receive an externally input class SQL control command and parse it. If the parsed class SQL control command is a streaming control command, the Spark SQL component processes subsequently received stream data according to the command. If the parsed class SQL control command is a native Spark SQL control command, the external structured data specified by the command is processed according to the command.
On the one hand, since the user only needs concise class SQL control commands to perform stream processing on the real-time data received by the Spark platform, the learning threshold for using the platform is reduced, and the use of the Spark platform is greatly facilitated and simplified, thereby promoting its adoption and application.
On the other hand, since the Spark SQL component can automatically determine the command type of the class SQL control command input by the user, it can process the external structured data specified by the command when the command is a native Spark SQL command, and perform stream processing on the real-time streaming data when the command is a streaming SQL command. Thus, with only the interactive modes provided by the Spark SQL component, the user can process both external structured data and real-time streaming data.
In a third aspect, the user can input class SQL control commands through the three interactive modes provided by the Spark SQL component, namely the CLI, the easily invoked API, and JDBC, to process real-time streaming data, thereby extending the interactive modes available for processing real-time streaming data.
Referring to Fig. 5, Fig. 5 is a block diagram of a real-time streaming data processing apparatus shown in an exemplary embodiment of the present application. The apparatus is applied to the Spark SQL component of a Spark platform, and includes:
a parsing unit 501, configured to receive an externally input class structured query language (SQL) control command and parse the class SQL control command;
a Stream SQL unit 502, configured to, if the parsed class SQL control command is a streaming control command, process subsequently received stream data according to the class SQL control command; and
a Spark SQL unit 503, configured to, if the parsed class SQL control command is a native Spark SQL control command, process the external structured data specified by the class SQL control command according to the class SQL control command.
Optionally, the parsing unit 501 is further configured to determine the type of the parsed class SQL control command based on the key field obtained after parsing the class SQL control command.
Optionally, the parsing unit 501 is specifically configured to: if the key field matches a first preset specific field, determine that the parsed class SQL control command is the streaming control command; and if the key field matches a second preset specific field, determine that the parsed class SQL control command is the native Spark SQL control command.
Optionally, the class SQL control command carries data processing keywords;
the Stream SQL unit 502 is specifically configured to process subsequently received stream data according to the data processing keywords carried by the class SQL control command;
the data processing keywords cover: Join processing, Map processing, Reduce processing, and user-defined processing.
Optionally, the Spark SQL component receives externally input class SQL control commands and processes the stream data in one or more of the following ways:
an application programming interface (API) mode;
a command line interface (CLI) mode;
a Java database connectivity (JDBC) mode.
Correspondingly, the present application also provides the hardware structure of the apparatus shown in Fig. 5. Referring to Fig. 6, Fig. 6 is a hardware structure diagram of the apparatus shown in Fig. 5 provided by the present application. The system includes a processor 601, a memory 602, and a bus 603, where the processor 601 and the memory 602 communicate with each other through the bus 603.
The processor 601 may be a CPU, and the memory 602 may be a non-volatile memory in which metadata management logic instructions are stored. The processor 601 can execute the metadata management logic instructions stored in the memory 602 to realize the real-time streaming data management flow shown in Fig. 3.
As shown in Fig. 6, the hardware structure may also include a power supply component 604 configured to perform power management of the apparatus, and an input/output (I/O) interface 605.
As the apparatus embodiments substantially correspond to the method embodiments, reference may be made to the description of the method embodiments for the relevant parts. The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present application, which those of ordinary skill in the art can understand and implement without creative effort.
The foregoing is merely a preferred embodiment of the present application and is not intended to limit the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall be included within the scope of protection of the present application.
Claims (10)
1. A stream data processing method, characterized in that the method is applied to a Spark SQL component of a Spark platform and comprises:
receiving an externally input class structured query language (SQL) control command;
parsing the class SQL control command;
if the parsed class SQL control command is a streaming control command, processing subsequently received stream data according to the class SQL control command; and
if the parsed class SQL control command is a native Spark SQL control command, processing external structured data specified by the class SQL control command according to the class SQL control command.
2. The method according to claim 1, characterized in that, after the parsing of the class SQL control command, the method further comprises:
determining the type of the parsed class SQL control command based on a key field obtained after parsing the class SQL control command.
3. The method according to claim 2, characterized in that the determining of the type of the parsed class SQL control command based on the key field obtained after parsing the class SQL control command comprises:
if the key field obtained after parsing the class SQL control command matches a first preset specific field, determining that the parsed class SQL control command is the streaming control command; and
if the key field obtained after parsing the class SQL control command matches a second preset specific field, determining that the parsed class SQL control command is the native Spark SQL control command.
4. The method according to claim 1, characterized in that the class SQL control command carries data processing keywords;
the processing of subsequently received stream data according to the class SQL control command comprises:
processing the subsequently received stream data according to the data processing keywords carried by the class SQL control command;
wherein the processing of the stream data comprises: Join processing, Map processing, Reduce processing, and user-defined processing.
5. The method according to claim 1, characterized in that the externally input class SQL control command is received and the stream data is processed in one or more of the following ways:
an application programming interface (API) mode;
a command line interface (CLI) mode;
a Java database connectivity (JDBC) mode.
6. A stream data processing apparatus, characterized in that the apparatus is applied to a Spark SQL component of a Spark platform and comprises:
a parsing unit, configured to receive an externally input class structured query language (SQL) control command and parse the class SQL control command;
a Stream SQL unit, configured to, if the parsed class SQL control command is a streaming control command, process subsequently received stream data according to the class SQL control command; and
a Spark SQL unit, configured to, if the parsed class SQL control command is a native Spark SQL control command, process external structured data specified by the class SQL control command according to the class SQL control command.
7. The apparatus according to claim 6, characterized in that the parsing unit is further configured to determine the type of the parsed class SQL control command based on a key field obtained after parsing the class SQL control command.
8. The apparatus according to claim 7, characterized in that the parsing unit is specifically configured to: if the key field obtained after parsing the class SQL control command matches a first preset specific field, determine that the parsed class SQL control command is the streaming control command; and if the key field obtained after parsing the class SQL control command matches a second preset specific field, determine that the parsed class SQL control command is the native Spark SQL control command.
9. The apparatus according to claim 6, characterized in that the class SQL control command carries data processing keywords;
the Stream SQL unit is specifically configured to process subsequently received stream data according to the data processing keywords carried by the class SQL control command;
wherein the data processing keywords cover: Join processing, Map processing, Reduce processing, and user-defined processing.
10. The apparatus according to claim 6, characterized in that the Spark SQL component receives externally input class SQL control commands and processes the stream data in one or more of the following ways:
an application programming interface (API) mode;
a command line interface (CLI) mode;
a Java database connectivity (JDBC) mode.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710773501.7A CN108255913A (en) | 2017-08-31 | 2017-08-31 | A kind of real-time streaming data processing method and processing device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108255913A true CN108255913A (en) | 2018-07-06 |
Family
ID=62721978
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710773501.7A Pending CN108255913A (en) | 2017-08-31 | 2017-08-31 | A kind of real-time streaming data processing method and processing device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108255913A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111262915A (en) * | 2020-01-10 | 2020-06-09 | 北京东方金信科技有限公司 | Kafka cluster-crossing data conversion system and method |
CN111651148A (en) * | 2020-06-09 | 2020-09-11 | 北京思特奇信息技术股份有限公司 | Dynamic generation method and system of Stream SQL |
CN113064910A (en) * | 2021-03-18 | 2021-07-02 | 西南科技大学 | Reaction type pneumatic data multidimensional analysis platform |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103164270A (en) * | 2011-12-12 | 2013-06-19 | 阿里巴巴集团控股有限公司 | Java system application programming interface calling method and system using the same |
CN104885077A (en) * | 2012-09-28 | 2015-09-02 | 甲骨文国际公司 | Managing continuous queries with archived relations |
CN105868019A (en) * | 2016-02-01 | 2016-08-17 | 中国科学院大学 | Automatic optimization method for performance of Spark platform |
CN106844546A (en) * | 2016-12-30 | 2017-06-13 | 江苏号百信息服务有限公司 | Multi-data source positional information fusion method and system based on Spark clusters |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180706 |