CN108255913A - Real-time streaming data processing method and apparatus - Google Patents

Real-time streaming data processing method and apparatus

Info

Publication number
CN108255913A
CN108255913A (application number CN201710773501.7A)
Authority
CN
China
Prior art keywords
control commands
sql
class
spark
class sql
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710773501.7A
Other languages
Chinese (zh)
Inventor
胡良文
丁远普
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Big Data Technologies Co Ltd
Original Assignee
New H3C Big Data Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Big Data Technologies Co Ltd
Priority to CN201710773501.7A
Publication of CN108255913A
Legal status: Pending

Classifications

    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06F ELECTRIC DIGITAL DATA PROCESSING › G06F16/00 Information retrieval; Database structures therefor; File system structures therefor › G06F16/20 of structured data, e.g. relational data › G06F16/24 Querying
    • G06F16/242 Query formulation › G06F16/2433 Query languages
    • G06F16/245 Query processing › G06F16/2452 Query translation
    • G06F16/245 Query processing › G06F16/2455 Query execution › G06F16/24568 Data stream processing; Continuous queries


Abstract

The application provides a real-time streaming data processing method and apparatus. The method may include: receiving an externally input class structured query language (class SQL) control command; parsing the class SQL control command; if the parsed class SQL control command is a stream (Streaming) control command, processing subsequently received stream data according to the class SQL control command; and if the parsed class SQL control command is a native Spark SQL control command, processing the external structured data specified by the class SQL control command according to the class SQL control command. With the method provided by the application, a user can process real-time streaming data through concise class SQL control commands.

Description

Real-time streaming data processing method and apparatus
Technical field
This application relates to the field of computer communications, and in particular to a real-time streaming data processing method and apparatus.
Background technology
With the arrival of the big data era, the analysis and processing of real-time data provides guidance for many aspects of a user's business. The business value of real-time data decreases rapidly over time, so processing real-time data promptly after it is generated is of great importance.
Based on this demand, platforms for processing real-time streaming data have emerged, greatly increasing the processing speed of real-time data. When using such a platform for business computation, a user calls the APIs (Application Programming Interfaces) it provides. Although the platform offers a rich set of API interfaces, each API interface has its own specification, and the user must master the usage rules of every API before being able to call the APIs and write a streaming computing application. For users, such platforms are therefore difficult to use, and how to reduce this difficulty has been a continuing question in the industry.
Summary of the invention
In view of this, the application provides a real-time stream processing method and apparatus, so that a user can process real-time streaming data through concise class SQL control commands.
Specifically, the application is achieved by the following technical solution:
According to a first aspect of the application, a stream data processing method is provided. The method is applied to the Spark SQL component of a Spark platform and includes:
receiving an externally input class structured query language (class SQL) control command;
parsing the class SQL control command;
if the parsed class SQL control command is a stream (Streaming) control command, processing subsequently received stream data according to the class SQL control command;
if the parsed class SQL control command is a native Spark SQL control command, processing the external structured data specified by the class SQL control command according to the class SQL control command.
Optionally, after parsing the class SQL control command, the method further includes:
determining the type of the parsed class SQL control command based on a key field obtained from parsing the class SQL control command.
Optionally, determining the type of the parsed class SQL control command based on the key field obtained from parsing the class SQL control command includes:
if the key field obtained from parsing the class SQL control command matches a first preset specific field, determining that the parsed class SQL control command is the streaming control command;
if the key field obtained from parsing the class SQL control command matches a second preset specific field, determining that the parsed class SQL control command is the native Spark SQL control command.
Optionally, the class SQL control command carries a data processing keyword;
processing subsequently received stream data according to the class SQL control command includes:
processing the subsequently received stream data according to the data processing keyword carried by the class SQL control command;
the processing of the stream data includes: Join processing, Map processing, Reduce processing and user-defined processing.
Optionally, the externally input class SQL control command is received, and the stream data is processed, in one or more of the following ways:
an application programming interface (API) mode;
a command line interface (CLI) mode;
a Java database connectivity (JDBC) mode.
According to a second aspect of the application, a stream data processing apparatus is provided. The apparatus is applied to the Spark SQL component of a Spark platform and includes:
a parsing unit, configured to receive an externally input class structured query language (class SQL) control command and parse the class SQL control command;
a Stream SQL unit, configured to, if the parsed class SQL control command is a stream (Streaming) control command, process subsequently received stream data according to the class SQL control command;
a Spark SQL unit, configured to, if the parsed class SQL control command is a native Spark SQL control command, process the external structured data specified by the class SQL control command according to the class SQL control command.
Optionally, the parsing unit is further configured to determine the type of the parsed class SQL control command based on a key field obtained from parsing the class SQL control command.
Optionally, the parsing unit is specifically configured to: if the key field obtained from parsing the class SQL control command matches a first preset specific field, determine that the parsed class SQL control command is the streaming control command; and if the key field obtained from parsing the class SQL control command matches a second preset specific field, determine that the parsed class SQL control command is the native Spark SQL control command.
Optionally, the class SQL control command carries a data processing keyword;
the Stream SQL unit is specifically configured to process subsequently received stream data according to the data processing keyword carried by the class SQL control command;
the data processing keyword includes: Join processing, Map processing, Reduce processing and user-defined processing.
Optionally, the Spark SQL component receives the externally input class SQL control command and processes the stream data in one or more of the following ways:
an application programming interface (API) mode;
a command line interface (CLI) mode;
a Java database connectivity (JDBC) mode.
The application provides a real-time streaming data processing method in which the Spark SQL component of the Spark platform is extended by adding a Stream SQL unit, so that the Spark SQL component can receive an externally input class SQL control command and parse it. If the parsed class SQL control command is a stream (Streaming) control command, the Spark SQL component processes subsequently received stream data according to the class SQL control command. If the parsed class SQL control command is a native Spark SQL control command, the component processes the external structured data specified by the class SQL control command according to that command.
On the one hand, because the user can perform stream processing on the real-time data received by the Spark platform simply by using concise class SQL control commands, the learning threshold for using the Spark platform is lowered and its use is greatly facilitated and simplified, which in turn promotes the adoption and application of the Spark platform.
On the other hand, because the Spark SQL component automatically determines the command type of the class SQL control command input by the user, it can process the external structured data specified by the command when the command is a native Spark SQL command, and perform stream processing on real-time streaming data when the command is a streaming SQL control command. The user can therefore process both external structured data and real-time streaming data using only the interactive modes provided by the Spark SQL component.
In a third aspect, the user can input class SQL control commands and process real-time streaming data through the three interactive modes provided by the Spark SQL component, namely the CLI, the easily invoked API and JDBC, which broadens the ways in which the user can interact with real-time streaming data.
Description of the drawings
Fig. 1 is a schematic diagram of a Spark platform according to an exemplary embodiment of the application;
Fig. 2 is a schematic diagram of a Spark SQL component according to an exemplary embodiment of the application;
Fig. 3 is a schematic diagram of real-time streaming data task processing according to an exemplary embodiment of the application;
Fig. 4 is a flowchart of a real-time streaming data processing method according to an exemplary embodiment of the application;
Fig. 5 is a block diagram of a real-time streaming data processing apparatus according to an exemplary embodiment of the application;
Fig. 6 is a hardware architecture diagram of the apparatus shown in Fig. 5.
Detailed description of embodiments
Exemplary embodiments are described in detail here, with examples illustrated in the accompanying drawings. Where the following description refers to the drawings, the same reference numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the application; rather, they are merely examples of apparatuses and methods consistent with some aspects of the application as detailed in the appended claims.
The terms used in this application are for the purpose of describing particular embodiments only and are not intended to limit the application. The singular forms "a", "said" and "the" used in this application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third and so on may be used in this application to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the application, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon" or "in response to determining".
A Spark platform is a cluster computing platform designed to be fast and general-purpose. Because of its advantages such as high running speed, ease of use, versatility and high fault tolerance, the Spark platform is widely used in fields such as big data computation on real-time data.
As the business value of real-time data becomes increasingly important, many users use the Spark platform when performing big data analysis on real-time data. For example, a user may use the Spark platform for streaming computation of certain business metrics, such as counting user visits.
When using the Spark platform to write a streaming computing application, the user needs to call the APIs of the Spark platform.
On the one hand, although the Spark platform provides a rich set of APIs, each API has its own usage rules, and the user must master the usage rules of every API before being able to call the APIs and write a streaming computing application. The learning threshold is high: only a user who is quite familiar with the Spark platform, stream computation and even the underlying technologies can write an efficient streaming computing application, which greatly limits the promotion and application of the Spark platform.
On the other hand, the user can only use the Spark platform through its APIs to write streaming applications, so the ways of using the platform are too limited.
In addition, after writing the streaming computing application, the user needs to compile and package it into a jar and then submit the application. Because releasing a streaming computing application requires going through this complex and cumbersome process from programming to publication, the usage efficiency of the Spark platform is greatly reduced.
In view of this, the application provides a real-time streaming data processing method in which the Spark SQL component of the Spark platform is extended by adding a Stream SQL unit, so that the Spark SQL component can receive an externally input class structured query language (class SQL) control command and parse it. If the parsed class SQL control command is a stream (Streaming) control command, the Spark SQL component processes subsequently received stream data according to the class SQL control command. If the parsed class SQL control command is a native Spark SQL control command, the component processes the external structured data specified by the class SQL control command according to that command.
On the one hand, because the user can perform stream processing on the real-time data received by the Spark platform simply by using concise class SQL control commands, the learning threshold for using the Spark platform is lowered and its use is greatly facilitated and simplified, which in turn promotes the adoption and application of the Spark platform.
On the other hand, because the Spark SQL component automatically determines the command type of the class SQL control command input by the user, it can process the external structured data specified by the command when the command is a native Spark SQL command, and perform stream processing on real-time streaming data when the command is a streaming SQL control command. The user can therefore process both external structured data and real-time streaming data using only the interactive modes provided by the Spark SQL component.
In a third aspect, the user can input class SQL control commands and process real-time streaming data through the three interactive modes provided by the Spark SQL component, namely the CLI (Command-Line Interface), the easily invoked API and JDBC (Java DataBase Connectivity), which broadens the ways in which the user can interact with real-time streaming data.
Referring to Fig. 1, Fig. 1 is a schematic diagram of a Spark platform according to an exemplary embodiment of the application. A Spark platform generally includes: a Spark SQL component, a Spark Streaming component, an MLbase/MLlib (machine learning) component and a GraphX (graph) component.
The Spark SQL component is a framework component provided by the Spark platform for operating on structured data. Through the Spark SQL component, a user can use SQL statements to query or read external structured data (such as data in JSON, Hive or Parquet). In addition, third-party business intelligence software can also connect to Spark SQL through JDBC to run queries.
The Spark Streaming component is a high-throughput, highly fault-tolerant stream processing system for real-time streaming data. It can perform complex operations such as Map, Reduce and Join on data from multiple sources (such as Kafka, Flume and Twitter) and save the results to an external file system.
The MLbase component is the component of the Spark platform dedicated to machine learning. It contains common machine learning algorithms and utilities, including classification, regression, clustering, collaborative filtering and so on.
The GraphX component provides the APIs of the Spark platform for graphs and graph-parallel computation.
The real-time streaming data processing method provided by the application extends the functionality of the above Spark SQL component, for example by enhancing the parsing unit of the Spark SQL component and adding a Stream SQL unit to it, thereby adding the capability of processing real-time streaming data. In this way, while still processing the external structured data the platform originally supported, the user can also process the received real-time streaming data through class SQL statements.
The Spark SQL component provided by the application is described in detail below.
Referring to Fig. 2, Fig. 2 is a schematic diagram of a Spark SQL component according to an exemplary embodiment of the application.
The Spark SQL component provided in the embodiments of the application may include: a parsing unit, a Spark SQL unit and a Stream SQL unit.
The parsing unit is configured to parse a class SQL control command after receiving it from the user, and to determine the type of the class SQL control command from the key fields obtained by parsing. If the class SQL control command is a native Spark SQL control command, it is sent to the Spark SQL unit; if it is a streaming control command, it is sent to the Stream SQL unit.
The Spark SQL unit is the core unit of the Spark SQL component for operating on structured data. The user can use SQL statements to query or read data from external structured data sources (such as JSON, Hive or Parquet); querying data with SQL statements is supported not only inside Spark programs, but also from third-party business intelligence software connected to Spark SQL through JDBC. The Stream SQL unit is configured to process the real-time streaming data received from data sources. The difference from the Spark SQL unit is that the Spark SQL unit processes external structured data, whereas the Stream SQL unit processes stream data received in real time. Processing real-time streaming data here can be understood as performing streaming business computation and the like on the real-time streaming data according to the user's business requirements; this is only an illustration of processing real-time streaming data and is not a specific limitation.
More specifically, as shown in Fig. 3, Fig. 3 is a schematic diagram of real-time streaming data task processing according to an exemplary embodiment of the application.
As can be seen from Fig. 3, the Stream SQL unit further includes a receiving module, a Stream task processing module and a sending module.
The receiving module is responsible for receiving the data stream from a data source, which may be one of multiple data sources such as Kafka (a high-throughput distributed publish-subscribe messaging system) or Flume (a highly available, highly reliable, distributed system provided by Cloudera for massive log collection, aggregation and transfer). The receiving module may be the receiving module of the Spark Streaming component of the Spark platform, or a newly developed receiving module; this is not specifically limited here.
The Stream task processing module is configured to perform task processing on the received real-time streaming data according to the data processing keywords in the class SQL control command input by the user.
The sending module is configured to write the data processed by the Stream task processing module into an external storage tool. The sending module can interact directly with the external storage tool: it converts the result of the real-time streaming data task processing into a format that conforms to the interfaced storage tool and then stores it in that storage tool.
The real-time streaming data processing method proposed by the application is described in detail below.
Referring to Fig. 4, Fig. 4 is a flowchart of a real-time streaming data processing method according to an exemplary embodiment of the application. The method can be applied to the Spark SQL component of a Spark platform and may include the following steps.
Step 401: receive an externally input class structured query language (class SQL) control command;
Step 402: parse the class SQL control command;
Step 403: if the parsed class SQL control command is a streaming control command, process subsequently received stream data according to the class SQL control command;
Step 404: if the parsed class SQL control command is a native Spark SQL control command, process the external structured data specified by the class SQL control command according to the class SQL control command.
The stream data above can be understood as stream data received in real time from a data source, that is, data continuously generated by the data source, such as log files generated in real time by an application the user is running, online shopping data and the like. Such stream data is usually also called real-time streaming data.
The class SQL control command carries data processing keywords. By type, there are two classes of data processing keywords: keywords for processing real-time streaming data and keywords for processing external structured data. When the class SQL control command is a Streaming control command, the data processing keyword it carries may be a keyword for real-time streaming data, and the processing it indicates may include Join processing, Map processing, Reduce processing and some user-defined processing, as well as common computation tasks expressed in SQL statements, such as summation and filtering. When the class SQL control command is a native Spark SQL control command, the data processing operations indicated by the data processing keywords it carries may include the operations that native Spark SQL can perform.
In the embodiments of the application, the user can input the above class SQL control command using the same interactive modes as the Spark SQL component.
For example, the user can input class SQL control commands to the Spark SQL component through the API, the CLI or JDBC.
It should be noted that this API is entirely different from the APIs provided by the Spark Streaming component of the Spark platform: the API interfaces provided by Spark Streaming require the user to understand them in detail and to write complex programs to call them, whereas in this application the user can input general, concise class SQL control commands through the API to process real-time streaming data.
In the embodiments of the application, the Spark SQL component receives the class SQL control command input by the user and parses it.
After the class SQL control command is parsed, some key fields are obtained.
In an optional implementation, the Spark SQL component is usually preset with specific fields related to Streaming control commands, referred to as first specific fields, such as STREAM. The Spark SQL component is also preset with specific fields related to native Spark SQL control commands, referred to as second specific fields, such as table.
The first specific fields can be understood as a preset set of specific fields related to Streaming control commands, and the second specific fields as a preset set of specific fields related to native Spark SQL control commands.
The Spark SQL component can determine the type of the parsed class SQL control command based on the key fields obtained from parsing the command. If a key field obtained from parsing matches a first specific field, the Spark SQL component determines that the parsed class SQL control command is a Streaming control command; if a key field obtained from parsing matches a second specific field, the Spark SQL component determines that the parsed class SQL control command is a native Spark SQL control command.
For example, assume the class SQL control commands input by the user are:
“CREATE TABLE tableName(NAME STRING, AGE INT);
Show tables;”
where CREATE TABLE is the command to create a table, tableName is the table name, (NAME STRING, AGE INT) defines the data types, and Show tables is the command to display tables.
After parsing these control commands, the Spark SQL component obtains key fields such as CREATE TABLE and Show tables. Finding that these key fields match the second specific fields (fields such as table and show tables), the Spark SQL component determines that the parsed class SQL control commands are native Spark SQL control commands.
For another example, assume the user inputs a class SQL control command that creates and starts a Streaming stream (a reconstructed sketch of such a command is given after the keyword explanation below).
Here, CREATE STREAM is the command to create a Streaming stream; StreamTable is the name of the Streaming stream being created; (NAME STRING, AGE INT) defines the data types; TBLPROPERTIES stores configuration information in the form of key-value pairs; SOURCES specifies socket as the data source; HOSTNAME is the IP address from which the socket receives data; SINKS specifies hive as the data storage format; insert is the command that starts the created Streaming stream; and SELECT is the query command.
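Purely as an illustration, such a command might look like the following sketch, reconstructed only from the keyword explanation above (the host address, port and sink name are hypothetical, and the exact syntax used in the original drawings may differ):
CREATE STREAM StreamTable (NAME STRING, AGE INT)
TBLPROPERTIES ('SOURCES'='socket', 'HOSTNAME'='192.168.1.100', 'PORT'='9999', 'SINKS'='hive');
insert into stream console select NAME, AGE from StreamTable;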
After parsing this control command, the Spark SQL component obtains key fields such as CREATE STREAM and StreamTable. Finding that these key fields match the first specific fields (fields such as STREAM), the Spark SQL component determines that the parsed class SQL control command is a streaming control command.
Of course, other methods may also be used to determine the command type of the class SQL control command; the method described here is only illustrative and is not a specific limitation.
In the embodiments of the application, a class SQL control command usually carries data processing keywords. After parsing the class SQL control command input by the user, the Spark SQL component obtains the data processing keywords carried by the command and can process the real-time streaming data according to these keywords.
The data processing keywords may include CREATE (create), insert (start), join (Join processing), group by (Reduce processing) and so on.
For example, assume the class SQL control commands input by the user are:
“CREATE TABLE tableName(NAME STRING, AGE INT);
Show tables;”
where CREATE TABLE is the command to create a table, tableName is the table name, and (NAME STRING, AGE INT) defines the data types.
After receiving and parsing these class SQL control commands, the Spark SQL component obtains the data processing keywords they carry, such as CREATE and Show tables, and determines that the processing operations corresponding to the commands are to create a table and to display the created tables.
For another example, assume the user inputs a class SQL control command that creates and starts a Streaming stream and groups the received real-time streaming data (a reconstructed sketch of such a command is given after the keyword explanation below).
Here, CREATE STREAM is the command to create a Streaming stream; w is the name of the created Streaming stream; (NAME STRING, AGE INT) defines the data types; TBLPROPERTIES stores configuration information in the form of key-value pairs; SOURCES specifies socket as the data source; HOSTNAME is the IP address from which the socket receives data; SINKS specifies hive as the data storage format; insert is the command that starts the created Streaming stream; and group indicates grouping the received real-time streaming data.
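Again purely as an illustration, such a command might look like the following sketch, reconstructed only from the keyword explanation above (host, port and sink are hypothetical):
CREATE STREAM w (NAME STRING, AGE INT)
TBLPROPERTIES ('SOURCES'='socket', 'HOSTNAME'='192.168.1.100', 'PORT'='9999', 'SINKS'='hive');
insert into stream console select AGE, count(*) from w group by AGE;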
After receiving and parsing this class SQL control command, the Spark SQL component obtains the data processing keywords it carries, such as CREATE, insert and group by, and based on these keywords determines that the processing corresponding to the command is to create a Streaming stream, start the Streaming stream and perform grouped statistics on the received real-time streaming data.
In the embodiments of the application, when the class SQL control command is a stream (Streaming) control command, the Spark SQL component processes the received real-time streaming data according to the obtained data processing keywords.
Taking the grouping example above, after parsing the class SQL control command the Spark SQL component starts the processing. In this example, the Spark SQL component performs the grouped-statistics operation on the received real-time data stream according to the data processing keywords, and then stores the processing result, together with the intermediate data generated during processing (such as the batches generated while the real-time streaming data is processed), in the storage tool specified by the class SQL control command, such as Hive, for the user to query.
In the embodiments of the application, when the class SQL control command is a native Spark SQL control command, the external structured data specified by the command is processed based on the obtained data processing keywords, and the processing result is stored for the user to query.
In addition, in the embodiments of the application, the processing of stream data may include Join processing, Map processing, Reduce processing and user-defined processing.
For a Join task, the following example can be given.
The class SQL control commands input by the user create two Streaming streams and join them (a reconstructed sketch is given after the explanation below).
From parsing, the Spark SQL component obtains data processing keywords such as CREATE STREAM w1, CREATE STREAM w2, insert and join, and from these keywords determines that the processing to be performed is to create two Streaming streams named w1 and w2, perform a full join between w1 and w2, and select from the result set the records with the same age.
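A sketch of what such commands might look like, reconstructed only from the keyword explanation above (the property lists are abbreviated and their values hypothetical):
CREATE STREAM w1 (NAME STRING, AGE INT) TBLPROPERTIES ('SOURCES'='socket', 'HOSTNAME'='192.168.1.100', 'PORT'='9999');
CREATE STREAM w2 (NAME STRING, AGE INT) TBLPROPERTIES ('SOURCES'='socket', 'HOSTNAME'='192.168.1.101', 'PORT'='9999');
insert into stream console select w1.NAME, w2.NAME, w1.AGE from w1 full join w2 on w1.AGE = w2.AGE;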
For a Map task, the following example can be given.
The class SQL control command input by the user creates a Streaming stream and applies a mapping to each record (a reconstructed sketch is given after the explanation below).
After parsing the class SQL control command, the Spark SQL component obtains the data processing keywords it carries, such as CREATE STREAM w, insert and age+100, and from these keywords determines that the processing to be performed is to create a Streaming stream named w and add 100 to the age in the real-time stream corresponding to w.
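A sketch of what such a command might look like, reconstructed only from the keyword explanation above (property values hypothetical):
CREATE STREAM w (NAME STRING, AGE INT) TBLPROPERTIES ('SOURCES'='socket', 'HOSTNAME'='192.168.1.100', 'PORT'='9999');
insert into stream console select NAME, AGE + 100 from w;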
For a Reduce task, the following example can be given.
The class SQL control command input by the user creates a Streaming stream and groups its records (a reconstructed sketch is given after the explanation below).
From parsing, the Spark SQL component obtains the data processing keywords carried by the class SQL control command, such as CREATE STREAM w, insert and group by, and from these keywords determines that the processing to be performed includes creating a Streaming stream named w and performing grouped statistics on the real-time streaming data corresponding to w according to the value of age.
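A sketch of what such a command might look like, reconstructed only from the keyword explanation above; it has essentially the same shape as the earlier grouping sketch (property values hypothetical):
CREATE STREAM w (NAME STRING, AGE INT) TBLPROPERTIES ('SOURCES'='socket', 'HOSTNAME'='192.168.1.100', 'PORT'='9999');
insert into stream console select AGE, count(*) from w group by AGE;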
In addition, the Spark platform provided by the application also supports user-defined function operations.
For example, suppose the user wants to judge from the age whether a person is young.
The user writes a custom function in advance, for example a Java class com.spark.YoungPeople packaged as youngPeople.jar that judges whether a given age belongs to a young person.
Then, when using this custom function to process real-time streaming data, the user can input class SQL control commands such as the following:
add jar youngPeople.jar;
create temporary function isYoung as 'com.spark.YoungPeople';
insert into stream console select name, isYoung(age) from w;
Assume the received real-time streaming data consists of users' names and ages. After parsing, the Spark platform obtains data processing keywords such as create temporary function, insert, select name and isYoung(age). From these data processing keywords, the Spark SQL component determines that the processing to be performed includes applying the YoungPeople function to the real-time stream corresponding to the created stream w, that is, judging from each user's age whether the user is a young person.
The application provides a real-time streaming data processing method in which the Spark SQL component of the Spark platform is extended by adding a Stream SQL unit, so that the Spark SQL component can receive an externally input class SQL control command and parse it. If the parsed class SQL control command is a streaming control command, the Spark SQL component processes subsequently received stream data according to the class SQL control command. If the parsed class SQL control command is a native Spark SQL control command, the component processes the external structured data specified by the class SQL control command according to that command.
On the one hand, because the user can perform stream processing on the real-time data received by the Spark platform simply by using concise class SQL control commands, the learning threshold for using the Spark platform is lowered and its use is greatly facilitated and simplified, which in turn promotes the adoption and application of the Spark platform.
On the other hand, because the Spark SQL component automatically determines the command type of the class SQL control command input by the user, it can process the external structured data specified by the command when the command is a native Spark SQL command, and perform stream processing on real-time streaming data when the command is a streaming SQL control command. The user can therefore process both external structured data and real-time streaming data using only the interactive modes provided by the Spark SQL component.
In a third aspect, the user can input class SQL control commands and process real-time streaming data through the three interactive modes provided by the Spark SQL component, namely the CLI, the easily invoked API and JDBC, which broadens the ways in which the user can interact with real-time streaming data.
Referring to Fig. 5, Fig. 5 is a block diagram of a real-time streaming data processing apparatus according to an exemplary embodiment of the application. The apparatus is applied to the Spark SQL component of a Spark platform and includes:
a parsing unit 501, configured to receive an externally input class structured query language (class SQL) control command and parse the class SQL control command;
a Stream SQL unit 502, configured to, if the parsed class SQL control command is a stream (Streaming) control command, process subsequently received stream data according to the class SQL control command;
a Spark SQL unit 503, configured to, if the parsed class SQL control command is a native Spark SQL control command, process the external structured data specified by the class SQL control command according to the class SQL control command.
Optionally, the parsing unit 501 is further configured to determine the type of the parsed class SQL control command based on a key field obtained from parsing the class SQL control command.
Optionally, the parsing unit 501 is specifically configured to: if the key field matches a first preset specific field, determine that the parsed class SQL control command is the streaming control command; and if the key field matches a second preset specific field, determine that the parsed class SQL control command is the native Spark SQL control command.
Optionally, the class SQL control command carries a data processing keyword;
the Stream SQL unit 502 is specifically configured to process subsequently received stream data according to the data processing keyword carried by the class SQL control command;
the data processing keyword includes: Join processing, Map processing, Reduce processing and user-defined processing.
Optionally, the Spark SQL component receives the externally input class SQL control command and processes the stream data in one or more of the following ways:
an application programming interface (API) mode;
a command line interface (CLI) mode;
a Java database connectivity (JDBC) mode.
Correspondingly, the application also provides the hardware structure of the apparatus shown in Fig. 5. Referring to Fig. 6, Fig. 6 is a hardware architecture diagram of the apparatus shown in Fig. 5. The hardware includes: a processor 601, a memory 602 and a bus 603, where the processor 601 and the memory 602 communicate with each other through the bus 603.
The processor 601 may be a CPU, and the memory 602 may be a non-volatile memory in which logic instructions for real-time streaming data management are stored. The processor 601 can execute the logic instructions stored in the memory 602 to implement the above real-time streaming data management functions, with reference to the flow shown in Fig. 3.
As shown in Fig. 6, the hardware structure may also include a power supply component 604 configured to perform power management of the apparatus, and an input/output (I/O) interface 605.
Since the apparatus embodiments basically correspond to the method embodiments, reference may be made to the description of the method embodiments for the relevant parts. The apparatus embodiments described above are only illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the application, which those of ordinary skill in the art can understand and implement without creative effort.
The above are only preferred embodiments of the application and are not intended to limit the application. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the application shall be included within the scope of protection of the application.

Claims (10)

1. A stream data processing method, characterized in that the method is applied to a Spark SQL component of a Spark platform, and the method comprises:
receiving an externally input class structured query language (class SQL) control command;
parsing the class SQL control command;
if the parsed class SQL control command is a stream (Streaming) control command, processing subsequently received stream data according to the class SQL control command;
if the parsed class SQL control command is a native Spark SQL control command, processing external structured data specified by the class SQL control command according to the class SQL control command.
2. The method according to claim 1, characterized in that after parsing the class SQL control command, the method further comprises:
determining the type of the parsed class SQL control command based on a key field obtained from parsing the class SQL control command.
3. The method according to claim 2, characterized in that determining the type of the parsed class SQL control command based on the key field obtained from parsing the class SQL control command comprises:
if the key field obtained from parsing the class SQL control command matches a first preset specific field, determining that the parsed class SQL control command is the streaming control command;
if the key field obtained from parsing the class SQL control command matches a second preset specific field, determining that the parsed class SQL control command is the native Spark SQL control command.
4. The method according to claim 1, characterized in that the class SQL control command carries a data processing keyword;
processing subsequently received stream data according to the class SQL control command comprises:
processing the subsequently received stream data according to the data processing keyword carried by the class SQL control command;
the processing of the stream data comprises: Join processing, Map processing, Reduce processing and user-defined processing.
5. The method according to claim 1, characterized in that the externally input class SQL control command is received, and the stream data is processed, in one or more of the following ways:
an application programming interface (API) mode;
a command line interface (CLI) mode;
a Java database connectivity (JDBC) mode.
6. A stream data processing apparatus, characterized in that the apparatus is applied to a Spark SQL component of a Spark platform, and the apparatus comprises:
a parsing unit, configured to receive an externally input class structured query language (class SQL) control command and parse the class SQL control command;
a Stream SQL unit, configured to, if the parsed class SQL control command is a stream (Streaming) control command, process subsequently received stream data according to the class SQL control command;
a Spark SQL unit, configured to, if the parsed class SQL control command is a native Spark SQL control command, process external structured data specified by the class SQL control command according to the class SQL control command.
7. The apparatus according to claim 6, characterized in that the parsing unit is further configured to determine the type of the parsed class SQL control command based on a key field obtained from parsing the class SQL control command.
8. The apparatus according to claim 7, characterized in that the parsing unit is specifically configured to: if the key field obtained from parsing the class SQL control command matches a first preset specific field, determine that the parsed class SQL control command is the streaming control command; and if the key field obtained from parsing the class SQL control command matches a second preset specific field, determine that the parsed class SQL control command is the native Spark SQL control command.
9. The apparatus according to claim 6, characterized in that the class SQL control command carries a data processing keyword;
the Stream SQL unit is specifically configured to process subsequently received stream data according to the data processing keyword carried by the class SQL control command;
the data processing keyword comprises: Join processing, Map processing, Reduce processing and user-defined processing.
10. The apparatus according to claim 6, characterized in that the Spark SQL component receives the externally input class SQL control command and processes the stream data in one or more of the following ways:
an application programming interface (API) mode;
a command line interface (CLI) mode;
a Java database connectivity (JDBC) mode.
CN201710773501.7A 2017-08-31 2017-08-31 Real-time streaming data processing method and apparatus Pending CN108255913A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710773501.7A CN108255913A (en) 2017-08-31 2017-08-31 Real-time streaming data processing method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710773501.7A CN108255913A (en) 2017-08-31 2017-08-31 Real-time streaming data processing method and apparatus

Publications (1)

Publication Number Publication Date
CN108255913A (en) 2018-07-06

Family

ID=62721978

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710773501.7A Pending CN108255913A (en) Real-time streaming data processing method and apparatus

Country Status (1)

Country Link
CN (1) CN108255913A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103164270A (en) * 2011-12-12 2013-06-19 阿里巴巴集团控股有限公司 Java system application programming interface calling method and system using the same
CN104885077A (en) * 2012-09-28 2015-09-02 甲骨文国际公司 Managing continuous queries with archived relations
CN105868019A (en) * 2016-02-01 2016-08-17 中国科学院大学 Automatic optimization method for performance of Spark platform
CN106844546A (en) * 2016-12-30 2017-06-13 江苏号百信息服务有限公司 Multi-data source positional information fusion method and system based on Spark clusters

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111262915A (en) * 2020-01-10 2020-06-09 北京东方金信科技有限公司 Kafka cluster-crossing data conversion system and method
CN111262915B (en) * 2020-01-10 2020-09-22 北京东方金信科技有限公司 Kafka cluster-crossing data conversion system and method
CN111651148A (en) * 2020-06-09 2020-09-11 北京思特奇信息技术股份有限公司 Dynamic generation method and system of Stream SQL
CN113064910A (en) * 2021-03-18 2021-07-02 西南科技大学 Reaction type pneumatic data multidimensional analysis platform
CN113064910B (en) * 2021-03-18 2022-03-08 西南科技大学 Reaction type pneumatic data multidimensional analysis platform


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20180706)