WO2019242470A1 - Data processing method, apparatus, device, and computer-readable storage medium - Google Patents

Data processing method, apparatus, device, and computer-readable storage medium

Info

Publication number
WO2019242470A1
Authority
WO
WIPO (PCT)
Prior art keywords
data processing
union operation
stream
union
task
Prior art date
Application number
PCT/CN2019/088974
Other languages
English (en)
French (fr)
Inventor
陈双
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2019242470A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions

Definitions

  • The present application relates to the field of data processing technologies, and in particular to a data processing method, apparatus, device, and computer-readable storage medium.
  • The batch data model uses ETL (Extract, Transform, Load) to build a data system, and accesses that data system through a query language (for example, SQL, Structured Query Language) for data analysis.
  • A streaming ETL data system can compute and process a continuous data stream in real time, then store the processing results and provide them to business systems (such as vehicle monitoring, personnel deployment, and real-time crowd-flow warning systems). However, when an existing streaming ETL data system merges multiple data streams, it must perform the data-source insertion, calculation, and storage process separately for each data stream, which is slow.
  • This application proposes a data processing method, apparatus, device, and computer-readable storage medium.
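  • The technical solution adopted in the embodiments of the present application is to provide a data processing method, including:
  • when it is identified that a stream data processing task includes a union operation identifier, parsing the first data processing instruction that includes the union operation identifier in the stream data processing task to obtain an identifiable second data processing instruction; and performing a union operation on a plurality of stream data processing results by executing the second data processing instruction.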
  • An embodiment of the present application further provides a data processing apparatus, including:
  • a parsing module configured to, when it is identified that a stream data processing task includes a union operation identifier, parse the first data processing instruction that includes the union operation identifier in the stream data processing task to obtain an identifiable second data processing instruction;
  • an operation module configured to perform a union operation on a plurality of stream data processing results by executing the second data processing instruction.
  • An embodiment of the present application further provides a data processing device, where the data processing device includes a processor and a memory; the processor is configured to execute a data processing program stored in the memory to implement the steps of the foregoing data processing method.
  • An embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement the steps of the foregoing data processing method.
  • In the embodiments of the present application, a first data processing instruction corresponding to a stream data processing task that contains a union operation identifier is parsed to obtain a second data processing instruction, and a union operation is performed on multiple stream data processing results by executing the second data processing instruction, so that the streaming ETL data system implements the union operation on multiple stream data processing results.
  • FIG. 1 is a first flowchart of a data processing method according to an embodiment of the present application.
  • FIG. 2 is a second flowchart of a data processing method according to an embodiment of the present application.
  • FIG. 3 is a third flowchart of a data processing method according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a streaming ETL model according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an intersection monitoring real-time service system according to an embodiment of the present application.
  • FIG. 6 is a fourth flowchart of a data processing method according to an embodiment of the present application.
  • FIG. 7 is a first schematic structural diagram of a data processing apparatus according to an embodiment of the present application.
  • FIG. 8 is a second schematic structural diagram of a data processing apparatus according to an embodiment of the present application.
  • FIG. 9 is a third schematic structural diagram of a data processing apparatus according to an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a data processing device according to an embodiment of the present application.
  • In the prior art, StreamCQL (Stream Continuous SQL, the SQL engine of a stream processing platform) inserts each stream data processing result into the output data stream separately, which results in low execution efficiency and slow processing speed.
  • The data processing method provided in the embodiments of the present application is applied to StreamCQL and is used to implement a union operation on multiple stream data processing results in StreamCQL, so as to solve the problem in the prior art that StreamCQL cannot perform the union operation, which results in low execution efficiency and slow processing speed.
  • An embodiment of the present application provides a data processing method. As shown in FIG. 1, the method includes the following steps:
  • Step S101: when it is identified that the streaming data processing task includes a union operation identifier, parse the first data processing instruction that includes the union operation identifier in the streaming data processing task to obtain an identifiable second data processing instruction.
  • The union operation identifier includes a union operator.
  • The manner of identifying whether the streaming data processing task includes a union operation identifier is not specifically limited. For example, one or more preset union operation identifiers may be used to determine whether the streaming data processing task contains an identifier matching a preset union operation identifier. If such an identifier exists, it is determined that the streaming data processing task includes a union operation identifier; otherwise, it is determined that the streaming data processing task does not include a union operation identifier.
  • Parsing the first data processing instruction that includes the union operation identifier in the streaming data processing task to obtain the identifiable second data processing instruction includes, but is not limited to: parsing the first data processing instruction that includes the union operation identifier to obtain an identifiable second data processing instruction for performing a union operation.
  • In this way, the unexecutable union operation data processing instruction is converted into an identifiable union operation data processing instruction, which avoids the defect that StreamCQL reports a task error and refuses to perform the union operation when it receives a union operation data processing instruction.
  • Optionally, step S101 includes: when it is identified that the streaming data processing task includes the union operator, parsing the first SQL data processing instruction that includes the union all operator in the streaming data processing task to obtain an identifiable second SQL data processing instruction for performing the union operation.
  • Step S102: execute the second data processing instruction to perform a union operation on a plurality of stream data processing results.
  • In this way, StreamCQL supports the union operation, which improves the execution efficiency and processing speed of stream data processing.
  • The data processing method described in this first embodiment enables a streaming ETL data system to perform a union operation on a plurality of stream data processing results.
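  • The identification in step S101 can be illustrated with a minimal Python sketch. It assumes the streaming data processing task is a list of SQL strings and that the preset union operation identifiers are the keywords union and union all. The names UNION_TEMPLATE, find_union_instructions, and task_has_union are hypothetical and do not come from StreamCQL or from the application.

```python
import re

# Hypothetical preset template: matches "union" or "union all" as whole
# words, case-insensitively. The application does not fix a concrete list.
UNION_TEMPLATE = re.compile(r"\bunion(?:\s+all)?\b", re.IGNORECASE)


def find_union_instructions(task):
    """Return the instructions in the task that contain a union identifier."""
    return [sql for sql in task if UNION_TEMPLATE.search(sql)]


def task_has_union(task):
    """True if at least one instruction in the task contains a union identifier."""
    return bool(find_union_instructions(task))


# Example task: only the first instruction needs the union-specific parsing.
task = [
    "insert into out_stream select id from s1 union all select id from s2",
    "insert into audit select id from s3",
]
first_instructions = find_union_instructions(task)
```

  • A plain keyword match like this would also fire on the word union inside a string literal, so a real implementation would presumably confirm the match during lexical analysis.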
  • An embodiment of the present application further provides a data processing method, as shown in FIG. 2, including the following specific steps:
  • Step S201: when a streaming data processing task is received, match all data processing instructions in the streaming data processing task against a preset union operation identifier character matching template to identify whether the streaming data processing task includes a union operation identifier.
  • The preset union operation identifier character matching template is not specifically limited, and may be a template set by an engineer according to engineering experience.
  • The union operation identifier includes a union operator.
  • Step S202: perform lexical analysis and syntax analysis on the regular expression of the first data processing instruction to obtain a recognizable data processing instruction file.
  • Optionally, step S202 includes: performing lexical analysis and syntax analysis on the regular expression of the first SQL data processing instruction to obtain a recognizable SQL data processing instruction file.
  • Step S203: parse the data processing instruction file to obtain a syntax tree object of the union operation.
  • In this way, the unexecutable union operation data processing instruction is converted into an identifiable syntax tree object of the union operation, which avoids the defect that StreamCQL reports a task error and refuses to perform the union operation when it receives a union operation data processing instruction.
  • Optionally, step S203 includes: parsing the SQL data processing instruction file to obtain a syntax tree object of the union operation.
  • Step S204: perform a union operation on a plurality of stream data processing results according to the syntax tree object of the union operation.
  • Performing the union operation on multiple stream data processing results in this way gives StreamCQL support for the union operation, which improves the execution efficiency and processing speed of stream data processing.
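  • Steps S202 and S203 can be sketched as a tiny lexer and recursive-descent parser. This is only an illustration under strong assumptions: the grammar covers just "select <columns> from <stream>" branches joined by union all, and the names Select, UnionAll, tokenize, and parse are hypothetical rather than part of StreamCQL.

```python
import re
from dataclasses import dataclass

# A token is either a word (keyword or name) or one of "," and "*".
TOKEN = re.compile(r"\s*(?:(?P<word>[A-Za-z_][A-Za-z_0-9]*)|(?P<punct>[,*]))")


@dataclass
class Select:
    columns: list
    source: str


@dataclass
class UnionAll:
    branches: list  # list of Select nodes


def tokenize(sql):
    """Lexical analysis: split the statement into words and punctuation."""
    sql, tokens, pos = sql.strip(), [], 0
    while pos < len(sql):
        match = TOKEN.match(sql, pos)
        if not match:
            raise SyntaxError(f"unexpected character {sql[pos]!r} at {pos}")
        tokens.append(match.group("word") or match.group("punct"))
        pos = match.end()
    return tokens


def parse(sql):
    """Syntax analysis: build a syntax tree for select ... union all ..."""
    tokens, pos = tokenize(sql), 0

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def expect(keyword):
        token = take()
        if token.lower() != keyword:
            raise SyntaxError(f"expected {keyword!r}, got {token!r}")

    def parse_select():
        expect("select")
        columns = [take()]
        while pos < len(tokens) and tokens[pos] == ",":
            take()  # consume the comma
            columns.append(take())
        expect("from")
        return Select(columns, take())

    branches = [parse_select()]
    while pos < len(tokens):
        expect("union")
        expect("all")
        branches.append(parse_select())
    return UnionAll(branches) if len(branches) > 1 else branches[0]


tree = parse("select id, t from m1 union all select id, t from m2")
```

  • The resulting UnionAll node plays the role of the syntax tree object of the union operation: an executor can walk its branches and merge their results.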
  • An embodiment of the present application further provides a data processing method, as shown in FIG. 3, including the following specific steps:
  • Step S301: when a streaming data processing task is received, match all data processing instructions in the streaming data processing task against a preset union operation identifier character matching template to identify whether the streaming data processing task includes a union operation identifier.
  • The preset union operation identifier character matching template is not specifically limited, and may be a template set by an engineer according to engineering experience.
  • The union operation identifier includes a union operator.
  • Step S302: when it is identified that the streaming data processing task includes a union operation identifier, parse the first data processing instruction that includes the union operation identifier in the streaming data processing task to obtain an identifiable second data processing instruction.
  • The identifiable second data processing instruction includes a syntax tree object of the union operation. Parsing the first data processing instruction that includes the union operation identifier to obtain the identifiable second data processing instruction includes: performing lexical analysis and syntax analysis on the regular expression of the first data processing instruction to obtain an identifiable data processing instruction file; and parsing the data processing instruction file to obtain the syntax tree object of the union operation.
  • In this way, the unidentifiable union operation data processing instruction is converted into an identifiable second data processing instruction, which avoids the defect that StreamCQL reports a task error and refuses to perform the union operation when it receives a union operation data processing instruction.
  • Step S303: decompose the received streaming data processing task into multiple sub-streaming data processing tasks; execute all independent sub-tasks in parallel and execute dependent sub-tasks sequentially to obtain multiple stream data processing results.
  • Step S304: execute the second data processing instruction to perform a union operation on the multiple stream data processing results.
  • Optionally, step S304 includes: when there are multiple first data processing instructions, executing the second data processing instruction corresponding to each first data processing instruction in parallel through multiple threads, so as to perform a union operation on the stream data processing results and obtain a stream data union processing result; and inserting the stream data union processing result into the output data stream.
  • Executing each second data processing instruction in parallel through multiple threads, and thus performing the union operation on the stream data processing results in parallel, implements StreamCQL's support for union operations and greatly improves the execution efficiency and processing speed of stream data processing.
  • After step S304, the data processing method described in this embodiment further includes:
  • Step S305: push the output data stream to the terminal data storage, so that the terminal data storage can provide real-time system output or real-time data report output.
  • In this way, the output data stream is obtained quickly; task decomposition allows sub-tasks and union operations to execute in parallel; the stream ETL executes quickly, multiple streams are merged quickly and efficiently, stream processing latency is reduced, and the real-time capability of real-time stream processing is improved.
  • The data processing method described in this embodiment enables the streaming ETL data system to perform a union operation on a plurality of stream data processing results.
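  • The scheduling idea in steps S303 and S304 can be sketched with Python's standard thread pool. The sketch assumes each sub-task is a plain callable, that independent sub-tasks take no input, and that each dependent sub-task receives the list of earlier results; run_subtasks and union_all are hypothetical names, and the union here is a simplified single-threaded concatenation rather than the multi-threaded union the embodiment describes.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subtasks(independent, dependent):
    """Run independent sub-tasks in parallel, then dependent ones in order."""
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the order of the submitted callables.
        results = list(pool.map(lambda fn: fn(), independent))
    for fn in dependent:
        # A dependent sub-task sees every result produced before it.
        results.append(fn(results))
    return results


def union_all(results):
    """Concatenate all result streams, keeping duplicates (union all)."""
    output_stream = []
    for rows in results:
        output_stream.extend(rows)
    return output_stream


# Illustrative sub-tasks: two per-camera counts, then a total that
# depends on both of them.
independent = [lambda: [("cam1", 3)], lambda: [("cam2", 5)]]
dependent = [lambda prior: [("total", sum(n for rows in prior for _, n in rows))]]

output_stream = union_all(run_subtasks(independent, dependent))
```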
  • The real-time monitoring cameras installed at intersections and checkpoints collect raw video stream data of traffic and pedestrian flow in real time. The raw video stream data is input into the unstructured data receiving subsystem for temporary storage and then enters the stream ETL model as the real-time input stream. As the input data source of the stream ETL model, the real-time input stream is used for data governance based on streaming tasks in stream computing.
  • The collected raw video stream data comprises S_1, S_2, ..., S_n (n is the number of real-time monitoring cameras).
  • In FIG. 4, the real-time input stream comprises the raw image data S_1, S_2, ..., S_n and the acquisition times t_1, t_2, ..., t_n corresponding to the raw video stream data; stream ETL jobs are submitted as stream ETL tasks in the form of StreamCQL. The purpose of the monitoring and early-warning system for traffic and pedestrian flow at intersections and checkpoints is to count people and vehicles in real time, and to issue early warnings, through data governance of the raw video stream data.
  • The processing of streams S_1, S_2, ..., S_n and streams t_1, t_2, ..., t_n includes union operations.
  • The StreamCQL processing includes the data governance that the real-time data processing subsystem of the stream ETL in FIG. 5 performs on the streams, and it breaks the barrier that StreamCQL does not support union all operations; tasks are optimized during stream data governance, which increases stream processing speed and reduces stream processing time.
  • the data processing method described in this embodiment, as shown in FIG. 6, includes the following specific steps:
  • Step S401: the grammar recognition module identifies whether the submitted streaming data processing task contains a union operator. When an SQL statement in the submitted task includes a union operator, lexical recognition is performed using the character matching template, the text statements containing union in the task are segmented according to the lexical matching result to obtain the SQL statements containing union, and the regular expression of each such SQL statement is passed to the parsing module.
  • Step S402: the parsing module includes an analyzer and a parser. The analyzer performs lexical analysis and syntax analysis on the input SQL regular expression to generate an SQL file that the parser in the stream ETL can recognize;
  • the parser parses the SQL file containing union syntax and generates a syntax tree object for the union syntax.
  • Step S403: the operation module performs the union operation on the input stream data S_1, S_2, ..., S_n.
  • Since the input stream data is unstructured, it must first be converted into the required data form.
  • The unstructured data stream is converted into structured data, and the specified fields of the structured data are extracted as sub-data streams m_1, m_2, ..., m_n. The sub-data streams m_1, m_2, ..., m_n and the time streams t_1, t_2, ..., t_n then undergo a fast union operation directly, which implements StreamCQL's support for union all.
  • Step S404: the execution optimization module performs statement segmentation on the submitted streaming data processing task and decomposes it into multiple sub-tasks.
  • The sub-tasks are executed in parallel to obtain the intermediate results of the streaming task processing.
  • The execution optimization module executes each sub-task containing union all in parallel in a multi-threaded manner and inserts the multiple intermediate streams m_1, m_2, ..., m_n into the output data stream in parallel;
  • the results are aggregated into one output stream to produce real-time result data.
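  • The field extraction and merge in step S403 can be sketched as follows. The sketch makes several assumptions that the application does not state: an upstream step has already decoded each raw record of S_i into a dictionary, the specified fields are camera, vehicles, and people, and combining m_i with t_i means attaching each acquisition time to the matching row. All function and field names are hypothetical.

```python
from itertools import chain


def extract_fields(raw_records, fields):
    """Build sub-stream m_i: keep only the specified fields of each record."""
    return [{name: record[name] for name in fields} for record in raw_records]


def attach_times(sub_stream, times):
    """Pair each row of m_i with its acquisition time from t_i."""
    return [dict(row, time=t) for row, t in zip(sub_stream, times)]


def union_all(*streams):
    """Merge the sub-streams into one output stream, keeping duplicates."""
    return list(chain.from_iterable(streams))


# Hypothetical decoded records for two cameras; "codec" is dropped
# because it is not one of the specified fields.
s1 = [{"camera": "c1", "vehicles": 2, "people": 5, "codec": "h264"}]
s2 = [{"camera": "c2", "vehicles": 1, "people": 0, "codec": "h264"}]
t1, t2 = [100], [101]

fields = ("camera", "vehicles", "people")
m1 = attach_times(extract_fields(s1, fields), t1)
m2 = attach_times(extract_fields(s2, fields), t2)
output_stream = union_all(m1, m2)
```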
  • The real-time monitoring cameras shown in FIG. 5 can be used to collect raw video data and the acquisition time stream corresponding to the raw video data.
  • The unstructured data receiving subsystem can be used to receive the raw video data obtained by the real-time monitoring cameras and the acquisition time stream corresponding to the raw video data.
  • The information service recording subsystem shown in FIG. 5 records, according to the output data stream, the number of vehicles and people passing through a given intersection or checkpoint from time T_i to time T_j.
  • The business service subsystem shown in FIG. 5 displays the real-time traffic flow and the number of people passing through an intersection or checkpoint from time T_i to time T_j, and provides real-time broadcasts of road traffic and pedestrian flow, early warning of traffic and pedestrian flow, and road diversion.
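  • The recording done by the information service recording subsystem amounts to a windowed aggregation over the output data stream. A minimal sketch, assuming each output row is a dictionary with hypothetical camera, time, vehicles, and people fields and that the window includes both T_i and T_j:

```python
def count_in_window(output_stream, camera, t_i, t_j):
    """Sum vehicles and people seen by one camera for t_i <= time <= t_j."""
    vehicles = people = 0
    for row in output_stream:
        if row["camera"] == camera and t_i <= row["time"] <= t_j:
            vehicles += row["vehicles"]
            people += row["people"]
    return vehicles, people


# Illustrative rows only; they are not data from the application.
output_stream = [
    {"camera": "c1", "time": 100, "vehicles": 2, "people": 5},
    {"camera": "c1", "time": 105, "vehicles": 1, "people": 3},
    {"camera": "c2", "time": 101, "vehicles": 4, "people": 0},
    {"camera": "c1", "time": 120, "vehicles": 7, "people": 1},
]
counts = count_in_window(output_stream, "c1", 100, 110)
```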
  • The data processing apparatus provided in the embodiments of the present application is provided in StreamCQL and is used to implement a union operation on multiple stream data processing results in StreamCQL, so as to solve the problem in the prior art that StreamCQL cannot perform the union operation, which results in low execution efficiency and slow processing speed.
  • An embodiment of the present application further provides a data processing apparatus, as shown in FIG. 7, including the following components:
  • The parsing module 11 is configured to, when it is identified that the streaming data processing task includes a union operation identifier, parse the first data processing instruction that includes the union operation identifier in the streaming data processing task to obtain an identifiable second data processing instruction.
  • The union operation identifier includes a union operator.
  • The manner in which the parsing module 11 identifies whether the streaming data processing task includes a union operation identifier is not specifically limited. For example, one or more preset union operation identifiers may be used to determine whether the streaming data processing task contains an identifier matching a preset union operation identifier. If such an identifier exists, it is determined that the streaming data processing task includes a union operation identifier; otherwise, it is determined that the streaming data processing task does not include a union operation identifier.
  • The manner in which the parsing module 11 parses the first data processing instruction that includes the union operation identifier to obtain the identifiable second data processing instruction includes, but is not limited to: parsing the first data processing instruction that includes the union operation identifier in the streaming data processing task to obtain an identifiable second data processing instruction for performing the union operation.
  • When the parsing module 11 identifies that the streaming data processing task includes a union operation identifier, it converts the unexecutable union operation data processing instruction into an identifiable union operation data processing instruction, thereby avoiding the defect that StreamCQL reports a task error and refuses to perform the union operation when it receives a union operation data processing instruction.
  • Optionally, the parsing module 11 is configured to: when it is identified that the streaming data processing task includes a union operator, parse the first SQL data processing instruction that includes the union all operator in the streaming data processing task to obtain an identifiable second SQL data processing instruction for performing the union operation.
  • The operation module 12 is configured to perform a union operation on a plurality of stream data processing results by executing the second data processing instruction.
  • The operation module 12 executes the second data processing instruction to perform a union operation on a plurality of stream data processing results, thereby implementing StreamCQL's support for the union operation and improving the execution efficiency and processing speed of stream data processing.
  • An embodiment of the present application further provides a data processing apparatus, as shown in FIG. 8, including the following components:
  • The identification module 21 is configured to, before the first data processing instruction that includes the union operation identifier in the streaming data processing task is parsed, and when a streaming data processing task is received, match all data processing instructions in the streaming data processing task against a preset union operation identifier character matching template to identify whether the streaming data processing task includes a union operation identifier.
  • The preset union operation identifier character matching template is not specifically limited, and may be a template set by an engineer according to engineering experience.
  • The union operation identifier includes a union operator.
  • By matching all data processing instructions in the streaming data processing task against the preset union operation identifier character matching template, the identification module 21 identifies the union operation identifier effectively and avoids the defect that StreamCQL cannot identify the union operation identifier.
  • The identifiable second data processing instruction includes a syntax tree object of the union operation.
  • The parsing module 22 is configured to perform lexical analysis and syntax analysis on the regular expression of the first data processing instruction to obtain an identifiable data processing instruction file, and to parse the data processing instruction file to obtain the syntax tree object of the union operation.
  • In this way, the unexecutable union operation data processing instruction is converted into an identifiable syntax tree object of the union operation, which avoids the defect that StreamCQL reports a task error and refuses to perform the union operation when it receives a union operation data processing instruction.
  • The operation module 23 is configured to perform a union operation on a plurality of stream data processing results according to the syntax tree object of the union operation.
  • The operation module 23 performs a union operation on a plurality of stream data processing results according to the syntax tree object of the union operation, which implements StreamCQL's support for union operations and improves the execution efficiency and processing speed of stream data processing.
  • An embodiment of the present application further provides a data processing apparatus, as shown in FIG. 9, including the following components:
  • The recognition module 31 is configured to, when a streaming data processing task is received, match all data processing instructions in the streaming data processing task against a preset union operation identifier character matching template to identify whether the streaming data processing task includes a union operation identifier.
  • The preset union operation identifier character matching template is not specifically limited, and may be a template set by an engineer according to engineering experience.
  • The union operation identifier includes a union operator.
  • By matching all data processing instructions in the streaming data processing task against the preset union operation identifier character matching template, the recognition module 31 identifies the union operation identifier effectively and avoids the defect in the prior art that StreamCQL cannot identify the union operation identifier.
  • The parsing module 32 is configured to, when it is identified that the streaming data processing task includes a union operation identifier, parse the first data processing instruction that includes the union operation identifier in the streaming data processing task to obtain an identifiable second data processing instruction.
  • The identifiable second data processing instruction includes a syntax tree object of the union operation. Parsing the first data processing instruction that includes the union operation identifier to obtain the identifiable second data processing instruction includes: performing lexical analysis and syntax analysis on the regular expression of the first data processing instruction to obtain an identifiable data processing instruction file; and parsing the data processing instruction file to obtain the syntax tree object of the union operation.
  • Optionally, the parsing module 32 is configured to, when it is identified that the streaming data processing task includes a union operation identifier, perform lexical analysis and syntax analysis on the regular expression of the first data processing instruction to obtain an identifiable data processing instruction file, and parse the data processing instruction file to obtain the syntax tree object of the union operation.
  • In this way, the unidentifiable union operation data processing instruction is converted into an identifiable second data processing instruction, which avoids the defect that StreamCQL reports a task error and refuses to perform the union operation when it receives a union operation data processing instruction.
  • The parallel processing module 33 is configured to decompose a received streaming data processing task into a plurality of sub-streaming data processing tasks, execute all independent sub-tasks in parallel, and execute dependent sub-tasks sequentially to obtain multiple stream data processing results.
  • The operation module 34 is configured to perform a union operation on a plurality of stream data processing results by executing the second data processing instruction.
  • Optionally, the operation module 34 is configured to: when there are multiple first data processing instructions, execute the second data processing instruction corresponding to each first data processing instruction in parallel through multiple threads, so as to perform a union operation on the stream data processing results and obtain a stream data union processing result; and insert the stream data union processing result into the output data stream.
  • Executing each second data processing instruction in parallel through multiple threads, and thus performing the union operation on the stream data processing results in parallel, implements StreamCQL's support for union operations and greatly improves the execution efficiency and processing speed of stream data processing.
  • The data processing apparatus described in this embodiment further includes:
  • the real-time storage output module 35, configured to push the output data stream to the terminal data storage and store it there, so that the terminal data storage provides real-time system output or real-time data report output.
  • In this way, the output data stream is obtained quickly; task decomposition allows sub-tasks and union operations to execute in parallel; the stream ETL executes quickly, multiple streams are merged quickly and efficiently, stream processing latency is reduced, and the real-time capability of real-time stream processing is improved.
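  • The hand-off from the operation module to the terminal data storage can be sketched with a thread-safe queue. TerminalStore is a hypothetical in-memory stand-in for the terminal data storage, and push_output_stream is a hypothetical name for the push step; the application does not specify either interface.

```python
import queue


class TerminalStore:
    """Hypothetical in-memory stand-in for the terminal data storage."""

    def __init__(self):
        self._rows = queue.Queue()  # safe to fill from several threads

    def push(self, row):
        self._rows.put(row)

    def drain(self):
        """Return every row stored so far, e.g. to build a real-time report."""
        rows = []
        while True:
            try:
                rows.append(self._rows.get_nowait())
            except queue.Empty:
                return rows


def push_output_stream(output_stream, store):
    """Push each row of the output data stream to the terminal storage."""
    for row in output_stream:
        store.push(row)


store = TerminalStore()
push_output_stream([("cam1", 3), ("cam2", 5)], store)
report = store.drain()
```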
  • The real-time storage output module 35 can be implemented by a central processing unit (CPU), a digital signal processor (DSP), a microcontroller unit (MCU), or a field-programmable gate array (FPGA); in practical applications, the real-time storage output module 35 in the apparatus can also be implemented through a communication module (including a basic communication suite, operating system, communication module, standardized interfaces and protocols, etc.) and a transceiver antenna.
  • An embodiment of the present application further provides a data processing device, as shown in FIG. 10, including the following components: a processor 501 and a memory 502.
  • the processor 501 and the memory 502 may be connected through a bus or other manners.
  • the processor 501 may be a general-purpose processor such as a central processing unit (CPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
  • the memory 502 is configured to store executable instructions of the processor 501.
  • the memory 502 is configured to store a program code and transmit the program code to the processor 501.
  • the memory 502 may include volatile memory, such as random access memory (RAM); the memory 502 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 502 may also include a combination of the above types of memory.
  • the processor 501 is configured to call the program code stored in the memory 502 and execute some or all of the steps in the embodiments of the present application.
  • An embodiment of the present application further provides a computer-readable storage medium.
  • the computer storage medium may be a RAM memory, a flash memory, a ROM memory, an EPROM memory, an EEPROM memory, a register, a hard disk, a mobile hard disk, a CD-ROM, or any other form of storage medium known in the art.
  • the computer-readable storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement some or all steps in the embodiments of the present application.
  • the computer-readable storage medium stores one or more programs that can be executed by one or more processors, enabling a streaming ETL data system to perform a union operation on multiple stream data processing results.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

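An embodiment of the present application discloses a data processing method, including: when it is identified that a streaming data processing task contains a union operation identifier, parsing a first data processing instruction in the streaming data processing task that contains the union operation identifier to obtain a recognizable second data processing instruction; and executing the second data processing instruction to perform a union operation on multiple stream data processing results. Embodiments of the present application also disclose a data processing apparatus, a device, and a computer-readable storage medium.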

Description

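Data processing method, apparatus, device, and computer-readable storage medium
Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese patent application No. 201810645397.8, filed on June 21, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of data processing technologies, and in particular to a data processing method, apparatus, device, and computer-readable storage medium.
Background
In the related art, the batch data model uses ETL (Extract, Transform, Load) to build a data system, and the data system is accessed through a query language (for example, SQL (Structured Query Language)) for data analysis. As information data becomes increasingly real-time and stream-oriented, traditional batch processing struggles to handle streaming data and cannot adequately meet the demand for real-time computation. Stream computing emerged to process streaming data in real time: a streaming ETL data system can compute on and process a continuous flow of streaming data in real time, store the processing results, and provide them to business systems (for example, vehicle monitoring, personnel deployment and control, and real-time crowd-flow warning systems). However, when an existing streaming ETL data system merges multiple data streams, it must perform a separate insert-into-data-source operation for each data stream, so computation and storage are slow.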
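Summary
The present application provides a data processing method, apparatus, device, and computer-readable storage medium.
The technical solution adopted by the embodiments of the present application provides a data processing method, including:
when it is identified that a streaming data processing task contains a union operation identifier, parsing a first data processing instruction in the streaming data processing task that contains the union operation identifier to obtain a recognizable second data processing instruction;
executing the second data processing instruction to perform a union operation on multiple stream data processing results.
An embodiment of the present application further provides a data processing apparatus, including:
a parsing module, configured to, when it is identified that a streaming data processing task contains a union operation identifier, parse a first data processing instruction in the streaming data processing task that contains the union operation identifier to obtain a recognizable second data processing instruction;
an operation module, configured to execute the second data processing instruction to perform a union operation on multiple stream data processing results.
An embodiment of the present application further provides a data processing device that includes a processor and a memory; the processor is configured to execute a data processing program stored in the memory to implement the steps of the above data processing method.
An embodiment of the present application further provides a computer-readable storage medium storing one or more programs, which can be executed by one or more processors to implement the steps of the above data processing method.
With the technical solution of the embodiments of the present application, the first data processing instruction corresponding to a streaming data processing task that contains a union operation identifier is parsed to obtain a second data processing instruction, and the second data processing instruction is executed to perform a union operation on multiple stream data processing results, so that a streaming ETL data system can perform a union operation on multiple stream data processing results.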
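Brief Description of the Drawings
FIG. 1 is a first flowchart of the data processing method according to an embodiment of the present application;
FIG. 2 is a second flowchart of the data processing method according to an embodiment of the present application;
FIG. 3 is a third flowchart of the data processing method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the streaming ETL model according to an embodiment of the present application;
FIG. 5 is a schematic diagram of the intersection monitoring real-time business system according to an embodiment of the present application;
FIG. 6 is a fourth flowchart of the data processing method according to an embodiment of the present application;
FIG. 7 is a first schematic structural diagram of the data processing apparatus according to an embodiment of the present application;
FIG. 8 is a second schematic structural diagram of the data processing apparatus according to an embodiment of the present application;
FIG. 9 is a third schematic structural diagram of the data processing apparatus according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of the data processing device according to an embodiment of the present application.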
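Detailed Description
To further explain the technical means adopted by the embodiments of the present application to achieve the intended purpose, and their effects, the embodiments are described in detail below with reference to the accompanying drawings and preferred embodiments.
In the related art, when StreamCQL (Stream Continuous Query Language, the SQL engine of a stream processing platform) receives a streaming data processing task that contains a union (union all) operation, it raises a task error and refuses to execute the union operation. Each stream data processing result must then be inserted into the output data stream separately, which is inefficient and slow.
The data processing method provided by the embodiments of the present application is applied to StreamCQL and enables StreamCQL to perform union operations on multiple stream data processing results, thereby solving the problem in the prior art that StreamCQL cannot execute union operations, which leads to low execution efficiency and slow processing.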
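An embodiment of the present application provides a data processing method which, as shown in FIG. 1, includes the following steps:
Step S101: when it is identified that a streaming data processing task contains a union operation identifier, parse the first data processing instruction in the task that contains the union operation identifier to obtain a recognizable second data processing instruction.
Optionally, the operation identifier includes the union all operator.
In this embodiment, the manner of identifying whether the streaming data processing task contains a union operation identifier is not specifically limited. For example, based on one or more preset union operation identifiers, it may be determined whether the task contains an identifier that matches a preset union operation identifier; if so, the task is determined to contain a union operation identifier, and otherwise it is determined not to contain one.
Optionally, the manner of parsing the first data processing instruction that contains the union operation identifier to obtain a recognizable second data processing instruction includes, but is not limited to: parsing the first data processing instruction that contains the union operation identifier to obtain a recognizable second data processing instruction for executing the union operation.
When it is identified that the streaming data processing task contains a union operation identifier, the union data processing instruction that could not be executed is converted into a recognizable union data processing instruction, which avoids the drawback that StreamCQL raises a task error and refuses to execute the union operation upon receiving a union data processing instruction.
For example, step S101 includes: when it is identified that the streaming data processing task contains the union all operator, parsing the first SQL data processing instruction in the task that contains the union all operator to obtain a recognizable second SQL data processing instruction for executing the union all operation.
Step S102: execute the second data processing instruction to perform a union operation on multiple stream data processing results.
Executing the second data processing instruction to perform a union operation on multiple stream data processing results enables StreamCQL to support union operations and effectively improves the execution efficiency and processing speed of stream data processing.
The data processing method described in the first embodiment of the present application enables a streaming ETL data system to perform a union operation on multiple stream data processing results.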
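An embodiment of the present application further provides a data processing method which, as shown in FIG. 2, includes the following specific steps:
Step S201: upon receiving a streaming data processing task, perform union operation identifier character matching on all data processing instructions in the task, based on a preset union operation identifier character matching template, to identify whether the task contains a union operation identifier.
In this embodiment, the preset union operation identifier character matching template is not specifically limited; it may be a template set by an engineer based on engineering experience.
Optionally, the operation identifier includes the union all operator.
Performing union operation identifier character matching on all data processing instructions in the task with the preset template identifies union operation identifiers effectively and avoids the drawback in the prior art that StreamCQL cannot recognize them.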
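The character matching in step S201 can be pictured as a pattern search over each statement of a task. The following is a minimal Python sketch, not StreamCQL's implementation: the pattern `UNION_ALL_TEMPLATE`, the helper `find_union_statements`, the semicolon-based statement split, and the sample statements are all assumptions made for illustration.

```python
import re

# Hypothetical "character matching template" for the union operation identifier:
# the words "union" and "all" separated by whitespace, matched case-insensitively.
UNION_ALL_TEMPLATE = re.compile(r"\bunion\s+all\b", re.IGNORECASE)


def find_union_statements(task_text: str) -> list[str]:
    """Return the statements of a task that contain the union all identifier."""
    # Naive split on ';' (assumes no semicolons inside string literals).
    statements = [s.strip() for s in task_text.split(";") if s.strip()]
    return [s for s in statements if UNION_ALL_TEMPLATE.search(s)]


# Illustrative statements only; they are not verified StreamCQL syntax.
task = """
insert into out_stream select id from m1 UNION  ALL select id from t1;
insert into other_stream select id from m2;
"""
matched = find_union_statements(task)
```

Only the first statement is selected here; a statement that merely contains a word such as `reunion` is not matched because of the word-boundary anchors.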
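Step S202: perform lexical analysis and syntax analysis on the regular expression of the first data processing instruction to obtain a recognizable data processing instruction file.
For example, step S202 includes: performing lexical analysis and syntax analysis on the regular expression of the first SQL data processing instruction to obtain a recognizable SQL data processing instruction file.
Step S203: parse the data processing instruction file to obtain a syntax tree object of the union operation.
When it is identified that the streaming data processing task contains a union operation identifier, the union data processing instruction that could not be executed is converted into a recognizable syntax tree object of the union operation, which avoids the drawback that StreamCQL raises a task error and refuses to execute the union operation upon receiving a union data processing instruction.
For example, step S203 includes: parsing the SQL data processing instruction file to obtain the syntax tree object of the union operation.
Step S204: perform the union operation on multiple stream data processing results according to the syntax tree object of the union operation.
Performing the union operation on multiple stream data processing results according to the syntax tree object enables StreamCQL to support union operations and effectively improves the execution efficiency and processing speed of stream data processing.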
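Steps S202 to S204 (lexical analysis, syntax analysis, building a union syntax tree, and executing it) can be sketched end to end on a toy grammar. This is only an illustration under stated assumptions: the grammar `insert into <target> select <cols> from <source> [union all select ...]` is a simplification invented here rather than StreamCQL's grammar, and `Select`, `UnionAll`, `tokenize`, `parse`, and `execute_union` are hypothetical names.

```python
import re
from dataclasses import dataclass

TOKEN = re.compile(r"\s*(\w+|\*|,)")


@dataclass
class Select:
    columns: list
    source: str


@dataclass
class UnionAll:
    target: str
    branches: list


def tokenize(text):
    """Lexical analysis: split the statement into words, '*' and ','."""
    text, pos, tokens = text.strip(), 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(text):
    """Syntax analysis: build a UnionAll syntax tree from the token list."""
    tokens = tokenize(text)
    pos = 0

    def take(expected=None):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of statement")
        token = tokens[pos]
        if expected is not None and token.lower() != expected:
            raise SyntaxError(f"expected {expected!r}, got {token!r}")
        pos += 1
        return token

    def parse_select():
        take("select")
        columns = []
        while pos < len(tokens) and tokens[pos].lower() != "from":
            token = take()
            if token != ",":
                columns.append(token)
        take("from")
        return Select(columns, take())

    take("insert")
    take("into")
    tree = UnionAll(target=take(), branches=[parse_select()])
    while pos < len(tokens):
        take("union")
        take("all")
        tree.branches.append(parse_select())
    return tree


def execute_union(tree, streams):
    """Concatenate every branch's records, keeping duplicates (union all)."""
    output = []
    for branch in tree.branches:
        for record in streams[branch.source]:
            if branch.columns == ["*"]:
                output.append(dict(record))
            else:
                output.append({c: record[c] for c in branch.columns})
    return output


tree = parse("insert into out_stream select id, ts from m1 union all select id, ts from t1")
streams = {"m1": [{"id": 1, "ts": 10, "extra": "x"}], "t1": [{"id": 1, "ts": 10}]}
rows = execute_union(tree, streams)
```

The tree keeps one `Select` branch per operand, so executing it is a single pass over the branches; because the operator is union all, identical records from different branches are both kept.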
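An embodiment of the present application further provides a data processing method which, as shown in FIG. 3, includes the following specific steps:
Step S301: upon receiving a streaming data processing task, perform union operation identifier character matching on all data processing instructions in the task, based on a preset union operation identifier character matching template, to identify whether the task contains a union operation identifier.
In this embodiment, the preset union operation identifier character matching template is not specifically limited; it may be a template set by an engineer based on engineering experience.
Optionally, the operation identifier includes the union all operator.
Performing union operation identifier character matching on all data processing instructions in the task with the preset template identifies union operation identifiers effectively and avoids the drawback in the prior art that StreamCQL cannot recognize them.
Step S302: when it is identified that the streaming data processing task contains a union operation identifier, parse the first data processing instruction in the task that contains the union operation identifier to obtain a recognizable second data processing instruction.
Optionally, the recognizable second data processing instruction includes a syntax tree object of the union operation; the manner of parsing the first data processing instruction that contains the union operation identifier to obtain the recognizable second data processing instruction includes:
performing lexical analysis and syntax analysis on the regular expression of the first data processing instruction to obtain a recognizable data processing instruction file; and parsing the data processing instruction file to obtain the syntax tree object of the union operation.
When it is identified that the streaming data processing task contains a union operation identifier, the union data processing instruction that could not be recognized is converted into a recognizable second data processing instruction, which avoids the drawback that StreamCQL raises a task error and refuses to execute the union operation upon receiving a union data processing instruction.
Step S303: decompose the received streaming data processing task into multiple streaming data processing subtasks; execute all independent subtasks in parallel and execute subtasks that have dependencies sequentially, to obtain multiple stream data processing results.
Decomposing the received task into multiple subtasks, executing all independent subtasks in parallel, and executing dependent subtasks sequentially greatly improves the processing efficiency of streaming data processing tasks.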
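One way to picture step S303 is a scheduler that runs, in each round, every subtask whose dependencies have already finished, so independent subtasks share a round and dependent ones wait for a later round. The sketch below is an assumption-laden illustration using Python's standard `concurrent.futures`; `run_subtasks`, the `depends_on` mapping, and the sample subtasks are hypothetical and do not come from the application.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subtasks(subtasks, depends_on, max_workers=4):
    """Run independent subtasks in parallel and dependent ones in later rounds.

    subtasks:   name -> callable taking a dict of already finished results
    depends_on: name -> set of subtask names that must finish first
    """
    results = {}
    pending = {name: set(depends_on.get(name, ())) for name in subtasks}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # Every subtask whose dependencies are all finished is ready now.
            ready = [n for n, deps in pending.items() if deps.issubset(results)]
            if not ready:
                raise ValueError("cyclic or unknown dependency among subtasks")
            snapshot = dict(results)  # read-only view handed to this round
            futures = {n: pool.submit(subtasks[n], snapshot) for n in ready}
            for name, future in futures.items():
                results[name] = future.result()
                del pending[name]
    return results


# Hypothetical subtasks: m1 and m2 are independent, "merged" depends on both.
subtasks = {
    "m1": lambda done: [{"cam": 1, "kind": "car"}],
    "m2": lambda done: [{"cam": 2, "kind": "person"}],
    "merged": lambda done: done["m1"] + done["m2"],
}
results = run_subtasks(subtasks, {"merged": {"m1", "m2"}})
```

Here `m1` and `m2` run in the first round and `merged` in the second; a dependency cycle is reported as an error instead of blocking forever.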
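Step S304: execute the second data processing instruction to perform a union operation on the multiple stream data processing results.
Optionally, step S304 includes: when there are multiple first data processing instructions, executing, in parallel through multiple threads, the second data processing instruction corresponding to each first data processing instruction, so as to perform the union operation on the multiple stream data processing results and obtain a stream data union processing result; and inserting the stream data union processing result into the output data stream.
Because multiple threads execute the second data processing instructions in parallel and perform the union operation on the stream data processing results in parallel, StreamCQL gains support for union operations while the execution efficiency and processing speed of stream data processing improve greatly.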
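The multi-threaded union in step S304 can be illustrated with one worker thread per stream data processing result, each inserting its records into a shared, thread-safe output stream. This is a sketch under assumptions: `parallel_union_all` is a hypothetical helper, a `queue.Queue` stands in for the output data stream, and the arrival order of records across threads is not defined.

```python
import queue
import threading


def parallel_union_all(result_streams):
    """Insert every record of every result stream into one output stream."""
    output = queue.Queue()  # thread-safe stand-in for the output data stream

    def insert_all(records):
        for record in records:
            output.put(record)

    threads = [threading.Thread(target=insert_all, args=(s,)) for s in result_streams]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    merged = []
    while not output.empty():
        merged.append(output.get())
    return merged


merged = parallel_union_all([[1, 2], [2, 3], [4]])
```

Because this is a union all, the value `2` appears twice in `merged`; only the interleaving of records from different threads can vary between runs.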
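Optionally, after step S304, the data processing method of this embodiment further includes:
Step S305: push the output data stream to a terminal data store for storage, so that the terminal data store can provide real-time system output or real-time data report output.
This achieves fast acquisition of the output data stream; the task decomposition approach lets subtasks and union operations execute in parallel; and streaming ETL executes quickly and multiple streams merge quickly and efficiently, which reduces stream processing latency and improves the real-time capability of real-time stream processing.
The data processing method described in the embodiments of the present application enables a streaming ETL data system to perform a union operation on multiple stream data processing results.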
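Based on the above embodiments, an application example of the present application is described in detail below with reference to a specific data processing method example and FIGS. 4 to 6.
Real-time surveillance cameras installed at intersections and checkpoints capture raw video stream data of vehicle and pedestrian traffic in real time. The raw video stream data is fed into an unstructured data receiving subsystem for temporary storage and then into the streaming ETL model as the real-time input stream; the real-time input stream serves as the input data source of the streaming ETL model and is used for data governance according to the streaming task during stream computation.
The captured raw video stream data includes S1, S2, …, Sn, where n is the number of real-time surveillance cameras.
As shown in FIG. 4, the real-time input stream includes the raw video data S1, S2, …, Sn and the corresponding capture-time streams t1, t2, …, tn; streaming ETL tasks are submitted in StreamCQL form. The purpose of the vehicle and pedestrian flow monitoring and warning system for intersections and checkpoints is to perform data governance on the raw video stream data so as to count pedestrian and vehicle flows, and raise warnings on them, in real time. The processing applied to streams S1, S2, …, Sn and streams t1, t2, …, tn includes a union operation. The StreamCQL contains the data governance that the streaming ETL real-time data processing subsystem in FIG. 5 applies to the streams, and it breaks through the barrier that StreamCQL does not support the union all operation; task optimization during stream data governance increases stream processing speed and reduces stream processing time.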
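The data processing method of this embodiment, as shown in FIG. 6, includes the following specific steps:
Step S401: the syntax recognition module identifies whether the submitted streaming data processing task contains the union all operator. When an SQL statement in the submitted task contains the union all operator, the module uses the character matching template to recognize the specific lexical pattern and, according to the lexical matching result, segments the text statements in the task that contain union all to obtain the SQL statements identified as containing union all; it then passes the regular expression of the SQL statement to the syntax parsing module.
Step S402: the syntax parsing module includes a syntax analyzer and a syntax parser. The syntax analyzer performs lexical analysis and syntax analysis on the input SQL regular expression and generates an SQL file that the syntax parser in the streaming ETL can recognize; the syntax parser parses the SQL file containing the union all syntax and generates the syntax tree object of union all.
Step S403: the operator operation module performs the union operation on the input stream data S1, S2, …, Sn.
In this embodiment, because the input stream data is unstructured, it must first be converted into the required form: the unstructured data streams are converted into structured data, and specific fields are extracted from the structured data as sub-data streams m1, m2, …, mn. Once extracted, the sub-data streams m1, m2, …, mn and the time streams t1, t2, …, tn are directly and quickly unioned, which gives StreamCQL support for union all.
Step S404: the execution optimization module splits the statements of the submitted streaming data processing task into multiple subtasks, which execute in parallel to produce intermediate results of the streaming task. The execution optimization module also uses multiple threads to execute in parallel each subtask that contains union all, inserting the multiple input intermediate streams m1, m2, …, mn into the output data stream in parallel and aggregating the results into the output stream to produce real-time result data.
The real-time surveillance cameras shown in FIG. 5 capture the raw video data and the corresponding capture-time streams. The unstructured data receiving subsystem receives the raw video data and the corresponding capture-time streams obtained by the cameras.
The information service recording subsystem shown in FIG. 5 records, according to the output data stream, the numbers of vehicles and pedestrians passing through a given intersection or checkpoint between time Ti and time Tj.
The business service subsystem shown in FIG. 5 displays in real time the vehicle and pedestrian counts for a given intersection or checkpoint between Ti and Tj, and supports real-time broadcasts of road vehicle and pedestrian flows, vehicle and pedestrian flow warnings, road traffic diversion, and so on.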
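The counting done by the recording and business service subsystems, that is, the vehicles and pedestrians passing one intersection between Ti and Tj, can be sketched as a filter and tally over the output data stream. The record layout (`intersection`, `kind`, `ts`), the helpers `count_flows` and `needs_warning`, and the warning limits are all assumptions for illustration; the application does not specify them.

```python
from collections import Counter


def count_flows(output_stream, intersection, t_i, t_j):
    """Count records per kind for one intersection with t_i <= ts <= t_j."""
    counts = Counter()
    for record in output_stream:
        if record["intersection"] == intersection and t_i <= record["ts"] <= t_j:
            counts[record["kind"]] += 1
    return counts


def needs_warning(counts, limits):
    """Return the kinds whose count exceeds its (hypothetical) limit."""
    return {kind for kind, limit in limits.items() if counts[kind] > limit}


# Hypothetical structured records taken from the output data stream.
output_stream = [
    {"intersection": "A", "kind": "vehicle", "ts": 1},
    {"intersection": "A", "kind": "vehicle", "ts": 2},
    {"intersection": "A", "kind": "pedestrian", "ts": 2},
    {"intersection": "B", "kind": "vehicle", "ts": 2},
    {"intersection": "A", "kind": "vehicle", "ts": 9},
]
counts = count_flows(output_stream, "A", 1, 5)
alerts = needs_warning(counts, {"vehicle": 1, "pedestrian": 3})
```

With these sample records, intersection A has two vehicles and one pedestrian in the window, so only the vehicle limit is exceeded.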
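Compared with the prior art, in which traditional streaming ETL must perform a separate insert-into-data-source operation for each data stream when merging multiple streams and is therefore slow to compute and store, the embodiments of the present application remove the obstacle that StreamCQL does not support the union all operator. By enabling StreamCQL to support the union all operator and implementing union all in parallel, multiple streams are merged quickly and efficiently and the capability of real-time stream processing improves.
The data processing apparatus provided by the embodiments of the present application is deployed in StreamCQL and enables StreamCQL to perform union operations on multiple stream data processing results, thereby solving the problem in the prior art that StreamCQL cannot execute union operations, which leads to low execution efficiency and slow processing.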
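An embodiment of the present application further provides a data processing apparatus which, as shown in FIG. 7, includes the following components:
a parsing module 11, configured to, when it is identified that a streaming data processing task contains a union operation identifier, parse the first data processing instruction in the task that contains the union operation identifier to obtain a recognizable second data processing instruction.
Optionally, the operation identifier includes the union all operator.
In this embodiment, the manner in which the parsing module 11 identifies whether the streaming data processing task contains a union operation identifier is not specifically limited. For example, based on one or more preset union operation identifiers, it may determine whether the task contains an identifier that matches a preset union operation identifier; if so, the task is determined to contain a union operation identifier, and otherwise it does not contain one.
Optionally, the manner in which the parsing module 11 parses the first data processing instruction that contains the union operation identifier to obtain a recognizable second data processing instruction includes, but is not limited to: parsing the first data processing instruction that contains the union operation identifier to obtain a recognizable second data processing instruction for executing the union operation.
When it is identified that the streaming data processing task contains a union operation identifier, the parsing module 11 converts the union data processing instruction that could not be executed into a recognizable union data processing instruction, which avoids the drawback that StreamCQL raises a task error and refuses to execute the union operation upon receiving a union data processing instruction.
As an implementation, the parsing module 11 is configured to: when it is identified that the streaming data processing task contains the union all operator, parse the first SQL data processing instruction in the task that contains the union all operator to obtain a recognizable second SQL data processing instruction configured to execute the union all operation.
an operation module 12, configured to execute the second data processing instruction to perform a union operation on multiple stream data processing results.
By executing the second data processing instruction to perform a union operation on multiple stream data processing results, the operation module 12 enables StreamCQL to support union operations and effectively improves the execution efficiency and processing speed of stream data processing.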
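An embodiment of the present application further provides a data processing apparatus which, as shown in FIG. 8, includes the following components:
a recognition module 21, configured to, before the first data processing instruction in the streaming data processing task that contains the union operation identifier is parsed and upon receiving the streaming data processing task, perform union operation identifier character matching on all data processing instructions in the task, based on a preset union operation identifier character matching template, to identify whether the task contains a union operation identifier.
In this embodiment, the preset union operation identifier character matching template is not specifically limited; it may be a template set by an engineer based on engineering experience.
Optionally, the operation identifier includes the union all operator.
By performing union operation identifier character matching on all data processing instructions in the task with the preset template, the recognition module 21 identifies union operation identifiers effectively and avoids the drawback in the prior art that StreamCQL cannot recognize them.
In this embodiment of the present application, the recognizable second data processing instruction includes a syntax tree object of the union operation; a parsing module 22 is configured to perform lexical analysis and syntax analysis on the regular expression of the first data processing instruction to obtain a recognizable data processing instruction file, and to parse the data processing instruction file to obtain the syntax tree object of the union operation.
When it is identified that the streaming data processing task contains a union operation identifier, the union data processing instruction that could not be executed is converted into a recognizable syntax tree object of the union operation, which avoids the drawback that StreamCQL raises a task error and refuses to execute the union operation upon receiving a union data processing instruction.
an operation module 23, configured to perform the union operation on multiple stream data processing results according to the syntax tree object of the union operation.
In this embodiment of the present application, by performing the union operation on multiple stream data processing results according to the syntax tree object of the union operation, the operation module 23 enables StreamCQL to support union operations and effectively improves the execution efficiency and processing speed of stream data processing.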
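An embodiment of the present application further provides a data processing apparatus which, as shown in FIG. 9, includes the following components:
a recognition module 31, configured to, upon receiving a streaming data processing task, perform union operation identifier character matching on all data processing instructions in the task, based on a preset union operation identifier character matching template, to identify whether the task contains a union operation identifier.
In this embodiment, the preset union operation identifier character matching template is not specifically limited; it may be a template set by an engineer based on engineering experience.
Optionally, the operation identifier includes the union all operator.
In this embodiment, by performing union operation identifier character matching on all data processing instructions in the task with the preset template, the recognition module 31 identifies union operation identifiers effectively and avoids the drawback in the prior art that StreamCQL cannot recognize them.
a parsing module 32, configured to, when it is identified that the streaming data processing task contains a union operation identifier, parse the first data processing instruction in the task that contains the union operation identifier to obtain a recognizable second data processing instruction.
Optionally, the recognizable second data processing instruction includes a syntax tree object of the union operation; the manner of parsing the first data processing instruction that contains the union operation identifier to obtain the recognizable second data processing instruction includes: performing lexical analysis and syntax analysis on the regular expression of the first data processing instruction to obtain a recognizable data processing instruction file; and parsing the data processing instruction file to obtain the syntax tree object of the union operation.
Optionally, the parsing module 32 is configured to: when it is identified that the streaming data processing task contains a union operation identifier, perform lexical analysis and syntax analysis on the regular expression of the first data processing instruction to obtain a recognizable data processing instruction file; and parse the data processing instruction file to obtain the syntax tree object of the union operation.
When it is identified that the streaming data processing task contains a union operation identifier, the union data processing instruction that could not be recognized is converted into a recognizable second data processing instruction, which avoids the drawback that StreamCQL raises a task error and refuses to execute the union operation upon receiving a union data processing instruction.
a parallel processing module 33, configured to decompose the received streaming data processing task into multiple streaming data processing subtasks, execute all independent subtasks in parallel, and execute subtasks that have dependencies sequentially, to obtain multiple stream data processing results.
Decomposing the received task into multiple subtasks, executing all independent subtasks in parallel, and executing dependent subtasks sequentially greatly improves the processing efficiency of streaming data processing tasks.
an operation module 34, configured to execute the second data processing instruction to perform a union operation on the multiple stream data processing results.
Optionally, the operation module 34 is configured to: when there are multiple first data processing instructions, execute, in parallel through multiple threads, the second data processing instruction corresponding to each first data processing instruction, so as to perform the union operation on the multiple stream data processing results and obtain a stream data union processing result; and insert the stream data union processing result into the output data stream.
Because multiple threads execute the second data processing instructions in parallel and perform the union operation on the stream data processing results in parallel, StreamCQL gains support for union operations while the execution efficiency and processing speed of stream data processing improve greatly.
Optionally, the data processing apparatus of this embodiment further includes:
a real-time storage output module 35, configured to push the output data stream to a terminal data store for storage, so that the terminal data store can provide real-time system output or real-time data report output.
This achieves fast acquisition of the output data stream; the task decomposition approach lets subtasks and union operations execute in parallel; and streaming ETL executes quickly and multiple streams merge quickly and efficiently, which reduces stream processing latency and improves the real-time capability of real-time stream processing.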
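In the embodiments of the present invention, the parsing module 11, operation module 12, recognition module 21, parsing module 22, operation module 23, recognition module 31, parsing module 32, parallel processing module 33, operation module 34, and real-time storage output module 35 of the data processing apparatus can each be implemented in practical applications by a central processing unit (CPU), a digital signal processor (DSP), a microcontroller unit (MCU), or a field-programmable gate array (FPGA); the real-time storage output module 35 of the apparatus can be implemented in practical applications through a communication module (including a basic communication suite, an operating system, a communication module, standardized interfaces and protocols, etc.) and a transceiver antenna.
It should be noted that, when the data processing apparatus provided by the above embodiments performs data processing, the division into the program modules described above is only an example. In practical applications, the processing may be allocated to different program modules as needed; that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the processing described above. In addition, the data processing apparatus and the data processing method embodiments share the same concept; for the specific implementation of the apparatus, see the method embodiments, which are not repeated here.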
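Since the module division is described as only an example, one way to picture it in software is an apparatus whose recognition, parsing, and operation modules are interchangeable callables. The sketch below is hypothetical throughout: `DataProcessingApparatus`, its three fields, and the very simple stand-in modules are assumptions for illustration, not the structure required by the application.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DataProcessingApparatus:
    recognize: Callable[[str], bool]        # recognition module
    parse: Callable[[str], list]            # parsing module
    operate: Callable[[list, dict], list]   # operation module

    def process(self, instruction, streams):
        if not self.recognize(instruction):
            raise ValueError("no union operation identifier found")
        return self.operate(self.parse(instruction), streams)


# Simple stand-ins; each could be replaced without touching the others.
apparatus = DataProcessingApparatus(
    recognize=lambda text: "union all" in text.lower(),
    # Take the last word of each operand as its source stream name.
    parse=lambda text: [part.split()[-1] for part in text.lower().split("union all")],
    operate=lambda sources, streams: [r for name in sources for r in streams[name]],
)
merged = apparatus.process(
    "select * from m1 union all select * from t1",
    {"m1": [1], "t1": [2, 1]},
)
```

Swapping any one callable (for example, a stricter parser) leaves the other modules and the `process` flow unchanged, which mirrors the point that the processing can be reallocated among program modules.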
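An embodiment of the present application further provides a data processing device which, as shown in FIG. 10, includes the following components: a processor 501 and a memory 502. In some embodiments of the present application, the processor 501 and the memory 502 may be connected through a bus or in other ways.
Optionally, the processor 501 may be a general-purpose processor such as a central processing unit (CPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application. The memory 502 is configured to store instructions executable by the processor 501.
Optionally, the memory 502 is configured to store program code and transmit the program code to the processor 501. The memory 502 may include volatile memory, such as random access memory (RAM); the memory 502 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 502 may also include a combination of the above types of memory.
The processor 501 is configured to call the program code stored in the memory 502 and execute some or all of the steps in the embodiments of the present application.
An embodiment of the present application further provides a computer-readable storage medium. The computer storage medium may be RAM, flash memory, ROM, EPROM, EEPROM, a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art.
The computer-readable storage medium stores one or more programs that can be executed by one or more processors to implement some or all of the steps in the embodiments of the present application.
The computer-readable storage medium described in the ninth embodiment of the present application stores one or more programs that can be executed by one or more processors, enabling a streaming ETL data system to perform a union operation on multiple stream data processing results.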
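It should be noted that, in this document, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element.
The sequence numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the embodiments of the present application.
The methods disclosed in the method embodiments provided in the present application can be combined arbitrarily, provided there is no conflict, to obtain new method embodiments.
The features disclosed in the product embodiments provided in the present application can be combined arbitrarily, provided there is no conflict, to obtain new product embodiments.
The features disclosed in the method or device embodiments provided in the present application can be combined arbitrarily, provided there is no conflict, to obtain new method embodiments or device embodiments.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the specific implementations described, which are illustrative rather than restrictive. Inspired by the present application, those of ordinary skill in the art can devise many other forms without departing from the purpose of the present application and the scope protected by the claims, and all of these fall within the protection of the present application.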

Claims (14)

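  1. A data processing method, comprising:
    when it is identified that a streaming data processing task contains a union operation identifier, parsing a first data processing instruction in the streaming data processing task that contains the union operation identifier to obtain a recognizable second data processing instruction;
    executing the second data processing instruction to perform a union operation on multiple stream data processing results.
  2. The method according to claim 1, wherein, before the parsing of the first data processing instruction in the streaming data processing task that contains the union operation identifier, the method further comprises:
    upon receiving the streaming data processing task, performing union operation identifier character matching on all data processing instructions in the streaming data processing task, based on a preset union operation identifier character matching template, to identify whether the streaming data processing task contains a union operation identifier.
  3. The method according to claim 1 or 2, wherein the recognizable second data processing instruction comprises a syntax tree object of the union operation;
    the parsing of the first data processing instruction in the streaming data processing task that contains the union operation identifier to obtain a recognizable second data processing instruction comprises:
    performing lexical analysis and syntax analysis on the regular expression of the first data processing instruction to obtain a recognizable data processing instruction file;
    parsing the data processing instruction file to obtain the syntax tree object of the union operation.
  4. The method according to claim 3, wherein the executing of the second data processing instruction to perform a union operation on multiple stream data processing results comprises:
    performing the union operation on multiple stream data processing results according to the syntax tree object of the union operation.
  5. The method according to claim 1, wherein, before the executing of the second data processing instruction to perform a union operation on multiple stream data processing results, the method further comprises:
    decomposing the received streaming data processing task into multiple streaming data processing subtasks;
    executing all independent streaming data processing subtasks in parallel, and executing streaming data processing subtasks that have dependencies sequentially, to obtain the multiple stream data processing results.
  6. The method according to claim 5, wherein there are multiple first data processing instructions;
    the executing of the second data processing instruction to perform a union operation on multiple stream data processing results comprises:
    executing, in parallel through multiple threads, the second data processing instruction corresponding to each first data processing instruction, so as to perform the union operation on the multiple stream data processing results and obtain a stream data union processing result;
    inserting the stream data union processing result into an output data stream.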
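  7. A data processing apparatus, comprising:
    a parsing module, configured to, when it is identified that a streaming data processing task contains a union operation identifier, parse a first data processing instruction in the streaming data processing task that contains the union operation identifier to obtain a recognizable second data processing instruction;
    an operation module, configured to execute the second data processing instruction to perform a union operation on multiple stream data processing results.
  8. The apparatus according to claim 7, wherein the apparatus further comprises:
    a recognition module, configured to, before the parsing of the first data processing instruction in the streaming data processing task that contains the union operation identifier and upon receiving the streaming data processing task, perform union operation identifier character matching on all data processing instructions in the streaming data processing task, based on a preset union operation identifier character matching template, to identify whether the streaming data processing task contains a union operation identifier.
  9. The apparatus according to claim 7 or 8, wherein the recognizable second data processing instruction comprises a syntax tree object of the union operation;
    the parsing module is configured to: perform lexical analysis and syntax analysis on the regular expression of the first data processing instruction to obtain a recognizable data processing instruction file; and parse the data processing instruction file to obtain the syntax tree object of the union operation.
  10. The apparatus according to claim 9, wherein the operation module is configured to: perform the union operation on multiple stream data processing results according to the syntax tree object of the union operation.
  11. The apparatus according to claim 7, wherein the apparatus further comprises:
    a parallel processing module, configured to, before the executing of the second data processing instruction to perform a union operation on multiple stream data processing results, decompose the received streaming data processing task into multiple streaming data processing subtasks; execute all independent streaming data processing subtasks in parallel, and execute streaming data processing subtasks that have dependencies sequentially, to obtain the multiple stream data processing results.
  12. The method according to claim 11, wherein there are multiple first data processing instructions;
    the operation module is configured to: execute, in parallel through multiple threads, the second data processing instruction corresponding to each first data processing instruction, so as to perform the union operation on the multiple stream data processing results and obtain a stream data union processing result; and insert the stream data union processing result into an output data stream.
  13. A data processing device, wherein the data processing device comprises a processor and a memory; the processor is configured to execute a data processing program stored in the memory to implement the steps of the data processing method according to any one of claims 1 to 6.
  14. A computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement the steps of the data processing method according to any one of claims 1 to 6.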
PCT/CN2019/088974 2018-06-21 2019-05-29 数据处理方法、装置、设备及计算机可读存储介质 WO2019242470A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810645397.8 2018-06-21
CN201810645397.8A CN110704551B (zh) 2018-06-21 2018-06-21 数据处理方法、装置、设备及计算机可读存储介质

Publications (1)

Publication Number Publication Date
WO2019242470A1 true WO2019242470A1 (zh) 2019-12-26

Family

ID=68983120

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/088974 WO2019242470A1 (zh) 2018-06-21 2019-05-29 数据处理方法、装置、设备及计算机可读存储介质

Country Status (2)

Country Link
CN (1) CN110704551B (zh)
WO (1) WO2019242470A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112181704A (zh) * 2020-09-28 2021-01-05 京东数字科技控股股份有限公司 一种大数据任务处理方法、装置、电子设备及存储介质
CN116881310A (zh) * 2023-09-07 2023-10-13 卓望数码技术(深圳)有限公司 一种大数据的集合计算方法及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101976316A (zh) * 2010-10-27 2011-02-16 杭州新中大软件股份有限公司 一种信息访问权限控制方法
US20150227608A1 (en) * 2013-10-06 2015-08-13 Yahoo! Inc. System and method for performing set operations with defined sketch accuracy distribution
CN105512022A (zh) * 2014-09-25 2016-04-20 华为技术有限公司 一种数据处理方法和设备
CN105512162A (zh) * 2015-09-28 2016-04-20 杭州圆橙科技有限公司 一种基于Storm的流数据实时智能化处理框架
CN107861981A (zh) * 2017-09-28 2018-03-30 北京奇艺世纪科技有限公司 一种数据处理方法及装置

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10515118B2 (en) * 2013-06-24 2019-12-24 Micro Focus Llc Processing a data flow graph of a hybrid flow
US9934279B2 (en) * 2013-12-05 2018-04-03 Oracle International Corporation Pattern matching across multiple input data streams
CN104216766B (zh) * 2014-08-26 2017-08-29 华为技术有限公司 对流数据进行处理的方法及装置
CN106610999A (zh) * 2015-10-26 2017-05-03 北大方正集团有限公司 查询处理方法和装置
CN107787010A (zh) * 2016-08-26 2018-03-09 电信科学技术研究院 一种数据流传输方法、汇聚节点、基站和ue
CN106599182B (zh) * 2016-12-13 2019-10-11 飞狐信息技术(天津)有限公司 基于spark streaming实时流的特征工程推荐方法及装置、视频网站
CN106713944A (zh) * 2016-12-30 2017-05-24 北京奇虎科技有限公司 一种流数据任务的处理方法和装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG, XIAOPENG: "SreamCQL Architecture Parsing, Open Source Stream Processing Framework from Huawei", CSDN, 22 December 2015 (2015-12-22) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112181704A (zh) * 2020-09-28 2021-01-05 京东数字科技控股股份有限公司 一种大数据任务处理方法、装置、电子设备及存储介质
CN116881310A (zh) * 2023-09-07 2023-10-13 卓望数码技术(深圳)有限公司 一种大数据的集合计算方法及装置
CN116881310B (zh) * 2023-09-07 2023-11-14 卓望数码技术(深圳)有限公司 一种大数据的集合计算方法及装置

Also Published As

Publication number Publication date
CN110704551B (zh) 2023-02-17
CN110704551A (zh) 2020-01-17

Similar Documents

Publication Publication Date Title
CN107147639B (zh) 一种基于复杂事件处理的实时安全预警方法
TWI524206B (zh) 提供程式解析驗證服務系統及其控制方法、控制程式、使電腦發揮作用之控制程式、程式解析驗證裝置、程式解析驗證工具管理裝置
JP6205066B2 (ja) ストリームデータ処理方法、ストリームデータ処理装置及び記憶媒体
WO2019242470A1 (zh) 数据处理方法、装置、设备及计算机可读存储介质
CN109670091B (zh) 一种基于数据标准的元数据智能维护方法和装置
CN110209700B (zh) 一种数据流关联方法、装置、电子设备及存储介质
CN110515944B (zh) 基于分布式数据库的数据存储方法、存储介质和电子设备
CN108153803B (zh) 一种数据获取方法、装置及电子设备
CN106971007B (zh) 一种利用数据结构控制的数据处理与数据分析框架
CN103345386A (zh) 一种软件生产方法、装置及运行系统
CN108694221A (zh) 数据实时分析方法、模块、设备和装置
CN111079408A (zh) 一种语种识别方法、装置、设备及存储介质
CN103810203A (zh) 一种数据库管理系统连接复用方法及装置
CN116016628A (zh) 一种api网关埋点分析方法及装置
WO2023179319A1 (zh) 报警方法及装置
CN110109672B (zh) 一种表达式的解析处理方法及装置
CN111209750A (zh) 车联网威胁情报建模方法、装置及可读存储介质
KR101482668B1 (ko) 변전소 구성 언어 기반의 데이터베이스 생성 방법 및 시스템
CN106598721B (zh) 媒资数据流转方法及装置
CN112671845B (zh) 数据处理方法、装置、电子设备、存储介质及云端系统
CN107426028A (zh) Waf引擎的架构及设计方法
CN114168557A (zh) 一种访问日志的处理方法、装置、计算机设备和存储介质
KR101556541B1 (ko) 고부하 경로 기반의 복합 이벤트 처리 장치 및 그 방법
JP2003132039A (ja) シナリオ分割方式
Akram et al. Anomaly detection of manufacturing equipment via high performance rdf data stream processing: Grand challenge

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19822323

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.05.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19822323

Country of ref document: EP

Kind code of ref document: A1