CN116302219A - Data stream batch processing method, device, equipment and medium - Google Patents

Data stream batch processing method, device, equipment and medium

Info

Publication number
CN116302219A
Authority
CN
China
Prior art keywords
class
rule
mapping
data source
relation mapping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310268071.9A
Other languages
Chinese (zh)
Inventor
王锟
陈升
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
21VIANET GROUP Inc
Original Assignee
21VIANET GROUP Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 21VIANET GROUP Inc filed Critical 21VIANET GROUP Inc
Priority to CN202310268071.9A priority Critical patent/CN116302219A/en
Publication of CN116302219A publication Critical patent/CN116302219A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/448 Execution paradigms, e.g. implementations of programming paradigms
    • G06F9/4488 Object-oriented
    • G06F9/449 Object-oriented method invocation or resolution
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to the field of data processing technologies, and in particular to a method, an apparatus, a device, and a medium for batch processing of a data stream. The method comprises the following steps: mapping an input data source table into a corresponding object relation mapping class based on an object relation mapping rule; after performing an attribute assignment operation on the object relation mapping class, weaving operation methods for data stream batch processing into it to obtain a rule proxy class, in the object-oriented programming language, that has the operation methods; and bridging the rule proxy class to a call class that satisfies a conversion service interface to obtain a bridging call class, so that the preset conversion service interface calls the bridging call class to perform conversion processing on the data in the data source table. In this way, the relevant entity classes and attributes can be defined directly according to service requirements without defining SQL script statements, and the structure of the data source table is described by the object relation mapping class, which abstracts the data source table and enables conversion processing of its data.

Description

Data stream batch processing method, device, equipment and medium
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a medium for batch processing of a data stream.
Background
Flink is a distributed, high-performance stream processing framework. It supports high-throughput, real-time computation with exactly-once semantics, and it also supports batch processing, which is implemented on top of the stream computing model by defining windows over the data. Flink is a stateful stream computation engine built on a state mechanism (state): stream computations store the intermediate results of each node in state, and fault-tolerant recovery is achieved through a checkpoint mechanism (checkpoint).
The Flink real-time computing model is built on streams (Stream) and transformations (Transformation). The basic flow is as follows: each data stream starts from one or more data sources (Source), passes through several conversion processes (Transformation), and ends at the output of the calculation results (Sink).
Flink SQL (SQL: Structured Query Language) is a simplified Flink real-time computing model. It uses a development language that conforms to standard SQL semantics to acquire the raw data sources (Source), apply computational logic (Transformation) to compute summaries, and output the results (Sink) to specified targets.
However, the Flink SQL is written by using SQL semantics, the SQL writing requires a certain database query and operation experience, and a non-database developer or a business program developer cannot directly define the required SQL script statement according to business requirements, which is not beneficial to business realization.
Disclosure of Invention
The application provides a data stream batch processing method, a device, equipment and a medium, which are used for realizing abstraction of a data source table according to service requirements under the condition that SQL script sentences are not required to be defined so as to realize conversion processing of data in the data source table.
The specific technical scheme provided by the embodiment of the application is as follows:
In a first aspect, an embodiment of the present application provides a data stream batch processing method, including:
mapping an input data source table into a corresponding object relation mapping class based on an object relation mapping rule, wherein the object relation mapping class is a class defined in an object-oriented programming language;
after performing an attribute assignment operation on the object relation mapping class, weaving in operation methods for data stream batch processing to obtain a rule proxy class, in the object-oriented programming language, that has the operation methods;
and bridging the rule proxy class to a call class that satisfies a conversion service interface to obtain a bridging call class, so that the preset conversion service interface calls the bridging call class to perform conversion processing on the data in the data source table.
In an alternative embodiment, mapping the input data source table into the corresponding object relation mapping class based on the object relation mapping rule includes:
determining, based on the object relation mapping rule, a class name corresponding to the table name of the input data source table and class attribute names corresponding to the table field names of the data source table;
and associating, by means of annotations, the table name with the corresponding class name and each table field name with the corresponding class attribute name.
In an alternative embodiment, weaving in the operation methods for data stream batch processing after performing the attribute assignment operation on the object relation mapping class, to obtain the rule proxy class that has the operation methods, includes:
instantiating the object relation mapping class using the reflection mechanism of the object-oriented programming language, and performing the attribute assignment operation on the instantiated object relation mapping class;
and weaving the operation methods for data stream batch processing into the attribute-assigned object relation mapping class by means of a dynamic proxy, to obtain the rule proxy class, in the object-oriented programming language, that has the operation methods.
In an alternative embodiment, bridging the rule proxy class to the call class that satisfies the preset conversion service to obtain the bridging call class includes:
using the bridge pattern, adding, on top of the operation methods of the rule proxy class, the related construction method, assembly method and destructor required for conversion calls by the preset conversion service;
and obtaining the bridging call class based on the rule proxy class to which the related construction method, assembly method and destructor have been added.
In an alternative embodiment, the method further comprises:
if the input data source table does not meet a preset condition, adding annotation information to the data source table by means of an annotation, wherein the annotation information indicates that the table is to be expressed with SQL query statements;
and inputting the data source table to which the annotation information has been added into an SQL parsing engine for conversion processing.
In a second aspect, an embodiment of the present application provides a data stream batch processing apparatus, including:
a mapping module, configured to map an input data source table into a corresponding object relation mapping class based on an object relation mapping rule, wherein the object relation mapping class is a class defined in an object-oriented programming language;
a rule proxy module, configured to, after performing an attribute assignment operation on the object relation mapping class, weave in operation methods for data stream batch processing to obtain a rule proxy class, in the object-oriented programming language, that has the operation methods;
and a bridging construction module, configured to bridge the rule proxy class to a call class that satisfies a conversion service interface to obtain a bridging call class, so that the preset conversion service interface calls the bridging call class to perform conversion processing on the data in the data source table.
In an alternative embodiment, the mapping module is specifically configured to:
determine, based on the object relation mapping rule, a class name corresponding to the table name of the input data source table and class attribute names corresponding to the table field names of the data source table;
and associate, by means of annotations, the table name with the corresponding class name and each table field name with the corresponding class attribute name.
In an alternative embodiment, the rule proxy module is specifically configured to:
instantiate the object relation mapping class using the reflection mechanism of the object-oriented programming language, and perform the attribute assignment operation on the instantiated object relation mapping class;
and weave the operation methods for data stream batch processing into the attribute-assigned object relation mapping class by means of a dynamic proxy, to obtain the rule proxy class, in the object-oriented programming language, that has the operation methods.
In an alternative embodiment, the bridging construction module is specifically configured to:
using the bridge pattern, add, on top of the operation methods of the rule proxy class, the related construction method, assembly method and destructor required for conversion calls by the preset conversion service;
and obtain the bridging call class based on the rule proxy class to which the related construction method, assembly method and destructor have been added.
In an alternative embodiment, the apparatus further comprises a labeling module configured to:
if the input data source table does not meet a preset condition, add annotation information to the data source table by means of an annotation, wherein the annotation information indicates that the table is to be expressed with SQL query statements;
and input the data source table to which the annotation information has been added into an SQL parsing engine for conversion processing.
In a third aspect, embodiments of the present application provide an electronic device comprising a processor and a memory, wherein the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of any of the methods of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having a computer program stored therein, which when executed by a processor, implements the steps of the method of any of the first aspects.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program stored in a computer readable storage medium; when the computer program is read from the computer readable storage medium by a processor of an electronic device, the processor executes the computer program, causing the electronic device to perform the steps of any of the methods of the first aspect.
The embodiment of the application has at least the following beneficial effects:
In the scheme of the embodiment of the application, the input data source table is mapped into a corresponding object relation mapping class based on the object relation mapping rule; after an attribute assignment operation is performed on the object relation mapping class, operation methods for data stream batch processing are woven into it to obtain a rule proxy class, in the object-oriented programming language, that has the operation methods; and the rule proxy class is bridged to a call class that satisfies the conversion service interface to obtain a bridging call class, so that the preset conversion service interface calls the bridging call class to perform conversion processing on the data in the data source table. In this way, object relation mapping applies the object-oriented idea: a data source table is mapped into object form through metadata (information describing data attributes), and a Flink SQL operation is converted into an object operation in an object-oriented programming language. In other words, an object relation mapping layer is added, through the object relation mapping rule, on top of the SQL layer of the existing layered Flink stream and batch data operations. The relevant entity classes and attributes can therefore be defined directly according to service requirements without defining SQL script statements, and the structure of the data source table is described by the object relation mapping class, which abstracts the data source table and enables conversion processing of its data.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for batch processing a data stream according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a hierarchy at which an object relationship mapping operation is provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of a data stream batching method according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a specific implementation process of a data stream batch processing method according to an embodiment of the present application;
FIG. 5 is a block diagram of a data stream batching apparatus according to an embodiment of the present application;
FIG. 6 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to better understand the technical solutions in the present application, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, shall fall within the scope of the present application.
Data stream processing, or stream processing for short, means that once a piece of data has been processed, it is serialized into a buffer and immediately transmitted over the network to the next node, which continues processing it. In terms of scope, data stream processing is an unbounded, continuous data processing process.
Data batch processing, or batch processing for short, means that once a piece of data has been processed, it is serialized into a buffer but is not immediately transmitted over the network to the next node; when the buffer is full, the data is persisted to disk, and only after all the data has been processed is it transmitted over the network to the next node. In terms of scope, batch processing is a bounded data processing process, and it can be understood as a special case of data stream processing. In stream processing, a sliding window or a tumbling window is defined over the data, and a result is generated each time the window slides or tumbles. Batch processing is performed by defining a global window to which all records belong. Data stream batch processing can therefore be understood as batch processing implemented by defining windows over the data.
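The relationship between windowed stream processing and batch processing described above can be illustrated with a minimal Java sketch. It does not use Flink; the class `WindowSketch` and its methods are hypothetical names introduced only for illustration, and a simple count-based window stands in for whatever window definition a real job would use.

```java
import java.util.ArrayList;
import java.util.List;

class WindowSketch {
    // Tumbling count window: one result (a sum) per 'size' consecutive records.
    static List<Integer> tumblingSums(List<Integer> records, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("window size must be positive");
        }
        List<Integer> results = new ArrayList<>();
        int sum = 0;
        int count = 0;
        for (int value : records) {
            sum += value;
            count++;
            if (count == size) {   // the window is full: emit a result and start a new window
                results.add(sum);
                sum = 0;
                count = 0;
            }
        }
        if (count > 0) {
            results.add(sum);      // flush the last, partially filled window at end of input
        }
        return results;
    }

    // Global window: every record belongs to the same window, so there is one result.
    static int globalSum(List<Integer> records) {
        int sum = 0;
        for (int value : records) {
            sum += value;
        }
        return sum;
    }
}
```

With a bounded input, the global window yields a single result for the whole data set, which is the batch case; the tumbling window yields one result per window as records arrive.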
In the description of the present application, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or an implicit indication of the number of technical features being indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present application, unless otherwise indicated, the meaning of "a plurality" is two or more.
The following briefly outlines the design ideas of the present application.
The Flink SQL is a simplified Flink real-time computing model that uses a development language that conforms to standard SQL semantics to achieve the acquisition of raw data sources (sources), the application of computational logic (transform) to compute summaries, and the output of results (Sink) to specified targets. In accordance with the above description, a complete Flink SQL contains the following three components:
1. Data source processor (Source Operator): abstracts the external data source and unifies the way data sources are acquired;
2. Conversion processor (Transformation Operator), also referred to as an operator operation: implements query and aggregation operations, as well as the SQL-level union, join, projection, difference, intersection and window operations;
3. Output processor (Sink Operator): outputs the operation result to the specified target service, and integrates abstractions of several output forms of the result table.
However, the Flink SQL is written using SQL semantics, which has the following drawbacks in applications:
1. The abstraction granularity of the data source only reaches the data set level;
2. Writing SQL requires a certain amount of database query and operation experience, and a non-database developer or a business program developer cannot directly define the required SQL script statements according to business requirements;
3. It is not object-oriented: SQL is a descriptive grammar oriented to data query and is unrelated to the idea of object-oriented programming;
4. Hard coding arises easily and complexity is high, which hinders later maintenance;
5. Parameter assignment is based on a simple wrapper implementation and cannot be combined with the business model.
In view of this, the present application provides a data stream batch processing method, apparatus, device, and medium. Object relation mapping applies the object-oriented idea: a data source table is mapped into object form through metadata (information describing data attributes), and a Flink SQL operation is converted into an object operation in an object-oriented programming language. In other words, an object relation mapping layer is added, through the object relation mapping rule, on top of the SQL layer of the existing layered Flink stream and batch data operations. When implementing real-time stream computing or batch processing service requirements, it is therefore unnecessary to define SQL script statements: the relevant entity classes and attributes can be defined directly according to service requirements, and the structure of the data source table is described by the object relation mapping class, which abstracts the data source table and enables conversion processing of its data. Because the data source table is mapped into object form, the abstraction granularity is fine, later maintenance is convenient, and the mapping can be combined with the business model.
The preferred embodiments of the present application will be described below with reference to the accompanying drawings of the specification, it being understood that the preferred embodiments described herein are for illustration and explanation only, and are not intended to limit the present application, and embodiments and features of embodiments of the present application may be combined with each other without conflict.
The data stream batch processing method of the present application is described in detail below with reference to the accompanying drawings and specific embodiments.
The data stream batch processing of the present application can be executed by a server that includes Flink SQL. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
As shown in fig. 1, a data stream batch processing method according to an embodiment of the present application includes the following steps S101 to S103:
Step S101, based on the object relation mapping rule, the input data source table is mapped into a corresponding object relation mapping class, wherein the object relation mapping class is a class defined in an object-oriented programming language.
Object relational mapping (Object Relational Mapping, ORM) is a programming technique for converting data between incompatible type systems using an object-oriented programming language. In effect, it creates a "virtual object database" that can be used from within the programming language. In the embodiment of the application, a corresponding object class can be created in advance for each attribute of the data source table to form the object relation mapping rule.
The data source table may include a flow table and a dimension table. The dimension table is a table of dimension information: a dimension attribute is an angle from which data is observed, and it supplements the information in a fact table. The flow table is a table onto which real-time stream data is mapped; in a join query, each piece of stream data actively queries whether matching data exists in the dimension table.
In an alternative embodiment, the step S101 may include the following steps A1-A2:
A1, determining, based on the object relation mapping rule, a class name corresponding to the table name of the input data source table and class attribute names corresponding to the table field names of the data source table.
A2, associating, by means of annotations, the table name with the corresponding class name and each table field name with the corresponding class attribute name.
The object relation mapping rule comprises the mapping from table names to class names and from table field names to class attributes. For an input data source table, the class name corresponding to its table name and the class attribute names corresponding to its table field names can be determined based on this rule.
Illustratively, in the dimension table shown in Table 1, the table name is "storage", and the table field names include the table identification field name "id" and the table column field name "item_name"; in the flow table shown in Table 2, the table name is "sample", and the table field names include the table identification field name "id" and the table column field name "item_name".
Table 1: storage
id    item_name
1     CPU
2     memory
Table 2: sample
{"id":"1","item_name":"CPU"}
{"id":"2","item_name":"memory"}
Specifically, Java annotations may be used to describe the association between each attribute of the data source table and the corresponding object class, that is, the object relation mapping definition information. The relation mapping component parses and processes this association information through reflection, and the result is a class that carries the object relation mapping definition information. Table 3 shows an example object relation mapping rule.
Table 3
[Table 3 appears only as an image in the original publication (Figure BDA0004134527910000091).]
For example, the dimension table of the table 1 is mapped and associated by the object relation mapping rule of the table 3, so as to obtain the following object relation mapping definition information:
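// Maps the dimension table "storage" to a class
@Entity(name = "storage")
public class Storage {
    // Table identification field "id"
    @Id
    private Long id;

    // Table column field "item_name"
    @Column("item_name")
    private String itemName;

    public Storage() {}
    ...
}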
For another example, the flow table in table 2 is mapped and associated by the object relation mapping rule in table 3, and the following object relation mapping definition information is obtained:
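// Maps the flow table "sample" to a class
@Entity(name = "sample")
public class Storage {
    // Table identification field "id"
    @Id
    private Long id;

    // Table column field "item_name"
    @Column("item_name")
    private String itemName;

    public Storage() {}
    ...
}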
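How a relation mapping component might read such annotated definitions through reflection can be sketched as follows. This is only an illustration under stated assumptions: the annotation types `Entity`, `Id` and `Column` are declared locally so that the sketch compiles on its own (the application does not say which library provides them), and `MappingReader` is a hypothetical helper name.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical annotations, declared here only so the sketch is self-contained.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Entity { String name(); }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Id { }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Column { String value(); }

@Entity(name = "storage")
class Storage {
    @Id
    private Long id;

    @Column("item_name")
    private String itemName;

    Storage() { }
}

class MappingReader {
    // Returns the table name declared on the class through @Entity.
    static String tableName(Class<?> type) {
        Entity entity = type.getAnnotation(Entity.class);
        if (entity == null) {
            throw new IllegalArgumentException(type.getName() + " is not annotated with @Entity");
        }
        return entity.name();
    }

    // Returns a map from table field name to class attribute name.
    static Map<String, String> columnToAttribute(Class<?> type) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (Field field : type.getDeclaredFields()) {
            Column column = field.getAnnotation(Column.class);
            if (column != null) {
                mapping.put(column.value(), field.getName());
            } else if (field.isAnnotationPresent(Id.class)) {
                // Assumption: an @Id attribute maps to a table field of the same name.
                mapping.put(field.getName(), field.getName());
            }
        }
        return mapping;
    }
}
```

For the `Storage` class above, `MappingReader.tableName(Storage.class)` yields the table name "storage", and the column map associates "item_name" with the attribute `itemName`.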
Step S102, after an attribute assignment operation is performed on the object relation mapping class, operation methods for data stream batch processing are woven in to obtain a rule proxy class, in the object-oriented programming language, that has the operation methods.
The attributes in the object relation mapping class are assigned values. For example, the table field "item_name" in Table 1 corresponds to the class attribute annotated with "Column" in the object relation mapping class, and, as Table 1 shows, the values of that attribute include "CPU" and "memory".
The operations of data stream batch processing include, but are not limited to, searching, updating and batch adding. The corresponding operation methods can each be woven into the object relation mapping class by means of a dynamic proxy, forming a rule proxy class that can interact with the Flink data table (the batch data form of a data stream).
Dynamic proxy refers to a technique that creates proxy objects for target objects and enhances functionality of methods in the target objects at program run-time. In the process of generating the proxy object, the target object is unchanged, and the method in the proxy object is an enhancement method of the target object method. It can be understood that during the operation, the method in the target object is dynamically intercepted, and the functional operation is performed before and after the interception method. The target object may be understood as an object relation mapping class in the embodiment of the present application.
In an alternative embodiment, step S102 may specifically include the following steps B1-B2:
B1, instantiating the object relation mapping class using the reflection mechanism of the object-oriented programming language, and performing the attribute assignment operation on the instantiated object relation mapping class.
Specifically, using the reflection mechanism of the object-oriented programming language, the object relation mapping definition is dynamically parsed and referenced, so that the object relation mapping class is instantiated and the attribute assignment operation is completed.
For any class in the running state, reflection can dynamically create an instance, invoke methods of the class or instance, and change the attributes of the class or instance. Reflection can also access the internal information of a type, including its modifiers, fields and methods. Through reflection, classes can be assembled at run time without source-code links between components, which reduces code coupling and provides the basis for implementing the dynamic proxy operation.
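Step B1 can be sketched with the standard Java reflection API. The classes `StorageItem` and `ReflectiveAssigner` are hypothetical names used only for illustration, and the sketch assumes that the attribute values arrive as a map keyed by class attribute name; the application does not specify how the values are supplied.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical stand-in for an object relation mapping class.
class StorageItem {
    private Long id;
    private String itemName;

    StorageItem() { }

    Long getId() { return id; }
    String getItemName() { return itemName; }
}

class ReflectiveAssigner {
    // Creates an instance through the no-argument constructor, then copies each
    // map entry into the attribute of the same name.
    static <T> T instantiate(Class<T> type, Map<String, Object> values) {
        try {
            Constructor<T> constructor = type.getDeclaredConstructor();
            constructor.setAccessible(true);
            T instance = constructor.newInstance();
            for (Map.Entry<String, Object> entry : values.entrySet()) {
                Field field = type.getDeclaredField(entry.getKey());
                field.setAccessible(true); // the attributes of the mapped class are private
                field.set(instance, entry.getValue());
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            // Wrapped so that callers do not have to handle checked reflection exceptions.
            throw new IllegalStateException("cannot instantiate " + type.getName(), e);
        }
    }
}
```

Because the class and its attributes are resolved at run time, the assigner needs no compile-time reference to any particular mapped class, which is the decoupling the text attributes to reflection.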
B2, weaving the operation methods for data stream batch processing into the attribute-assigned object relation mapping class by means of a dynamic proxy, to obtain the rule proxy class, in the object-oriented programming language, that has the operation methods.
The operations of data stream batch processing include searching, updating, batch adding and so on, which correspond in turn to the operation methods Select(), update() and BatchInsert(). These operation methods are each woven into the object relation mapping class by means of a dynamic proxy, forming a rule proxy class that can interact with the Flink data table.
The rule proxy class has a unified call entry method. Through this method, the Flink Table API conversion service, acting as the caller, passes in the specified parameters, so that Flink Table API calls are made in a consistent way.
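The weaving step and the unified call entry can be illustrated with the JDK's built-in dynamic proxy. This is a sketch under stated assumptions, not the application's implementation: JDK proxies work on interfaces, so a hypothetical `TableOperations` interface is introduced; `MappedTable` stands in for an instantiated mapping object; and the returned strings are placeholders for whatever Flink Table API calls a real rule proxy would issue.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical operation interface; the method names follow the operations named in the text.
interface TableOperations {
    String select();
    String update();
    String batchInsert();
}

// Hypothetical stand-in for an instantiated, attribute-assigned mapping object.
class MappedTable {
    final String tableName;

    MappedTable(String tableName) { this.tableName = tableName; }
}

class RuleProxyFactory {
    // Returns a proxy whose operation methods are all routed through one handler.
    static TableOperations weave(MappedTable target) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getDeclaringClass() == Object.class) {
                // toString, hashCode and equals are answered by the target itself.
                return method.invoke(target, args);
            }
            // Single entry point: every operation call arrives here with its method and arguments.
            return method.getName() + " on " + target.tableName;
        };
        return (TableOperations) Proxy.newProxyInstance(
                TableOperations.class.getClassLoader(),
                new Class<?>[] { TableOperations.class },
                handler);
    }
}
```

The target object is left unchanged; the proxy adds the operation methods around it at run time, and the handler plays the role of the unified call entry.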
Step S103, bridging the rule proxy class to the call class meeting the conversion service interface to obtain a bridging call class, so that the preset conversion service interface calls the bridging call class to perform conversion processing on the data in the data source table.
Through the bridge pattern, the rule proxy class can be converted into a bridging call class that the preset conversion service (Flink Table) can convert and use as an input source. The bridging call class supports conversion from the preset conversion service interface (Flink Table API) to SQL, so that subsequent data stream batch processing operations can be completed.
In an alternative embodiment, the step S103 may specifically include the following steps C1-C2:
Step C1: using the bridge pattern, add the related construction method, assembly method, and destructor that satisfy the conversion call of the preset conversion service on top of the operation methods of the rule proxy class.
The bridge pattern separates an abstraction from its implementation so that the two can vary independently. It is an object structural pattern, also called the Handle and Body pattern or the interface pattern. Bridging the rule proxy class to a call class that satisfies the Flink Table API conversion service means adding the related construction, assembly, and analysis methods that satisfy Flink Table API conversion calls on top of the original Flink DataTable proxy operation methods.
The bridge pattern separates the Flink Table API operations from the Flink DataTable rule proxy so that the two remain independent of each other and are connected only through a bridging method. The bridge pattern uses relationships between objects (composition) to decouple the binding otherwise inherent between an abstraction and its implementation, so that each can vary along its own dimension. Varying along its own dimension means that the abstraction and the implementation are no longer in the same inheritance hierarchy; each is subclassed in its own hierarchy, and any subclasses from the two hierarchies can be combined to obtain a multi-dimensional combined object. In many cases the bridge pattern can replace a multi-layer inheritance scheme. Multi-layer inheritance violates the single responsibility principle, has poor reusability, and produces a very large number of classes; the bridge pattern is a better solution and greatly reduces the number of subclasses. The bridge pattern also improves the extensibility of the system: either of the two variable dimensions can be extended without modifying the original system, which satisfies the open-closed principle.
Step C2: obtain the bridging call class based on the rule proxy class to which the related construction method, assembly method, and destructor have been added.
The rule proxy class is assembled through the bridge pattern to obtain the bridging call class, that is, a call class that satisfies the Flink Table API, so that the Flink Table API can call the bridging call class to convert and process the data in the data source table.
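The separation that the bridge pattern provides in steps C1 and C2 can be sketched as follows. All names here (RuleOperations, BridgeCall, SelectCall, assemble) are hypothetical, and the sketch assembles a plain SQL string instead of calling any Flink API; it only shows the abstraction holding a reference to the implementation rather than inheriting from it.

```java
import java.util.List;

class BridgeSketch {
    // Implementor side: what the rule proxy class offers (hypothetical interface).
    interface RuleOperations {
        String tableName();
        List<String> columns();
    }

    // Abstraction side: what a conversion service would call (hypothetical).
    static abstract class BridgeCall {
        // The "bridge" is this object reference; the two sides share no inheritance hierarchy.
        protected final RuleOperations rule;

        BridgeCall(RuleOperations rule) { this.rule = rule; }

        abstract String assemble();
    }

    // Refined abstraction: assembles a query from whatever rule proxy it is given.
    static class SelectCall extends BridgeCall {
        SelectCall(RuleOperations rule) { super(rule); }

        @Override
        String assemble() {
            return "SELECT " + String.join(", ", rule.columns()) + " FROM " + rule.tableName();
        }
    }

    // Helper that builds a fixed rule implementation for the demonstration.
    static RuleOperations rule(String table, List<String> columns) {
        return new RuleOperations() {
            @Override public String tableName() { return table; }
            @Override public List<String> columns() { return columns; }
        };
    }

    public static void main(String[] args) {
        BridgeCall call = new SelectCall(rule("item", List.of("id", "name")));
        System.out.println(call.assemble());
    }
}
```

New call types (subclasses of BridgeCall) and new rule proxies (implementations of RuleOperations) can be added independently, which is the two-dimensional extensibility the description attributes to the pattern.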
In the embodiment of the application, object relation mapping applies object-oriented thinking: the data source table is mapped into object form through metadata, so that Flink SQL operations are converted into operations of an object-oriented programming language. That is, on top of the SQL layer of the existing layered Flink stream and batch data operations, an object relation mapping layer is added through the object relation mapping rule. As a result, when implementing real-time stream computing or batch processing business requirements, the related entity classes and attributes can be defined directly according to the business requirements without defining SQL script statements. The object relation mapping class describes the structure of the data source table and thereby abstracts it, which enables conversion processing of the data in the data source table. Because the data source table is mapped into object form, the abstraction is fine-grained, later maintenance is convenient, and the mapping can be combined with the business model.
In an alternative embodiment, if the input data source table does not meet the preset condition, annotation information is added to the data source table by means of an annotation, where the annotation information indicates that the table is to be expressed with an SQL query statement; the data source table with the added annotation information is then input to an SQL parsing engine for conversion processing.
Specifically, the preset condition may fail because the structure of the data source table is complex and the table cannot be mapped into a corresponding object relation mapping class based on the object relation mapping rule. In that case, an annotation can indicate that the data source table is expressed with an SQL query statement, and the statement is converted directly by the SQL parsing engine (the Flink SQL parsing engine).
An example of using the @Query annotation to indicate that the data source table is expressed with an SQL query statement is as follows:
@Entity(name="processe")
public class Processe{
@Query("select name from processe")
private String querystr;
public Processe(){}
...
}
In the above embodiment of the application, for complex association semantics, the @Query annotation is provided so that Flink SQL can be input directly and submitted directly to the Flink Table SQL parsing engine for processing.
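How a processing component might detect the @Query annotation and choose the direct SQL path can be sketched with reflection. The application does not give the definition of @Query, so the annotation declared below is an assumed stand-in, and the helper findSql is hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Optional;

class QueryAnnotationSketch {
    // Assumed definition: runtime retention is required for reflection to see it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Query {
        String value();
    }

    // Class that carries an SQL statement, following the example in the text.
    static class Processe {
        @Query("select name from processe")
        private String querystr;
    }

    // Class without @Query; it would go through the object relation mapping path.
    static class Plain {
        private String name;
    }

    // Return the SQL text of the first @Query field, if the class declares one.
    static Optional<String> findSql(Class<?> type) {
        for (Field field : type.getDeclaredFields()) {
            Query query = field.getAnnotation(Query.class);
            if (query != null) {
                return Optional.of(query.value());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(findSql(Processe.class).orElse("no SQL: use object relation mapping"));
        System.out.println(findSql(Plain.class).orElse("no SQL: use object relation mapping"));
    }
}
```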
The following describes a specific implementation procedure of the data stream batch processing method in the embodiment of the present application with reference to fig. 2 to fig. 4.
As shown in fig. 2, the present application applies object-oriented thinking through Object Relational Mapping (ORM) and maps a real-time stream table (DataStream Table) and a dimension table (DataSet Table) into object form through metadata, so as to convert Flink SQL operations into object operations of an object-oriented programming language. On top of the SQL layer of the existing layered Flink stream and batch data operations, an object relation mapping layer is added through the object relation mapping rule.
The overall flow of the data stream batch processing method in this embodiment of the present application is shown in fig. 3. For the input stream or batch data (DataStream or DataSet), an object relation set of the input data is obtained through object relation mapping, and an input data table is obtained through the rule proxy. After passing through the bridging structure, the input data table is fed to the Flink Table Transform for conversion processing, a data table is output, and finally the stream or batch data (DataStream or DataSet) is output.
The overall flow in fig. 3 is described in detail below.
As shown in fig. 4, the embodiment of the application implements the object relation mapping conversion of the stream and batch data source through the object relation mapping definition document. Through the rule proxy, processing methods are assembled in combination with the object relation mapping, and the input stream and batch data is mapped into a rule proxy class of the object-oriented programming language that carries the operation methods. Through the bridging structure, the set of object instances is converted into a DataTable parameter object instance that the Flink Table can convert and use as an input source; this instance supports conversion from the Flink Table API to SQL, so that the subsequent data stream batch processing operations can be completed.
A data stream batch processing apparatus based on object relation mapping rules generally uses the following document definitions:
Data source configuration information document: the connection information required when connecting to a specified data source. It includes: database, database table name, login name, password, and connection string.
Data source configuration document example:
stream.src.name=kafka_stream_source
stream.src.ip=10.0.0.1
stream.src.port=3333
stream.src.topic=stream_source
stream.dst.name=kafka_dest_store
stream.dst.ip=10.1.2.2
stream.dst.port=4444
stream.dst.topic=dest_store
db.name=mysql_db
db.url=jdbc:mysql://10.0.0.1/test_db
db.username=root
db.password=root123
db.table_name=configuration
Specifically, stream denotes a "stream table" and db denotes a "dimension table". The flow table has a source "src" and a destination "dst". The dimension table only supports association queries, that is, it serves as an associated supplement to the flow table data.
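Since the configuration document above uses key=value lines, it can be read with the standard java.util.Properties class. The following is a small sketch under that assumption; the class DataSourceConfigSketch and the helper endpoint are hypothetical, and the keys are those of the example document.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class DataSourceConfigSketch {
    // Parse key=value configuration text into a Properties object.
    static Properties parse(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("cannot read configuration", e);
        }
        return properties;
    }

    // Hypothetical helper: "ip:port" of the stream source ("src") or destination ("dst").
    static String endpoint(Properties properties, String side) {
        return properties.getProperty("stream." + side + ".ip")
                + ":" + properties.getProperty("stream." + side + ".port");
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "stream.src.ip=10.0.0.1",
                "stream.src.port=3333",
                "stream.dst.ip=10.1.2.2",
                "stream.dst.port=4444",
                "db.url=jdbc:mysql://10.0.0.1/test_db");
        Properties properties = parse(text);
        System.out.println(endpoint(properties, "src"));
        System.out.println(endpoint(properties, "dst"));
        System.out.println(properties.getProperty("db.url"));
    }
}
```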
1. Object relation mapping definition for the data flow table or dimension table: the configuration that defines the mapping between the data flow table or dimension table and a class of the object-oriented programming language, in which the class name corresponds to the table name and the class attributes correspond to the table field names. It is implemented as follows:
a) The method is realized by adopting a Java annotation mode and is used for describing mapping definition information of the object association class and the source table;
b) The object association mapping definition information is analyzed and processed by the relation mapping component in a reflection mode;
c) The result of the output is a class with the definition of the object association mapping configuration relation.
2. Classes defined by object-oriented programming language: the structure for describing the data source table (stream table or dimension table) is provided with one or a plurality of identification attribute fields and a plurality of data attribute fields, and the record mapping in the corresponding data stream table or dimension table is realized.
The naming convention for object-oriented class definition is as follows:
a) One class corresponds to one table. For example, class names may be in the singular in English and capitalized; the table names correspond to specific names, and aliases may also be defined, and the format should be in lowercase form.
b) Each table must define a primary key field, typically an integer field named id.
c) The foreign key field name associated with the dimension table follows the rule: associated primary table name_id, for example: item_id.
In the implementation flow, the data stream batch processing method based on the object relation mapping rule is implemented as follows:
1. Map and associate the object classes with the fields of the data flow table and the dimension table by means of annotations:
a) According to the known table names and the field names and types contained in the tables, define the corresponding annotations, for example as in Table 3 of the above embodiment of the present application;
b) Define classes according to the data flow table or dimension table data and apply the annotations;
c) For complex queries, annotate directly with SQL query statements.
2. Using the reflection mechanism of an object-oriented programming language, instantiate the stream table or dimension table data class by dynamically parsing and referencing the object relation mapping definition, and complete the attribute assignment operation.
For any class at run time, reflection can dynamically create an instance, invoke a method of the class or instance, and change the attributes of the class or instance. Reflection can also access a type's internal information, including its modifiers, fields, and methods. Because classes can be assembled at run time through reflection, no source-code links between components are needed, which reduces code coupling and provides the basis for implementing dynamic proxy operations.
3. Use a dynamic proxy to weave in the logic of the predicate operation methods of data stream batch processing that are associated with the object relation mapping class.
The search, update, and batch-add operations required by the object relation mapping of the corresponding flow table or dimension table are woven into the object relation mapping class by means of a dynamic proxy as the operation methods Select(), Update(), BatchInsert(), and so on, forming a rule proxy class that can interact with the Flink DataTable.
The rule proxy implementation class has a unified call entry method: invoke(Object proxy, Method method, Object[] args)
Through this method, the Flink Table API conversion service, acting as the caller, passes in the specified parameters, which achieves consistent Flink Table API calls.
4. Using the bridge pattern, bridge the rule proxy operation implementation class so that it satisfies the Flink Table API conversion service, that is, add the related construction, assembly, and analysis methods that satisfy the Flink Table API conversion service on top of the original Flink Table proxy operation methods.
The bridge pattern separates the Flink Table API operations from the Flink DataTable rule proxy so that the two remain independent of each other and are connected only through a bridging method.
5. After the object relation mapping rule proxy class assembled through the bridge pattern has been processed by the Flink Table API conversion, the output of the data stream batch processing may be:
(1) Calling class meeting Flink Table API
(2) Flink SQL as an intermediate state
(3) Parsing error or exception information.
The data stream batch processing apparatus based on object relation mapping preferentially processes input according to the object relation mapping, producing a bridged service class that the Flink Table API conversion process can call; for complex association semantics, the @Query annotation is provided so that Flink SQL can be input directly and submitted directly to the Flink Table SQL parsing engine for processing.
By taking advantage of the fact that an object-oriented language is closer to the business, the embodiment of the application implements data stream batch processing operations through object relation rules. Writing SQL statements is avoided: when implementing real-time stream computing or batch computing business requirements, the related entity classes and attributes can be defined directly according to the business requirements, and the association between the data flow table and the dimension table is described through the composition of multiple class instances. Configuring the object relation mapping through annotations is also supported, which reduces the number and size of related configuration files. The choice is flexible: data stream batch operations through the object relation mapping rule proxy are supported, as is the complex query mode of directly inputting Flink SQL.
Compared with the prior art, the data stream batch processing method of the embodiment of the application has the following advantages:
1. Compared with implementing data stream batch operations on the basis of basic data sets, the embodiment of the application implements a data stream batch processing mechanism based on object relation mapping rules;
2. Compared with Flink Table API data stream batch processing, which can only perform query operations in SQL, the embodiment of the application allows the object relation mapping rule proxy class to be used and assembled into a bridging call class that conforms to Flink Table API conversion, thereby implementing data stream batch processing operations;
3. Given the semantic and descriptive differences between writing SQL and using an object-oriented programming language, the embodiment of the application converts the Flink stream and batch data at the input end into an object relation mapping structure through object relation mapping processing, and supports assembling operations on the stream and batch data in the manner of a programming language, that is, through object instances, attribute assignment, and method calls;
4. Given that pure object relation mapping cannot express the object relation definitions of complex scenarios, the embodiment of the application supports implementing data stream batch operations through object relation mapping and also supports implementing them by inputting Flink SQL that is passed directly to the Flink Table API.
Based on the same inventive concept, the embodiment of the present application further provides a data stream batch processing device, and since the principle of the device for solving the problem is similar to that of the data stream batch processing method, the implementation of the device can refer to the embodiment of the method, and the repetition is omitted.
As shown in fig. 5, an embodiment of the present application provides a data stream batch processing apparatus, including:
the mapping module 51 is configured to map the input data source table into a corresponding object relationship mapping class based on an object relationship mapping rule, where the object relationship mapping class is defined by an object-oriented programming language;
the rule proxy module 52 is configured to, after performing attribute assignment operation on the object relationship mapping class, weave in an operation method of data stream batch processing, and obtain a rule proxy class with an operation method of an object-oriented programming language;
the bridging construction module 53 is configured to bridge the rule proxy class to the call class satisfying the conversion service interface, and obtain a bridging call class, so that the preset conversion service interface calls the bridging call class to perform conversion processing on the data in the data source table.
In an alternative embodiment, the mapping module 51 is specifically configured to:
based on an object relation mapping rule, determining a class name corresponding to a table name of an input data source table and a class attribute name corresponding to a table field name of the data source; wherein each attribute includes a table name, a table field name;
Mapping and associating the table names with the corresponding class names and the table field names with the corresponding class attribute names respectively in an annotation mode.
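The annotation-based association of table names with class names and of table field names with class attribute names can be sketched as follows. @Entity(name=...) follows the example given earlier in the description; the @Column annotation, the class Item, and the two helper methods are hypothetical and serve only as an illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

class MappingSketch {
    // Associates a class with a table name.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Entity {
        String name();
    }

    // Hypothetical annotation that associates a class attribute with a table field name.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Column {
        String name();
    }

    // One class per table: singular, capitalized class name; lowercase table name.
    @Entity(name = "item")
    static class Item {
        @Column(name = "id")
        private Integer id;

        @Column(name = "item_name")
        private String itemName;
    }

    // Read the table name from the class-level annotation.
    static String tableName(Class<?> type) {
        Entity entity = type.getAnnotation(Entity.class);
        if (entity == null) {
            throw new IllegalArgumentException(type.getName() + " is not mapped to a table");
        }
        return entity.name();
    }

    // Build the mapping from table field name to class attribute name.
    static Map<String, String> columnToAttribute(Class<?> type) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (Field field : type.getDeclaredFields()) {
            Column column = field.getAnnotation(Column.class);
            if (column != null) {
                mapping.put(column.name(), field.getName());
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        System.out.println(tableName(Item.class));
        System.out.println(columnToAttribute(Item.class));
    }
}
```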
In an alternative embodiment, the rule agent module 52 is specifically configured to:
using a reflection mechanism of an object-oriented programming language to carry out instantiation operation on the object relation mapping class, and carrying out attribute assignment operation on the instantiated object relation mapping class;
and an operation method of data stream batch processing is woven into the object relation mapping class after attribute assignment by a dynamic proxy mode, and a rule proxy class with the operation method of the object-oriented programming language is obtained.
In an alternative embodiment, the bridge construction module 53 is specifically configured to:
adopting a bridging mode, and adding a related construction method, an assembly method and a destructor which meet the conversion call of a preset conversion service on the basis of the operation method of the rule proxy class;
and obtaining the bridging call class based on the rule proxy class added with the related construction method, the assembly method and the destructor.
In an alternative embodiment, the apparatus further comprises an annotation module for:
if the input data source table does not meet the preset condition, adding annotation information to the data source table by means of an annotation, wherein the annotation information indicates that the table is to be expressed with an SQL query statement;
and inputting the data source table with the added annotation information into an SQL parsing engine for conversion processing.
In the embodiment of the application, object relation mapping applies object-oriented thinking: the data source table is mapped into object form through metadata, so that Flink SQL operations are converted into operations of an object-oriented programming language. That is, on top of the SQL layer of the existing layered Flink stream and batch data operations, an object relation mapping layer is added through the object relation mapping rule. As a result, when implementing real-time stream computing or batch processing business requirements, the related entity classes and attributes can be defined directly according to the business requirements without defining SQL script statements. The object relation mapping class describes the structure of the data source table and thereby abstracts it, which enables conversion processing of the data in the data source table. Because the data source table is mapped into object form, the abstraction is fine-grained, later maintenance is convenient, and the mapping can be combined with the business model.
Based on the same inventive concept, the embodiment of the present application further provides an electronic device, and since the principle of solving the problem of the electronic device is similar to that of the method, the implementation of the electronic device may refer to the embodiment of the method, and the repetition is not repeated.
Referring to fig. 6, the electronic device may include a processor 62 and a memory 61. The memory 61 provides program instructions and data stored in the memory 61 to the processor 62. In the disclosed embodiment, the memory 61 may be used to store programs for the data stream batch processing of the disclosed embodiment.
Processor 62 is operative to perform the method of any of the method embodiments described above, such as a data stream batch method provided by the embodiment shown in fig. 1, by invoking program instructions stored in memory 61.
The specific connection medium between the memory 61 and the processor 62 described above is not limited in the embodiments of the present disclosure. The embodiment of the present disclosure is shown in fig. 6, where the memory 61 and the processor 62 are connected by a bus 63, where the bus 63 is shown in bold lines in fig. 6, and the connection between other components is merely illustrative, and not limited thereto. The bus 63 may be classified as an address bus, a data bus, a control bus, or the like. For ease of illustration, only one thick line is shown in fig. 6, but not only one bus or one type of bus.
The Memory may include Read-Only Memory (ROM) and random access Memory (Random Access Memory, RAM), and may also include Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit, a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
The disclosed embodiments also provide a computer storage medium having a computer program stored therein, the computer program being read from the computer storage medium by a processor of an electronic device, the processor executing the computer program, so that the electronic device performs the data stream batch processing method in any of the above-described method embodiments.
In a specific implementation, the computer storage medium may include: a USB flash drive (Universal Serial Bus flash drive), a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disk, or the like, which can store program codes.
Based on the same inventive concept as the above-described method embodiments, the present application provides a computer program product comprising computer instructions stored in a computer-readable storage medium. The processor of the electronic device reads the computer instructions from the computer readable storage medium and executes the computer instructions to cause the electronic device to perform the steps of any of the data stream batch processing methods described above.
The computer program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (10)

1. A method for batch processing a data stream, comprising:
mapping an input data source table into a corresponding object relation mapping class based on an object relation mapping rule, wherein the object relation mapping class is defined by an object-oriented programming language;
after performing attribute assignment operation on the object relation mapping class, weaving into an operation method of data stream batch processing to obtain a rule proxy class of the object-oriented programming language with the operation method;
and bridging the rule proxy class to the call class meeting the conversion service interface to obtain a bridging call class, so that the preset conversion service interface calls the bridging call class to perform conversion processing on the data in the data source table.
2. The method of claim 1, wherein mapping the input data source table into the corresponding object relationship mapping class based on the object relationship mapping rule comprises:
based on an object relation mapping rule, determining a class name corresponding to a table name of an input data source table and a class attribute name corresponding to a table field name of the data source;
mapping and associating the table names with the corresponding class names and the table field names with the corresponding class attribute names respectively in an annotation mode.
3. The method according to claim 1, wherein the step of weaving the operation method of the batch processing of the data stream after the attribute assignment operation is performed on the object relation mapping class, to obtain a rule proxy class of the object oriented programming language with the operation method, includes:
using a reflection mechanism of an object-oriented programming language to carry out instantiation operation on the object relation mapping class, and carrying out attribute assignment operation on the instantiated object relation mapping class;
and an operation method of data stream batch processing is woven into the object relation mapping class after attribute assignment by a dynamic proxy mode, and a rule proxy class with the operation method of the object-oriented programming language is obtained.
4. A method according to any one of claims 1 to 3, wherein bridging the rule agent class to a call class satisfying a preset translation service, obtaining a bridged call class, comprises:
adopting a bridging mode, and adding a related construction method, an assembly method and a destructor which meet the conversion call of the preset conversion service on the basis of the operation method of the rule proxy class;
and obtaining the bridging call class based on the rule proxy class added with the related construction method, the assembly method and the destructor.
5. The method according to claim 1, wherein the method further comprises:
if the input data source table does not meet a preset condition, adding annotation information to the data source table by means of annotations, wherein the annotation information indicates that the table is to be expressed using SQL query statements;
and inputting the data source table with the added annotation information into an SQL parsing engine for conversion processing.
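The fallback path of claim 5 might look like the sketch below. The preset condition is not defined in this section, so "the table has a registered mapping class" is assumed purely for illustration, and Python's built-in `sqlite3` module stands in for the SQL parsing engine. `SourceTable`, `route` and `run_sql` are hypothetical names.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class SourceTable:
    name: str
    columns: list
    rows: list
    annotations: dict = field(default_factory=dict)


def route(table, mapped_tables, sql):
    """Assumed preset condition: the table has a registered mapping class.
    Otherwise annotate the table as SQL-expressed and route it to the engine."""
    if table.name in mapped_tables:
        return "object-mapping"
    table.annotations["expression"] = "sql"
    table.annotations["query"] = sql
    return "sql-engine"


def run_sql(table):
    """Stand-in SQL engine: load the rows into SQLite and run the annotated
    query. Identifiers are interpolated directly, which is acceptable only
    in a sketch with trusted input."""
    conn = sqlite3.connect(":memory:")
    try:
        cols = ", ".join(table.columns)
        conn.execute(f"CREATE TABLE {table.name} ({cols})")
        marks = ", ".join("?" for _ in table.columns)
        conn.executemany(f"INSERT INTO {table.name} VALUES ({marks})", table.rows)
        return conn.execute(table.annotations["query"]).fetchall()
    finally:
        conn.close()
```

Keeping the query in the table's annotations means the routing decision and the SQL text travel together to whichever engine handles the table.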
6. A data stream batching apparatus, comprising:
a mapping module, configured to map an input data source table to a corresponding object relation mapping class based on an object relation mapping rule, wherein the object relation mapping class is a class defined in an object-oriented programming language;
a rule agent module, configured to weave in an operation method of data stream batch processing after performing an attribute assignment operation on the object relation mapping class, to obtain a rule proxy class of the object-oriented programming language having the operation method;
and a bridging construction module, configured to bridge the rule proxy class to a call class satisfying a preset conversion service interface, to obtain a bridged call class, so that the preset conversion service interface calls the bridged call class to perform conversion processing on the data in the data source table.
7. The apparatus of claim 6, wherein the mapping module is specifically configured to:
determine, based on an object relation mapping rule, a class name corresponding to the table name of the input data source table and class attribute names corresponding to the table field names of the data source table;
map and associate the table name with the corresponding class name, and each table field name with the corresponding class attribute name, by means of annotations.
8. The apparatus of claim 6, wherein the rule agent module is specifically configured to:
instantiate the object relation mapping class using a reflection mechanism of the object-oriented programming language, and perform the attribute assignment operation on the instantiated object relation mapping class;
and weave the operation method of data stream batch processing into the object relation mapping class after attribute assignment by means of a dynamic proxy, to obtain the rule proxy class of the object-oriented programming language having the operation method.
9. An electronic device comprising a processor and a memory, wherein the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1-5.
10. A computer readable storage medium, characterized in that it comprises a computer program for causing an electronic device to perform the steps of the method according to any one of claims 1-5 when said computer program is run on the electronic device.
CN202310268071.9A 2023-03-17 2023-03-17 Data stream batch processing method, device, equipment and medium Pending CN116302219A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310268071.9A CN116302219A (en) 2023-03-17 2023-03-17 Data stream batch processing method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310268071.9A CN116302219A (en) 2023-03-17 2023-03-17 Data stream batch processing method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN116302219A (en) 2023-06-23

Family

ID=86823693

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310268071.9A Pending CN116302219A (en) 2023-03-17 2023-03-17 Data stream batch processing method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116302219A (en)

Similar Documents

Publication Publication Date Title
US8286191B2 (en) Dynamically composing data stream processing applications
US7873611B2 (en) Boolean literal and parameter handling in object relational mapping
CN107861728B (en) Method and system for converting traditional program language into modern program language
US9128991B2 (en) Techniques to perform in-database computational programming
US8682876B2 (en) Techniques to perform in-database computational programming
US20080319957A1 (en) Extensible command trees for entity data model platform
US8701087B2 (en) System and method of annotating class models
US20080183725A1 (en) Metadata service employing common data model
US7779047B2 (en) Pluggable merge patterns for data access services
US7996416B2 (en) Parameter type prediction in object relational mapping
JP2017534996A (en) System and method for providing and executing a domain specific language for a cloud service infrastructure
US9720960B2 (en) Reporting tools for object-relational databases
US7499956B1 (en) Annotation processing from source files and class files
CN112860730A (en) SQL statement processing method and device, electronic equipment and readable storage medium
US20200004664A1 (en) Automatic mock enablement in a multi-module software system
US20190004927A1 (en) Accessing application runtime data using a query language
CN112650526B (en) Method, device, electronic equipment and medium for detecting version consistency
CN110633162B (en) Remote call implementation method and device, computer equipment and storage medium
WO2024041301A1 (en) Method and apparatus for generating unified abstract syntax tree, and program analysis method and apparatus
US10719424B1 (en) Compositional string analysis
CN116302219A (en) Data stream batch processing method, device, equipment and medium
CN113656433B (en) Entity object expansion method, entity object expansion device, electronic equipment and storage medium
CN109857390B (en) Annotation transmission method of Git warehouse file annotation system
TWI707273B (en) Method and system of obtaining resources using unified composite query language
IL279776A (en) Design and control of event-driven software applications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination