CN113760987A - Data processing method and data processing platform - Google Patents

Publication number
CN113760987A
Authority
CN
China
Prior art keywords
data
target scene
dto
source
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110156169.6A
Other languages
Chinese (zh)
Inventor
李荣明
王衡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN202110156169.6A priority Critical patent/CN113760987A/en
Publication of CN113760987A publication Critical patent/CN113760987A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/547 Remote procedure calls [RPC]; Web services

Abstract

The invention discloses a data processing method and a data processing platform, and relates to the technical field of computers. One embodiment of the method comprises: receiving source data to be processed; matching the source data with a plurality of target scenes which are configured in advance to obtain at least one target scene corresponding to the source data; wherein, each target scene comprises: identification information of required data of the target scene and destination information used for representing a data output mode of the target scene; for each target scene corresponding to the source data, acquiring the required data of the target scene according to the identification information and/or the source data in the target scene, and outputting the target data generated based on the required data by using the destination information in the target scene. The implementation mode can realize the decoupling of the data processing flow and the service logic.

Description

Data processing method and data processing platform
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data processing method and a data processing platform.
Background
Streaming data refers to a sequence of data that arrives sequentially, massively, quickly, and continuously. In existing streaming data processing technology, source data is first acquired; scene matching, data tracing, and target data assembly are then hard-coded for a specific service scene; and the result is finally directed to a particular scene exit. This amounts to custom development that is highly coupled with the service scene and has the following defects: because the data processing flow is designed around the service scene and takes the service scene as its starting point, it is highly customized and tightly coupled with the service, so its universality, expansibility, and flexibility are low, and the flow has to be designed again whenever there are multiple service scenes or a service scene changes. In addition, existing data processing flows generally perform directional transmission of data based on a single data inlet and a single data outlet, and therefore cannot meet actual service requirements.
Disclosure of Invention
In view of this, embodiments of the present invention provide a data processing method and a data processing platform, which can implement decoupling of a data processing flow and a service logic.
To achieve the above object, according to one aspect of the present invention, a data processing method is provided.
The data processing method of the embodiment of the invention comprises the following steps: receiving source data to be processed; matching the source data with a plurality of target scenes which are configured in advance to obtain at least one target scene corresponding to the source data; wherein, each target scene comprises: identification information of required data of the target scene and destination information used for representing a data output mode of the target scene; for each target scene corresponding to the source data, acquiring the required data of the target scene according to the identification information and/or the source data in the target scene, and outputting the target data generated based on the required data by using the destination information in the target scene.
Optionally, the method further comprises: after receiving source data to be processed, parsing and deserializing the source data to form a data transfer object (DTO).
Optionally, the method further comprises: after the DTO is formed, matching the DTO with a plurality of pre-configured data association relations to obtain a data association relation corresponding to the DTO, and binding the DTO with the obtained data association relation to form a business data object (BO).
Optionally, the source data includes data source information, and each data association relationship includes a matching condition; and matching the DTO with a plurality of pre-configured data association relations to obtain the data association relation corresponding to the DTO, including: traversing each pre-configured data association relationship, and judging whether the data source information in the DTO meets the matching condition in the data association relationship; and determining the data association relation with the judgment result of yes as the data association relation corresponding to the DTO.
Optionally, the data source information of the source data includes names of database tables, each data association relationship includes an alias mapping rule for a specific field in the database table, and the alias mapping rule represents a mapping relationship between an original name and an alias of the specific field; and, the method further comprises: after obtaining the data association relation corresponding to the DTO, if the DTO comprises the specific field data indicated by the obtained data association relation, generating alias field data corresponding to the specific field data according to an alias mapping rule in the data association relation, and encapsulating the alias field data in the DTO; wherein the alias field data corresponds to the field name of the specific field data based on the alias mapping rule, and the field values are the same.
Optionally, each target scene includes: at least one trigger condition for the target scenario; and matching the source data with a plurality of target scenes configured in advance to obtain at least one target scene corresponding to the source data, including: traversing each target scene, and judging whether the DTO in the BO meets the triggering condition of the target scene; and determining at least one target scene with the judgment result of yes as a target scene corresponding to the source data.
Optionally, the acquiring, for each target scene corresponding to the source data, the required data of the target scene according to the identification information in the target scene and/or the source data includes: judging whether the required data of the target scene exists in the DTO in the BO according to the identification information of the required data in the target scene: if yes, acquiring the required data from the DTO; after acquiring the required data from the DTO, if the data required by the target scene which is not acquired still exists, acquiring the data required by the target scene which is not acquired according to the identification information, the preconfigured data association relation and the preconfigured tracing information in the target scene.
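The two-stage acquisition described above (take what the DTO already holds, then trace only what is still missing) can be illustrated with the following minimal sketch. It is not the claimed implementation; the function name `collect_required_data`, the callback `fake_trace`, and the field identifiers are hypothetical and are introduced only for illustration.

```python
from typing import Any, Callable, Dict, Iterable


def collect_required_data(
    dto: Dict[str, Any],
    required_ids: Iterable[str],
    trace: Callable[[str, Dict[str, Any]], Any],
) -> Dict[str, Any]:
    """Take required data from the DTO first, then trace whatever is missing."""
    required_ids = list(required_ids)
    # Stage 1: required data that is already present in the DTO.
    found = {rid: dto[rid] for rid in required_ids if rid in dto}
    # Stage 2: anything still missing is fetched through the tracing callback.
    for rid in required_ids:
        if rid not in found:
            found[rid] = trace(rid, dto)
    return found


# Hypothetical tracing callback standing in for a configured tracing path.
def fake_trace(required_id: str, dto: Dict[str, Any]) -> Any:
    lookup = {"user.name": {7: "alice"}}
    return lookup[required_id][dto["order.user_id"]]


dto = {"order.id": 1, "order.user_id": 7, "order.modified": "2021-02-04"}
result = collect_required_data(dto, ["order.modified", "user.name"], fake_trace)
print(result)
```

In this sketch the tracing callback is invoked only for `user.name`, because `order.modified` is already carried by the DTO.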
Optionally, each data association relationship includes an identifier of at least one query method; and acquiring the data required by the unacquired target scene according to the identification information, the preconfigured data association relationship and the preconfigured tracing information in the target scene, including: for any data required by the unacquired target scene: traversing each data association relation, and detecting whether the identification information of the required data corresponds to the name of the data association relation; determining the data association relation with the detection result of yes as the specific data association relation of the required data; and obtaining a tracing path from the tracing information according to the query method identifier in the specific data association relation, and obtaining the required data by using the tracing path.
Optionally, the query methods comprise a direct query method and a parent query method of at least one hierarchical level; and obtaining a tracing path from the tracing information according to the query method identifier in the specific data association relation, and obtaining the required data by using the tracing path, includes: obtaining a tracing path from the tracing information according to the identifier of the direct query method in the specific data association relation, and executing a first query operation by using the tracing path to try to obtain the required data; if the required data is not acquired, obtaining a tracing path from the tracing information according to the identifier of the parent query method in the specific data association relation, and using that tracing path to acquire the input data needed to execute the first query operation; and executing the first query operation again with that input data so as to acquire the required data.
Optionally, the receiving source data to be processed includes: receiving source data to be processed by using a message queue, a Remote Procedure Call (RPC) or an incremental data synchronization tool; the destination information includes: sending the target data to a message queue, sending the target data through RPC, and/or executing a preset calculation logic.
To achieve the above object, according to another aspect of the present invention, a data processing platform is provided.
The data processing platform of the embodiment of the invention can comprise: the source data receiving unit is used for receiving source data to be processed; the target scene matching unit is used for matching the source data with a plurality of preset target scenes to obtain at least one target scene corresponding to the source data; wherein, each target scene comprises: identification information of required data of the target scene and destination information used for representing a data output mode of the target scene; the source tracing and output unit is used for: for each target scene corresponding to the source data, acquiring the required data of the target scene according to the identification information and/or the source data in the target scene, and outputting the target data generated based on the required data by using the destination information in the target scene.
Optionally, the data processing platform may further comprise: a parsing and deserializing unit configured to: after receiving source data to be processed, parse and deserialize the source data to form a data transfer object (DTO); and a data association unit configured to: after the DTO is formed, match the DTO with a plurality of pre-configured data association relations to obtain the data association relation corresponding to the DTO, and bind the DTO with the obtained data association relation to form a business data object (BO). The tracing and output unit may be further configured to: judge, according to the identification information of the required data in the target scene, whether the required data of the target scene exists in the DTO in the BO; if yes, acquire the required data from the DTO; and after acquiring the required data from the DTO, if data required by the target scene still remains unacquired, acquire that data according to the identification information in the target scene, the preconfigured data association relations, and the preconfigured tracing information.
To achieve the above object, according to still another aspect of the present invention, there is provided an electronic apparatus.
An electronic device of the present invention includes: one or more processors; and the storage device is used for storing one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors realize the data processing method provided by the invention.
To achieve the above object, according to still another aspect of the present invention, there is provided a computer-readable storage medium.
A computer-readable storage medium of the present invention has stored thereon a computer program which, when executed by a processor, implements the data processing method provided by the present invention.
According to the technical scheme of the invention, the embodiment of the invention has the following advantages or beneficial effects:
A plurality of target scenes are configured in advance according to the service logic, wherein each target scene comprises identification information of the required data of the target scene and destination information used for representing the data output mode of the target scene. After the source data is received, it is matched with the target scenes to obtain one or more target scenes corresponding to the source data; the required data of each such target scene is acquired according to the identification information in the target scene and/or the source data to generate target data; and the target data is finally output by using the destination information in the target scene. In this way, the data processing flow is decoupled from the service logic: a worker only needs to configure the required data and the destination information of each target scene in advance according to the service logic, and can then execute the general data processing flow (that is, use the general data processing platform) to implement the service logic, so that the data processing flow and the data processing platform are highly extensible and flexible.
In a specific embodiment, in addition to pre-configuring the target scenes, a corresponding data association relation is pre-configured for each object (e.g., a database table) that stores source data. The data association relation defines a query method identifier for performing data tracing and an alias mapping rule for a specific field in the database table, and the pre-configured query method identifier corresponds to the tracing information that provides the tracing path. After the source data is received, the following data processing flow may be performed with the data processing platform. First, the source data is parsed and deserialized into a data transfer object (DTO). Then, the corresponding data association relation is obtained according to the data source information in the source data (such as the name of the database and the name of the database table where the source data is located) and is bound to the DTO to form a business data object (BO). Next, the target scene corresponding to the BO is obtained, and tracing is performed according to the required data identification information in the target scene. Specifically, it is judged whether the required data of the target scene exists in the source data, and if so, the required data is obtained from the source data. If required data still remains unobtained, the corresponding specific data association relation is determined by using the required data identification information in the target scene, and a tracing path is obtained from the tracing information according to the query method identifier in that specific data association relation, so as to query the unobtained required data and thereby complete the required data of the target scene. Finally, the target data generated based on the required data is output according to the destination information in the target scene.
It can be seen that the data processing platform and the data processing flow are completely decoupled from the service logic. A worker can pre-configure the service-related logic, namely the data association relations (including a matching condition for matching with source data, a query method identifier, and an alias mapping rule), the target scenes (including a trigger condition for matching with the source data, required data identification information, and destination information), and the tracing information (such as the RPC name corresponding to a query method identifier and the method therein). Every step of the data processing flow executed thereafter is independent of the service, which improves the universality, expansibility, and flexibility of the data processing platform. When there are multiple service scenes or a service scene changes, only the relevant configuration needs to be adjusted and the data processing platform itself does not need to be changed, so that a highly available general-purpose streaming data processing platform is realized.
In addition, the data processing platform of the embodiment of the invention can support various data inlets such as a message queue, a Remote Procedure Call (RPC), an incremental data synchronization tool (Canal) and the like, and various data outlets such as a message queue, an RPC, preset computational logic and the like, thereby overcoming the defect that data directional transmission is performed based on a single data inlet and outlet in the prior art, and better meeting the actual service requirements.
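As a rough illustration of how several data outlets could be driven purely by configuration, the following sketch dispatches target data to whichever exit channels a target scene names. The channel keys (`mq`, `rpc`, `calc`) and the in-memory lists standing in for a real message queue and RPC client are assumptions made for this example only; the embodiment does not prescribe them.

```python
from typing import Any, Callable, Dict, List

# In-memory stand-ins (assumptions) for a real message queue and RPC client.
sent_to_queue: List[Dict[str, Any]] = []
rpc_calls: List[Dict[str, Any]] = []
calc_results: List[int] = []


def to_message_queue(target_data: Dict[str, Any]) -> None:
    sent_to_queue.append(target_data)


def to_rpc(target_data: Dict[str, Any]) -> None:
    rpc_calls.append(target_data)


def to_calculation(target_data: Dict[str, Any]) -> None:
    # Placeholder for preset calculation logic: count the output fields.
    calc_results.append(len(target_data))


# Hypothetical registry mapping destination names to exit channels.
EXIT_CHANNELS: Dict[str, Callable[[Dict[str, Any]], None]] = {
    "mq": to_message_queue,
    "rpc": to_rpc,
    "calc": to_calculation,
}


def output_target_data(target_data: Dict[str, Any], destinations: List[str]) -> None:
    """Send the target data to every exit channel named in the destination info."""
    for name in destinations:
        EXIT_CHANNELS[name](target_data)


output_target_data({"modifier": "alice", "modified_at": "2021-02-04"}, ["mq", "calc"])
print(sent_to_queue, rpc_calls, calc_results)
```

Adding a new outlet in this sketch means registering one more entry in `EXIT_CHANNELS`; the dispatch function itself stays unchanged.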
Further effects of the optional implementations described above are explained below in connection with the specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main steps of a data processing method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a data processing method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of configuration information according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a data flow trajectory of a data processing method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of components of a data processing platform in an embodiment of the invention;
FIG. 6 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 7 is a schematic structural diagram of an electronic device for implementing the data processing method in the embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that the embodiments of the present invention and the technical features of the embodiments may be combined with each other without conflict.
Before explaining the technical solution of the present invention, the related concept in the present invention will be explained first.
Configuration (config): data that is separated from the source code and stored in a configuration file, such as a properties file or a spring.
Back to source (cfg_bts): data that the source data cannot provide must be obtained through a specified path (i.e., a tracing path).
Data relationship (cfg_drs): reflects the association relation between different objects (such as database tables) through query method identifiers, which can be used to obtain a tracing path; the data association relation is also bound to the source data and used to implement field alias mapping.
Target scene (cfg_ts): formulated according to business logic; characterizes what data needs to be acquired after the source data meets a certain trigger condition (i.e., the required data) and how that data needs to be output (i.e., the destination information).
Target scene match (match, cfg_ts-m): the trigger condition in the target scene, used to establish the correspondence between the source data and the target scene.
Content (body, cfg_ts-b): the required data of the target scene. When the source data already includes all the required data, no tracing is performed; otherwise, tracing is performed and the required data is completed through the configured logic, so that the target data is formed.
Destination (exit channel, cfg_ts-ec): the execution logic that follows once the required data of the target scene has been completed; it may be sending the data to a message queue, calling a specified RPC (Remote Procedure Call) to send the data, or executing custom calculation logic.
Data Transfer Object (DTO): the data object into which the source data is encapsulated; it is used for decoupling when interacting with an external system.
Business Object (BO): the data object that encapsulates the DTO together with the data association relation.
Source Data (SD): data transmitted from an external system, which serves as the inlet data to be processed by the data processing platform.
Target Data (TD): the data that needs to be output, i.e., the product of the assembly process.
Tracing data (data obtained by back to source, bts_data): data that is needed in the target data but not included in the source data, and that must be obtained by executing the tracing logic.
Target data destination (exit channel, TD_ec): after the data is assembled, the final destination of the target data is determined by the configuration cfg_ts-ec. In general, TD_ec has a similar meaning to cfg_ts-ec.
Fig. 1 is a schematic diagram of main steps of a data processing method according to an embodiment of the present invention.
As shown in fig. 1, the data processing method according to the embodiment of the present invention can be executed by a data processing platform, and includes the following specific steps:
Step S101: source data to be processed is received.
In this step, the data processing platform may receive the source data through a message queue, an RPC (Remote Procedure Call), an incremental data synchronization tool such as Canal, or the like, and may do so by interfacing with data sources such as a database, a cache, a file, or ES (Elasticsearch, a search and data analysis engine). The source data may be streaming data. In practical applications, the data processing platform may agree with the data source on a standard format for the source data, such as JSON (JavaScript Object Notation).
Generally, the source data includes data source information for characterizing the source of the source data. For example, when the source data is from a database table, the corresponding data source information may be a database name where the source data is located and a database table name (hereinafter, referred to as the library name and the table name, respectively), and the following description will mainly take a case where the source data is from the database table as an example.
Step S102: matching the source data with a plurality of target scenes which are configured in advance to obtain at least one target scene corresponding to the source data; wherein, each target scene comprises: identification information of required data of the target scene and destination information used for representing a data output mode of the target scene.
In the embodiment of the present invention, a target scenario is used to implement a specific service logic, which may be flexibly configured in advance according to a service requirement, and the target scenario may include: the system comprises a trigger condition for triggering a target scene, identification information of data required by the target scene and destination information for representing a data output mode of the target scene. The trigger condition may be a determination condition for a source of the source data (for example, a library name where the source data is located, a table name) and a data feature (for example, which operation is performed on which field), the required data refers to data that is determined by the business logic and needs to be output by the target scenario, and the identifier of the required data may be a combination of the library name, the table name, and the field name corresponding to the data.
Illustratively, there may be one or more trigger conditions, and the target scene is triggered when the source data satisfies any one of them. The destination information may be: sending the target data (which is generated based on the above required data) to a message queue, sending the target data through RPC, and/or executing preset calculation logic on the target data.
For example, if the business logic is: when the field a of the database table A is modified, the name of a modifier and the modification time are output to a message queue, and then the source data can comprise related data before and after the field a of the database table A is modified; in a target scene configured according to the business logic, the triggering condition is that the field a of the database table A is changed, the required data is the name of a modifier and the modification time, and the destination information is that the target data is output to a message queue.
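The example above can be written down as configuration plus a generic matcher, as in the following sketch. The class `TargetScene`, the function `match_scenes`, the DTO keys, and the second scene are hypothetical and serve only to show that any one satisfied trigger condition selects a scene.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Dto = Dict[str, Any]


@dataclass
class TargetScene:
    name: str
    triggers: List[Callable[[Dto], bool]]  # any one satisfied trigger fires the scene
    required: List[str]                    # identification information of required data
    destination: str                       # how the target data is to be output


def match_scenes(dto: Dto, scenes: List[TargetScene]) -> List[TargetScene]:
    """Traverse every configured scene and keep those with a satisfied trigger."""
    return [s for s in scenes if any(t(dto) for t in s.triggers)]


scenes = [
    TargetScene(
        name="field_a_changed",
        triggers=[lambda d: d["table"] == "A" and "a" in d["changed_fields"]],
        required=["modifier_name", "modified_time"],
        destination="message_queue",
    ),
    TargetScene(
        name="field_b_changed",
        triggers=[lambda d: d["table"] == "A" and "b" in d["changed_fields"]],
        required=["modifier_name"],
        destination="rpc",
    ),
]

dto = {"database": "demo", "table": "A", "changed_fields": ["a"]}
matched = match_scenes(dto, scenes)
print([s.name for s in matched])
```

Only the scene for field a is selected here, and its `required` and `destination` entries tell the later steps what to collect and where to send it.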
As a preferred scheme, before performing the target scene matching, the following specific steps may be performed first, and the following specific steps may be referred to in fig. 2.
Firstly, the source data SD is parsed and deserialized to form a DTO that the data processing platform can recognize and use; once parsing is complete, the source data is decoupled from the external data source. In particular, data features extracted from the source data may be included in the DTO. For example, if the source data is the related data before and after field a of database table A is modified, the feature "changed field is A.a" may be extracted and encapsulated in the DTO as a data feature. Thereafter, the DTO may be matched with a plurality of preconfigured data association relations cfg_drs to obtain the data association relation corresponding to the DTO, and the DTO and the obtained data association relation are bound to form a business data object BO.
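A minimal sketch of the parsing step follows, assuming the source data arrives as a JSON message with `database`, `table`, `before`, and `after` keys. This message layout and the function name `to_dto` are assumptions for illustration; the embodiment only states that a standard format such as JSON may be agreed upon.

```python
import json
from typing import Any, Dict


def to_dto(raw: str) -> Dict[str, Any]:
    """Deserialize a JSON change message and extract the changed-field feature."""
    msg = json.loads(raw)
    before, after = msg.get("before", {}), msg.get("after", {})
    # Data feature: fields whose value differs before and after the change.
    changed = sorted(k for k in after if before.get(k) != after[k])
    return {
        "database": msg["database"],
        "table": msg["table"],
        "fields": dict(after),
        "changed_fields": [f"{msg['table']}.{k}" for k in changed],
    }


raw = json.dumps({
    "database": "demo",
    "table": "A",
    "before": {"id": 1, "a": "old"},
    "after": {"id": 1, "a": "new"},
})
dto = to_dto(raw)
print(dto["changed_fields"])
```

After this step the rest of the flow works only with the DTO, so a different inlet format would require changing this function alone.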
Preferably, each data association relationship may include a matching condition for matching with the DTO. Illustratively, the matching condition may be a regular expression associated with the data source. The matching steps of the data association relation and the source data are as follows: firstly, traversing each data association relation configured in advance, and judging whether data source information in the DTO meets a matching condition in the data association relation; and determining the data association relation with the judgment result of yes as the data association relation corresponding to the DTO.
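The traversal described above might look like the following sketch, in which each data association relation carries a regular expression that is tested against the data source information. The relation entries, the `<database>.<table>` source string, and the choice to return the first match are assumptions made for illustration.

```python
import re
from typing import Any, Dict, List, Optional

# Hypothetical cfg_drs entries; "match" is a regular expression that must
# match the whole "<database>.<table>" string.
RELATIONS: List[Dict[str, str]] = [
    {"name": "stu", "match": r"school\.stu"},
    {"name": "order", "match": r"shop\.order_\d+"},
]


def find_relation(dto: Dict[str, Any], relations: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Traverse the relations and return the first whose condition is met."""
    source = f"{dto['database']}.{dto['table']}"
    for rel in relations:
        if re.fullmatch(rel["match"], source):
            return rel
    return None


def bind(dto: Dict[str, Any], relations: List[Dict[str, str]]) -> Dict[str, Any]:
    """Bind the DTO with its data association relation to form a BO."""
    return {"dto": dto, "relation": find_relation(dto, relations)}


bo = bind({"database": "shop", "table": "order_07", "fields": {"id": 1}}, RELATIONS)
print(bo["relation"])
```

Using a regular expression lets a single relation cover a family of tables, as the `order_\d+` pattern does in this example.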
In practical applications, each data association relation may further include an alias mapping rule, which applies to a specific field in the database table where the source data is located and characterizes the mapping between the original name and the alias of that field. The alias mapping rule serves at least two purposes. First, it distinguishes fields of the same name in different database tables. For example, different database tables often have a field named ID, which is easily confused and affects subsequent queries, so the alias mapping rule ID → stu_ID can be configured for the stu table where the source data is located, to distinguish its ID field from the ID fields of other tables. Second, some database tables use field names that are not standard canonical names, which can also hinder queries. For example, most database tables use ID as the primary key, but the primary key of the stu table is stu_NO; the alias mapping rule stu_NO → stu_ID may then be configured for the stu table to normalize the field name and facilitate subsequent data queries.
Based on the above actions, after the data association relationship corresponding to the DTO is obtained, if the DTO includes the specific field data indicated by the obtained data association relationship, the alias field data corresponding to the specific field data may be generated according to the alias mapping rule in the data association relationship, and the alias field data may be encapsulated in the DTO. Wherein the alias field data corresponds to the field name of the specific field data based on the alias mapping rule, and the field values are the same. For example, when the alias mapping rule is ID — stu _ ID, the ID field is a specific field, and thus, a stu _ ID field may be added to the DTO, each field value of the stu _ ID field is the same as the ID field, only the field names are different, and the two field names correspond to each other based on the alias mapping rule.
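As a rough illustration of alias field generation, the following sketch assumes the DTO fields and the alias mapping rules are plain dictionaries; the function name `apply_aliases` is hypothetical.

```python
def apply_aliases(dto_fields: dict, alias_rules: dict) -> dict:
    """Return a copy of the DTO fields with alias fields added.

    alias_rules maps an original field name to its alias; the alias field
    receives the same value as the original field, which is kept unchanged.
    """
    result = dict(dto_fields)
    for original, alias in alias_rules.items():
        if original in dto_fields:
            result[alias] = dto_fields[original]
    return result


# Rule "ID -> stu_ID" from the example above; the field values are made up.
fields = apply_aliases({"ID": 7, "name": "zhangsan"}, {"ID": "stu_ID"})
```

The original field is retained alongside the alias, so later queries can address the record by either name.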
In particular, in the embodiment of the present invention, each data association relationship may further include an identifier of at least one query method, where the query method identifier uniquely indicates the query method. For example, in an RPC-based query approach, the query method identifier may consist of the RPC name and the name of the method in the RPC. The query method identifier is used to acquire the corresponding tracing path from the preconfigured tracing information so that the related data can be queried.
In practical application, the query method in the data association relationship may include a direct query method and at least one hierarchical parent query method, and the parent query method is used to provide assistance from a previous hierarchical level when the direct query method cannot successfully query. For example, a data association relationship includes a direct query method1, a primary parent query method2, and a secondary parent query method3, where the class of method2 is the parent class of the class of method1, and the class of method3 is the parent class of the class of method 2. When the query is executed, firstly, executing method1 to try to query, if the query fails, executing method2 to try to acquire data required by method1, and if the query of method2 succeeds, transmitting the queried data required by method1 into method1 to complete the query; if the query by method2 fails, method3 is further executed to obtain the data required by method2, when the query by method3 succeeds, the obtained data is transmitted to method2 to help the query to succeed, and then method2 transmits the obtained data to method1 to complete the final query.
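The fallback between the direct query method and its parent query methods can be sketched as a small recursion. This is an assumption-laden illustration: query methods are modelled as callables that take a context dictionary and return a dictionary on success or `None` on failure, and the three sample methods and their inputs are invented.

```python
from typing import Optional


def run_with_parents(methods: list, context: dict, level: int = 0) -> Optional[dict]:
    """Run methods[level]; on failure, ask the next-level parent for the missing inputs.

    methods[0] is the direct query method, methods[1] the primary parent, and so on.
    """
    result = methods[level](context)
    if result is not None:
        return result
    if level + 1 >= len(methods):
        return None  # no further parent to help
    parent_data = run_with_parents(methods, context, level + 1)
    if parent_data is None:
        return None
    # Pass the parent's data down and retry the method at this level.
    return methods[level]({**context, **parent_data})


# Hypothetical methods: method1 needs "team_id", method2 needs "dept_id".
def method1(ctx):
    return {"name": "zhangsan"} if "team_id" in ctx else None


def method2(ctx):
    return {"team_id": "zk2"} if "dept_id" in ctx else None


def method3(ctx):
    return {"dept_id": "d1"}


found = run_with_parents([method1, method2, method3], {})
```

Here method1 fails first, method2 fails in turn, method3 supplies the input for method2, and method2 then supplies the input that lets method1 complete the query.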
Specifically, the traceable path may implement direct data query, for example, in an RPC-based query mode, the traceable path may include: the name of the RPC and the name of the method in the RPC. Therefore, a plurality of RPCs may be preconfigured in the above-mentioned traceability information, each RPC may include an RPC name, various attributes of the RPC such as an interface aggregation attribute, a consumer attribute, and the like, and each method in the RPC may include a method name, a method parameter type, a method parameter and the like, and these methods may be under the "method aggregation" attribute of the RPC.
In step S102, after the DTO and the obtained data association relationship are bound to form a business data object BO, the BO may be matched with a plurality of preconfigured target scenes to obtain at least one target scene corresponding to the BO (that is, a target scene corresponding to the source data in the BO). The target scene matching proceeds as follows: each target scene is traversed, and it is judged whether the DTO in the BO meets the trigger condition cfg_ts-m of the target scene (that is, whether the source data corresponding to the DTO meets the trigger condition); each target scene for which the judgment result is yes is determined as a target scene corresponding to the BO. In one embodiment, the target scene matching may be achieved by determining whether the aforementioned data features in the DTO satisfy the trigger condition. Through the above steps, one or more target scenes corresponding to the BO may be obtained; it is also possible that no target scene corresponding to the BO is obtained, in which case the data processing flow may end.
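A minimal sketch of target scene matching, assuming the trigger condition is modelled as a predicate over the data features in the DTO. The scene names, the `changed_fields` key, and the BO layout are hypothetical.

```python
# Hypothetical target scene configurations (cfg_ts).
TARGET_SCENES = [
    {"name": "team_changed", "trigger": lambda dto: "team_id" in dto["changed_fields"]},
    {"name": "name_changed", "trigger": lambda dto: "name" in dto["changed_fields"]},
]


def match_scenes(bo: dict, scenes=TARGET_SCENES) -> list:
    """Return every target scene whose trigger condition the BO's DTO satisfies."""
    return [scene for scene in scenes if scene["trigger"](bo["dto"])]


bo = {"dto": {"changed_fields": ["team_id"]}, "association": "deptdb.teammember"}
matched = match_scenes(bo)
```

An empty result corresponds to the case in which no target scene is obtained and the data processing flow ends.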
Step S103: and for each target scene corresponding to the source data, acquiring required data of the target scene according to the required data identification information and/or the source data in the target scene, and outputting the target data generated based on the required data by using the destination information in the target scene.
In this step, the following is performed for a BO and each target scene corresponding to it. First, according to the identification information of the required data in the target scene (cfg_ts-b), it is judged whether the required data of the target scene exists in the DTO in the BO (that is, whether the required data exists in the source data); if so, the required data is obtained from the DTO. After the required data is acquired from the DTO, if some data required by the target scene has still not been acquired (that is, the source data cannot provide all the required data), that data is acquired according to the required data identification information in the target scene, the preconfigured data association relationships, and the preconfigured tracing information cfg_bts. The process of acquiring data required by the target scene that does not exist in the source data is called tracing, and the data acquired in this way is the tracing data bts_data.
Specifically, for any data which is not acquired and is needed by the target scene, the following tracing steps are executed: firstly, traversing each pre-configured data association relationship, detecting whether the identification information of the required data corresponds to the name of the data association relationship, and determining the data association relationship with the detection result of yes as the specific data association relationship of the required data. And then, obtaining a tracing path from the pre-configured tracing information according to the query method identifier in the specific data association relationship, and obtaining the required data by using the tracing path.
For example, if the identification information of some required data that is not acquired by the target scene is a combination of a library name, a table name, and a field name, the library name and the table name are connected, and the connected data is matched with the name of each data association relationship. It can be understood that the configuration rule of the data association relation name is a combination of the corresponding library name and the table name. And if the matching is successful, determining the corresponding data association relationship as the specific data association relationship of the required data. Then, a tracing path can be obtained from the pre-configured tracing information by using the query method identifier in the specific data association relationship, and the required data is queried to complete the tracing of the required data.
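The lookup chain described above (required data identifier, then data association relationship name, then query method identifier, then tracing path) can be sketched as follows. All configuration contents are hypothetical: the RPC name `teamRpc`, the method name `getByName`, and the modelling of a tracing path as a Python callable are assumptions made only for illustration.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical data association relationships, keyed by "library.table" name.
CFG_DRS = {"deptdb.teammember": {"query_method": ("teamRpc", "getByName")}}

# Hypothetical tracing information: (RPC name, method name) -> tracing path.
CFG_BTS: Dict[tuple, Callable[[str, dict], Any]] = {
    ("teamRpc", "getByName"): lambda field, dto: {"name": "zhangsan"}.get(field),
}


def acquire(required_id: str, dto: dict) -> Optional[Any]:
    """Fetch one required datum: from the DTO if present, otherwise by tracing."""
    library, table, field = required_id.split(".")
    if field in dto:
        return dto[field]  # already provided by the source data
    relation = CFG_DRS.get(f"{library}.{table}")  # match by "library.table" name
    if relation is None:
        return None
    trace_path = CFG_BTS.get(relation["query_method"])
    return trace_path(field, dto) if trace_path else None
```

In this sketch `acquire("deptdb.teammember.id", {"id": 126})` is answered from the DTO, while a request for the name field falls through to the tracing path.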
As a preferred scheme, when a certain required data of the target scene is queried using the query method identifiers in the specific data association relationship, a tracing path may first be obtained from the tracing information according to the identifier of the direct query method in the specific data association relationship, and a first query operation is performed using that tracing path to attempt to obtain the required data. If the query succeeds, the tracing is complete. If the query fails and the required data is not acquired, a tracing path is obtained from the tracing information according to the identifier of the parent query method in the specific data association relationship, and that tracing path is used to acquire the data needed to execute the first query operation. The parent query method may include a primary parent query method, a secondary parent query method, and so on. Finally, the first query operation is executed again with the data obtained by the parent query method, so as to obtain the required data.
After all the required data of a target scene is obtained according to the above steps, target data TD may be generated based on the required data (possibly including the tracing data bts_data); the target data is the final output of the data processing platform. In practical applications, there are several options: the acquired required data may be used directly as the target data; the acquired required data may replace the corresponding placeholders in the content configuration cfg_ts-b of the target scene to form the target data; or the acquired required data may replace the corresponding placeholders in cfg_ts-b and be combined with the BO to form the target data. This process of generating the target data is called assembly.
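Placeholder replacement during assembly can be sketched with a regular expression. The `$get('...')` placeholder form follows the embodiment described later in this document; the template string and the function name `assemble` are assumptions for illustration only.

```python
import re

# Matches a placeholder of the form $get('library.table.field').
PLACEHOLDER = re.compile(r"\$get\('([^']+)'\)")


def assemble(body_template: str, required: dict) -> str:
    """Replace each $get('key') placeholder with the acquired value for key."""
    def substitute(match: re.Match) -> str:
        return str(required[match.group(1)])
    return PLACEHOLDER.sub(substitute, body_template)


# Hypothetical content configuration (cfg_ts-b) with two placeholders.
template = "{\"id\": \"$get('deptdb.teammember.id')\", \"name\": \"$get('deptdb.teammember.name')\"}"
target = assemble(template, {"deptdb.teammember.id": 126, "deptdb.teammember.name": "zhangsan"})
```

A function is used as the replacement so that acquired values are inserted literally, without being interpreted as regular expression escapes.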
It will be appreciated that independent target data may be generated for each target scene corresponding to the BO, and that this target data may be output according to the destination information (i.e., cfg_ts-ec) in the corresponding target scene. For example, according to the configuration, the target data may be sent to a message queue, sent through RPC, and/or preset calculation logic may be executed. It can be seen that a target scene may have multiple data outlets: if m target scenes correspond to a BO and each target scene has k data outlets (m and k being positive integers), then the BO has m × k data outlets. Therefore, the embodiment of the invention supports multiple data inlets and multiple data outlets, overcoming the defect of the prior art in which data is transmitted directionally through a single data inlet and a single data outlet.
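The m × k fan-out can be illustrated with a small dispatcher. The outlet names `mq` and `rpc`, the `exit_channels` key, and the in-memory `sent` list (used in place of real I/O) are all hypothetical.

```python
from typing import Callable, Dict, List

sent: List[tuple] = []  # records (outlet, payload) instead of doing real I/O

# Hypothetical outlet handlers standing in for a message queue and an RPC call.
OUTLETS: Dict[str, Callable[[str], None]] = {
    "mq": lambda payload: sent.append(("mq", payload)),
    "rpc": lambda payload: sent.append(("rpc", payload)),
}


def dispatch(scene_outputs: Dict[str, dict]) -> int:
    """Send each scene's target data to every outlet listed for that scene."""
    count = 0
    for scene in scene_outputs.values():
        for outlet in scene["exit_channels"]:
            OUTLETS[outlet](scene["target_data"])
            count += 1
    return count


# Two target scenes (m = 2), each with two outlets (k = 2).
delivered = dispatch({
    "scene_a": {"target_data": "TD-a", "exit_channels": ["mq", "rpc"]},
    "scene_b": {"target_data": "TD-b", "exit_channels": ["mq", "rpc"]},
})
```

With m = 2 and k = 2 the dispatcher performs four deliveries, one per scene and outlet pair.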
Fig. 3 is a schematic diagram of configuration information according to an embodiment of the present invention, which illustrates the configuration information of the data processing platform, including the data association relationships, the target scenes, and the tracing information. In a specific application, the data association relationships may adopt the JSON format, the target scenes may be implemented by combining XML (eXtensible Markup Language) with parsing engine syntax, and the tracing information may be implemented using storage-type components such as JDBC, MyBatis, Redis and Guava, and RPC-type components such as Dubbo, Motan, rpcX and gRPC.
Fig. 4 is a schematic diagram of a data flow trajectory of the data processing method in the embodiment of the present invention, and illustrates steps of data source, deserialization, data association, scene matching, tracing, assembly, output, and the like that are sequentially performed in the embodiment of the present invention.
In the technical scheme of the embodiment of the invention, the streaming data processing logic is completely decoupled from the business logic through the distinctive configuration of data association relationships, target scenes, and tracing information, together with a general streaming data processing flow. Data tracing is achieved through configuration, with only the data association relationships needing to be maintained, and the data required by a target scene can likewise be completed through configuration. This overcomes the defects of the hard-coding approach in existing stream computing technology and provides higher availability, extensibility, and flexibility.
The technical solution of the present invention will be further explained below according to a specific embodiment, which is implemented as follows:
Step 1, starting point: the data processing platform receives source data through an RPC method call, a message queue, or canal. The source data may be in JSON format and needs to include the data source information and other related data. The source data in one example is as follows:
Figure BDA0002934775900000131
Figure BDA0002934775900000141
Figure BDA0002934775900000151
The source data relates to a data update (update): a record in the team_member_3 table of the database dept_1 is updated, where the id field of the record is 126, the name field is zhangsan, and the team_id field is changed from yy1 to zk2. The database name and the table name constitute the data source information of the source data.
Step 2, parsing: the source data is parsed and deserialized into a data format recognizable by the platform, namely the DTO.
Figure BDA0002934775900000152
In the above DTO, the field name is adjusted, and the last line of data is the data feature extracted from the source data.
Step 3, association: all the preconfigured data association relationships (cfg_drs) are traversed and matched in turn against the DTO using the matching condition in each data association relationship, so as to obtain the data association relationship corresponding to the DTO; the data association relationship and the DTO are then bound and encapsulated into a BO. The data association relationship corresponding to the DTO is as follows.
Figure BDA0002934775900000161
In the above configuration, deptdb.teammember is the name of the data association relationship, and the matching condition is the regular expression "dept_\\d+.team_member_\\d+" (representing the format dept_number.team_member_number). The "field alias" attribute represents an alias mapping rule that maps the original name "deptdb.teammember.partno" of a specific field to the alias "deptdb.teammember.partid". The value of the "query method" attribute is the direct query method identifier, which includes the RPC name "teammemberbrc" and the method name "getTeamMemberByName" in that RPC. The "parent query method" attribute corresponds to the primary parent query method.
In this step, the data source information of the source data is first constructed as "dept_1.team_member_3" (i.e., the library name and the table name joined with "."), and is then matched against the matching conditions of the data association relationships. Because it conforms to the regular expression in the above data association relationship, the match succeeds, and the data association relationship is bound and stored in the BO.
Step 4, target scene matching: all the target scene configurations (cfg_ts) are acquired, traversed, and matched in turn against the current BO to obtain at least one target scene corresponding to the BO; it is also possible that no target scene is matched, in which case the data processing flow ends. After a target scene is matched, the corresponding BO is bound to the target scene and saved to the collection List<BO + cfg_ts>. One target scene corresponding to the BO is as follows:
Figure BDA0002934775900000171
Figure BDA0002934775900000181
In the above configuration, the match tag contains the trigger condition (i.e., cfg_ts-m) "#if($tools.fieldChange('deptdb.teammember.teamId'))", which tests whether the changed field in the BO (i.e., the changed field in the DTO) is "deptdb.teammember.teamId". Equivalently, the following determination is performed on the source data: whether the source data comes from a team_member_ (followed by a number) table in a dept_ (followed by a number) library, and whether the changed field is "team_id". Here "#" marks a default method (e.g., if) and "$" marks a custom method (e.g., tools.fieldChange and get). The body tag is cfg_ts-b, in which deptdb.teammember.id and deptdb.teammember.name are the identification information of the data required by the target scene. The exit_channel tag corresponds to cfg_ts-ec and contains the destination information of the target scene, "send to the message queue mq".
It can be understood that the above target scene functions as follows: when the changed field in the BO is "deptdb.teammember.teamId", the data to be acquired are "deptdb.teammember.id" and "deptdb.teammember.name" (i.e., the id field value and the name field value of the record in which the change occurred).
Step 5, tracing: for each entry in List<BO + cfg_ts>, it is first determined whether all the required data exists in the BO; if so, no tracing is needed, and if not, tracing is needed. In the above example, the required data "deptdb.teammember.id" and "deptdb.teammember.name" both exist in the BO (126 and zhangsan, respectively), so no tracing is needed. Assuming instead that the required data "deptdb.teammember.name" does not exist in the BO, the following tracing steps are performed.
First, the library name and the table name are extracted from the identifier of the required data and joined with "." to form "deptdb.teammember", which is matched against the names of all the data association relationships (cfg_drs) to obtain the specific data association relationship (in this example, the data association relationship shown in step 3; in an actual application, the specific data association relationship is not necessarily the one bound to the DTO in step 3). Then, the direct query method identifier 'teammemberbyname' is acquired from the specific data association relationship, and a tracing path is determined based on the following preconfigured tracing information (cfg_bts).
Figure BDA0002934775900000182
Figure BDA0002934775900000191
After the direct query method identifier 'teammemberbyname' is obtained, the RPC name 'teammemberbyname' and the corresponding method name 'getTeamMemberByName' are determined. The RPC name is then used to locate the RPC and its attributes in the tracing information, and the method name is used to locate the corresponding method and its attributes, thereby yielding the tracing path. The tracing path is executed to obtain the data required by the target scene. If tracing cannot be completed through the direct query method identifier, the parent query method identifiers can be used in turn, level by level, to obtain the data needed to execute the direct query method; tracing is then performed again through the direct query method identifier using that data, finally obtaining the data required by the target scene.
Step 6, data integration: the required data is acquired, according to step 5, for each target scene corresponding to the BO.
Step 7, assembly: the placeholders in cfg_ts-b are replaced with the acquired data required by the target scene. For example, when the value of deptdb.teammember.name is zhangsan, zhangsan replaces $get('deptdb.teammember.name') in the cfg_ts-b of step 4, and the content of cfg_ts-b after replacement may be {"id": "0", "name": "zhangsan"}. Finally, the target data TD is generated from the data formed after replacement and the BO, which completes the assembly of TD.
Step 8, output: the target data TD is output according to the cfg_ts-ec in step 4, that is, TD is sent to the message queue.
In the above configuration, a default method begins with "#" and a custom method begins with "$". The method name is immediately followed by a left parenthesis "(" and the call ends with a right parenthesis ")"; the independent items inside the parentheses, separated by English commas, are the input parameters of the method. The input parameters follow a standard format in which the parts are joined by English periods: library name.table name.field name. In use, an input parameter can be split: for example, "library name.table name" can be matched against a data association relationship, and "library name.table name.field name" can be matched against a field and its value.
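The call syntax described above can be illustrated with a small parser. This sketch handles only a single, non-nested call and is not the platform's actual parsing engine; the function name `parse_call` and the returned dictionary layout are assumptions.

```python
import re

# marker (# or $), method name, then comma-separated arguments in parentheses.
CALL = re.compile(r"([#$])([\w.]+)\(([^()]*)\)")


def parse_call(expression: str) -> dict:
    """Split one method call such as $get('lib.table.field') into its parts."""
    match = CALL.fullmatch(expression.strip())
    if match is None:
        raise ValueError(f"not a method call: {expression!r}")
    marker, name, raw_args = match.groups()
    args = [a.strip().strip("'\"") for a in raw_args.split(",") if a.strip()]
    return {
        "kind": "default" if marker == "#" else "custom",
        "name": name,
        # Each argument is "library.table.field"; split it for separate use.
        "args": [a.split(".") for a in args],
    }


parsed = parse_call("$get('deptdb.teammember.name')")
```

Splitting each argument on the period gives the library name, table name, and field name separately, so the first two parts can be joined again to look up a data association relationship.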
In the technical scheme of the embodiment of the invention, a plurality of target scenes are configured in advance according to business logic, and each target scene comprises identification information of data required by the target scene and destination information used for representing a data output mode of the target scene. After receiving the source data, matching the source data with the target scenes to obtain one or more target scenes corresponding to the source data, acquiring required data of the target scenes according to the obtained identification information and/or the source data in the target scenes to generate target data, and finally outputting the target data by utilizing the destination information in the target scenes. Therefore, the decoupling of the data processing flow and the service logic is realized, and the worker can execute the general data processing flow (namely, the general data processing platform is used) to realize the service logic only by configuring the required data and the destination information of the target scene according to the service logic in advance, so that the high expansibility and flexibility of the data processing flow and the data processing platform are realized.
It should be noted that, for the convenience of description, the foregoing method embodiments are described as a series of acts, but those skilled in the art will appreciate that the present invention is not limited by the order of acts described, and that some steps may in fact be performed in other orders or concurrently. Moreover, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no acts or modules are necessarily required to implement the invention.
In order to better implement the above-mentioned solution of the embodiment of the present invention, the following also provides a related device, i.e. a data processing platform, for implementing the above-mentioned solution.
Referring to fig. 5, a data processing platform 500 according to an embodiment of the present invention may include: a source data receiving unit 501, a target scene matching unit 502 and a tracing and output unit 503.
The source data receiving unit 501 may be configured to receive source data to be processed; the target scene matching unit 502 may be configured to match the source data with a plurality of pre-configured target scenes to obtain at least one target scene corresponding to the source data; wherein, each target scene comprises: identification information of required data of the target scene and destination information used for representing a data output mode of the target scene; the tracing and output unit 503 may be configured to: for each target scene corresponding to the source data, acquiring the required data of the target scene according to the identification information and/or the source data in the target scene, and outputting the target data generated based on the required data by using the destination information in the target scene.
In an embodiment of the present invention, the data processing platform 500 may further include: a parsing and deserializing unit, configured to parse and deserialize the source data to form a data transmission object DTO after the source data to be processed is received; and a data association unit, configured to match the DTO with a plurality of preconfigured data association relationships after the DTO is formed, obtain the data association relationship corresponding to the DTO, and bind the DTO with the obtained data association relationship to form a business data object BO. The tracing and output unit 503 may be further configured to: judge, according to the identification information of the required data in the target scene, whether the required data of the target scene exists in the DTO in the BO, and if so, acquire the required data from the DTO; and, after the required data is acquired from the DTO, if data required by the target scene still remains unacquired, acquire that data according to the identification information in the target scene, the preconfigured data association relationships, and the preconfigured tracing information.
Preferably, the source data includes data source information, and each data association relationship includes a matching condition; and the data association unit may be further configured to: traversing each pre-configured data association relationship, and judging whether the data source information in the DTO meets the matching condition in the data association relationship; and determining the data association relation with the judgment result of yes as the data association relation corresponding to the DTO.
As a preferred scheme, the data source information of the source data includes names of database tables, each data association relationship includes an alias mapping rule for a specific field in the database table, and the alias mapping rule represents a mapping relationship between an original name and an alias of the specific field; and the data association unit may be further configured to: after obtaining the data association relation corresponding to the DTO, if the DTO comprises the specific field data indicated by the obtained data association relation, generating alias field data corresponding to the specific field data according to an alias mapping rule in the data association relation, and encapsulating the alias field data in the DTO; wherein the alias field data corresponds to the field name of the specific field data based on the alias mapping rule, and the field values are the same.
In specific application, each target scene comprises: at least one trigger condition for the target scenario; and, the target scene matching unit 502 may be further configured to: traversing each target scene, and judging whether the DTO in the BO meets the triggering condition of the target scene; and determining at least one target scene with the judgment result of yes as a target scene corresponding to the source data.
In practical application, each data association relationship comprises at least one identifier of a query method; and, the tracing and output unit 503 may be further configured to: for any data required by the unacquired target scene: traversing each data association relation, and detecting whether the identification information of the required data corresponds to the name of the data association relation; determining the data association relation with the detection result of yes as the specific data association relation of the required data; and obtaining a tracing path from the tracing information according to the query method identifier in the specific data association relation, and obtaining the required data by using the tracing path.
In some embodiments, the query method may further include a direct query method and at least one hierarchical level of parent query methods; and, the tracing and output unit 503 may be further configured to: obtaining a tracing path from the tracing information according to the identifier of the direct query method in the specific data association relationship, and executing a first query operation by using the tracing path to try to obtain the required data; if the required data is not acquired, acquiring a tracing path from the tracing information according to the identifier of the parent query method in the specific data association relation, and acquiring data required for executing a first query operation by using the tracing path; and executing the first query operation again according to the required data so as to acquire the required data.
Furthermore, in the embodiment of the present invention, the source data receiving unit 501 may further be configured to: receiving source data to be processed by using a message queue, a Remote Procedure Call (RPC) or an incremental data synchronization tool; the destination information includes: sending the target data to a message queue, sending the target data through RPC, and/or executing a preset calculation logic.
In the technical scheme of the embodiment of the invention, a plurality of target scenes are configured in advance according to business logic, and each target scene comprises identification information of data required by the target scene and destination information used for representing a data output mode of the target scene. After receiving the source data, matching the source data with the target scenes to obtain one or more target scenes corresponding to the source data, acquiring required data of the target scenes according to the obtained identification information and/or the source data in the target scenes to generate target data, and finally outputting the target data by utilizing the destination information in the target scenes. Therefore, the decoupling of the data processing flow and the service logic is realized, and the worker can execute the general data processing flow (namely, the general data processing platform is used) to realize the service logic only by configuring the required data and the destination information of the target scene according to the service logic in advance, so that the high expansibility and flexibility of the data processing flow and the data processing platform are realized.
In a specific embodiment, in addition to the preconfigured target scenes, a corresponding data association relationship is preconfigured for each object that stores source data; the data association relationship defines the query method identifiers used for data tracing and the alias mapping rules for specific fields in the database table. Tracing information that corresponds to the query method identifiers and provides the tracing paths is also preconfigured. After the source data is received, the data processing platform may execute the following data processing flow. First, the source data is parsed and deserialized into a data transmission object DTO. Next, the corresponding data association relationship is obtained according to the data source information in the source data and bound with the DTO to form a business data object BO. Then, the target scenes corresponding to the BO are obtained, and tracing is carried out according to the required data identification information in each target scene. Specifically, it is judged whether the required data of the target scene exists in the source data, and if so, the required data is obtained from the source data. If some required data has still not been obtained, the corresponding specific data association relationship is determined using the required data identification information in the target scene, and a tracing path is obtained from the tracing information according to the query method identifier in the specific data association relationship, so as to query the missing data and thereby complete the required data of the target scene. Finally, the target data generated based on the required data is output according to the destination information in the target scene.
It can be seen that the data processing platform and the data processing flow are completely decoupled from the business logic. Staff can preconfigure the business-related settings, such as the data association relationships, target scenes, and tracing information, and every step of the data processing flow that is then executed is independent of the business. This improves the universality, extensibility, and flexibility of the data processing platform: when business scenarios are added or changed, only the related configuration needs to be adjusted, without changing the data processing platform, thereby realizing a highly available, general-purpose streaming data processing platform.
Fig. 6 illustrates an exemplary system architecture 600 of a data processing platform or data processing method to which embodiments of the present invention may be applied.
As shown in fig. 6, the system architecture 600 may include terminal devices 601, 602, 603, a network 604 and a server 605 (this architecture is merely an example, and the components included in a specific architecture may be adjusted according to the specific application). The network 604 serves as a medium for providing communication links between the terminal devices 601, 602, 603 and the server 605. The network 604 may include various types of connections, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 601, 602, 603 to interact with the server 605 via the network 604 to receive or send messages or the like. The terminal devices 601, 602, 603 may have various client applications installed thereon, such as a streaming data processing application (for example only).
The terminal devices 601, 602, 603 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 605 may be a server providing various services, such as a data processing server providing support for streaming data processing applications operated by users with the terminal devices 601, 602, 603 (for example only). The data processing server may process the received data processing request and feed back the processing result (e.g., the processed target data, for example only) to the terminal devices 601, 602, 603.
It should be noted that the data processing method provided by the embodiment of the present invention is generally executed by the server 605, and accordingly, the data processing platform is generally disposed in the server 605.
It should be understood that the number of terminal devices, networks, and servers in fig. 6 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The invention also provides an electronic device. The electronic device of the embodiment of the invention comprises: one or more processors; and a storage device for storing one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the data processing method provided by the invention.
Referring now to FIG. 7, shown is a block diagram of a computer system 700 suitable for use with the electronic device implementing an embodiment of the present invention. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the computer system 700 are also stored. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is installed into the storage section 708 as necessary.
In particular, the processes described in the main step diagrams above may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the invention include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the main step diagram. In the above-described embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program, when executed by the central processing unit 701, performs the above-described functions defined in the system of the present invention.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present invention may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor including a source data receiving unit, a target scene matching unit, and a tracing and output unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the source data receiving unit may also be described as a "unit providing source data to the target scene matching unit".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to perform steps comprising: receiving source data to be processed; matching the source data with a plurality of target scenes which are configured in advance to obtain at least one target scene corresponding to the source data; wherein, each target scene comprises: identification information of required data of the target scene and destination information used for representing a data output mode of the target scene; for each target scene corresponding to the source data, acquiring the required data of the target scene according to the identification information and/or the source data in the target scene, and outputting the target data generated based on the required data by using the destination information in the target scene.
In the technical scheme of the embodiment of the invention, a plurality of target scenes are configured in advance according to business logic, and each target scene comprises identification information of data required by the target scene and destination information used for representing a data output mode of the target scene. After receiving the source data, matching the source data with the target scenes to obtain one or more target scenes corresponding to the source data, acquiring required data of the target scenes according to the obtained identification information and/or the source data in the target scenes to generate target data, and finally outputting the target data by utilizing the destination information in the target scenes. Therefore, the decoupling of the data processing flow and the service logic is realized, and the worker can execute the general data processing flow (namely, the general data processing platform is used) to realize the service logic only by configuring the required data and the destination information of the target scene according to the service logic in advance, so that the high expansibility and flexibility of the data processing flow and the data processing platform are realized.
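The configuration-driven output described above can be sketched as follows. This is only an illustration under stated assumptions: the dictionary layout of a scene, the destination types `"mq"` and `"rpc"`, the handler functions, and the `OUTBOX` list (which records what would be sent instead of performing real I/O) are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

# Records what would be sent; stands in for a real message queue or RPC client.
OUTBOX: List[Tuple[str, str, Dict[str, Any]]] = []


def to_message_queue(dest: Dict[str, Any], data: Dict[str, Any]) -> None:
    OUTBOX.append(("mq", dest["topic"], data))


def to_rpc(dest: Dict[str, Any], data: Dict[str, Any]) -> None:
    OUTBOX.append(("rpc", dest["service"], data))


# Hypothetical registry: destination type -> output handler.
DESTINATIONS: Dict[str, Callable[[Dict[str, Any], Dict[str, Any]], None]] = {
    "mq": to_message_queue,
    "rpc": to_rpc,
}

# Target scenes expressed purely as configuration (assumed layout).
SCENES = [
    {"name": "order_paid",
     "required": ["order_id", "amount"],
     "destination": {"type": "mq", "topic": "paid-orders"}},
    {"name": "risk_check",
     "required": ["order_id", "user_id"],
     "destination": {"type": "rpc", "service": "risk.check"}},
]


def output_target_data(scene: Dict[str, Any], available: Dict[str, Any]) -> Dict[str, Any]:
    """Build the target data from the required identifiers and dispatch it."""
    target = {key: available[key] for key in scene["required"] if key in available}
    dest = scene["destination"]
    DESTINATIONS[dest["type"]](dest, target)
    return target
```

For example, `output_target_data(SCENES[0], {"order_id": 1, "amount": 9.5, "user_id": 7})` keeps only `order_id` and `amount` and records a message for the `paid-orders` topic. Adding a scene means appending an entry to `SCENES`; the dispatch function is not modified.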
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (14)

1. A data processing method, comprising:
receiving source data to be processed;
matching the source data with a plurality of target scenes which are configured in advance to obtain at least one target scene corresponding to the source data; wherein, each target scene comprises: identification information of required data of the target scene and destination information used for representing a data output mode of the target scene;
for each target scene corresponding to the source data, acquiring the required data of the target scene according to the identification information and/or the source data in the target scene, and outputting the target data generated based on the required data by using the destination information in the target scene.
2. The data processing method of claim 1, wherein the method further comprises:
after receiving source data to be processed, analyzing and deserializing the source data to form a data transmission object DTO.
3. The data processing method of claim 2, wherein the method further comprises:
after the DTO is formed, matching the DTO with a plurality of pre-configured data association relations to obtain a data association relation corresponding to the DTO, and binding the DTO with the obtained data association relation to form a business data object BO.
4. The data processing method of claim 3, wherein the source data includes data source information, and each data association relationship includes a matching condition; and matching the DTO with a plurality of pre-configured data association relations to obtain the data association relation corresponding to the DTO, including:
traversing each pre-configured data association relationship, and judging whether the data source information in the DTO meets the matching condition in the data association relationship;
and determining the data association relation with the judgment result of yes as the data association relation corresponding to the DTO.
5. The data processing method according to claim 4, wherein the data source information of the source data includes names of database tables, and each data association relationship includes an alias mapping rule for a specific field in the database table, and the alias mapping rule represents a mapping relationship between an original name and an alias of the specific field; and, the method further comprises:
after obtaining the data association relation corresponding to the DTO, if the DTO comprises the specific field data indicated by the obtained data association relation, generating alias field data corresponding to the specific field data according to an alias mapping rule in the data association relation, and encapsulating the alias field data in the DTO; wherein the alias field data corresponds to the field name of the specific field data based on the alias mapping rule, and the field values are the same.
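As an illustration of the alias mapping rule, the following minimal sketch adds alias field data to a DTO. The function name `apply_alias_rule` and the representation of the rule as a dictionary from original field name to alias are assumptions, not part of the claimed method.

```python
from typing import Any, Dict


def apply_alias_rule(dto_fields: Dict[str, Any], alias_rule: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of the DTO fields with alias field data added.

    alias_rule maps the original name of a specific field to its alias.
    """
    result = dict(dto_fields)
    for original, alias in alias_rule.items():
        if original in dto_fields:
            # The alias field carries the same value under the aliased name.
            result[alias] = dto_fields[original]
    return result
```

For instance, with the rule `{"id": "order_id"}`, a DTO containing `id = 5` gains an `order_id` field with the same value, so that later steps can refer to the field by a name that is unambiguous across database tables.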
6. A data processing method according to claim 3, wherein each target scene comprises: at least one trigger condition for the target scenario; and matching the source data with a plurality of target scenes configured in advance to obtain at least one target scene corresponding to the source data, including:
traversing each target scene, and judging whether the DTO in the BO meets the triggering condition of the target scene;
and determining at least one target scene with the judgment result of yes as a target scene corresponding to the source data.
7. The data processing method according to claim 3, wherein the obtaining the required data of each target scene corresponding to the source data according to the identification information in the target scene and/or the source data comprises:
judging whether the required data of the target scene exists in the DTO in the BO according to the identification information of the required data in the target scene: if yes, acquiring the required data from the DTO;
after acquiring the required data from the DTO, if the data required by the target scene which is not acquired still exists, acquiring the data required by the target scene which is not acquired according to the identification information, the preconfigured data association relation and the preconfigured tracing information in the target scene.
8. The data processing method of claim 7, wherein each data association includes an identifier of at least one query method; and acquiring the data required by the unacquired target scene according to the identification information, the preconfigured data association relationship and the preconfigured tracing information in the target scene, including:
for any data required by the unacquired target scene:
traversing each data association relation, and detecting whether the identification information of the required data corresponds to the name of the data association relation; determining the data association relation with the detection result of yes as the specific data association relation of the required data;
and obtaining a tracing path from the tracing information according to the query method identifier in the specific data association relation, and obtaining the required data by using the tracing path.
9. The data processing method of claim 8, wherein the query method further comprises a direct query method and at least one hierarchical level of parent query methods; and obtaining a tracing path from the tracing information according to the query method identifier in the specific data association relation, and obtaining the required data by using the tracing path, comprises:
obtaining a tracing path from the tracing information according to the identifier of the direct query method in the specific data association relationship, and executing a first query operation by using the tracing path to try to obtain the required data;
if the required data is not acquired, acquiring a tracing path from the tracing information according to the identifier of the parent query method in the specific data association relation, and acquiring data required for executing a first query operation by using the tracing path;
and executing the first query operation again according to the required data so as to acquire the required data.
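The fallback from a direct query method to a parent query method can be sketched as below. This is a hedged illustration only: the function `trace_with_parent`, the dictionary layout of the association, the query identifiers, and the in-memory `SKUS` and `SHOPS` stores are hypothetical, and each query is modeled as a function that returns `None` when it cannot find the data.

```python
from typing import Any, Callable, Dict, Optional

Query = Callable[[Dict[str, Any]], Optional[Dict[str, Any]]]


def trace_with_parent(context: Dict[str, Any],
                      association: Dict[str, Any],
                      tracing_info: Dict[str, Query]) -> Optional[Dict[str, Any]]:
    """Try the direct query; on failure, use parent queries to fill in its inputs."""
    direct = tracing_info[association["direct"]]
    result = direct(context)  # first query operation
    if result is not None:
        return result
    for parent_id in association.get("parents", []):
        extra = tracing_info[parent_id](context)  # data needed by the direct query
        if extra is None:
            continue
        context = {**context, **extra}
        result = direct(context)  # execute the first query operation again
        if result is not None:
            return result
    return None


SKUS = {"sku-1": {"shop_id": 42}}    # stand-in data stores
SHOPS = {42: {"shop_name": "Acme"}}

tracing_info: Dict[str, Query] = {
    "get_shop_by_shop_id": lambda ctx: SHOPS.get(ctx.get("shop_id")),
    "get_sku_by_sku_id": lambda ctx: SKUS.get(ctx.get("sku_id")),
}
association = {"name": "shop",
               "direct": "get_shop_by_shop_id",
               "parents": ["get_sku_by_sku_id"]}
```

Here, a context that only contains `sku_id` cannot be used by the direct query, so the parent query first resolves `shop_id` and the direct query is then repeated with the enriched context.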
10. The data processing method according to any one of claims 1 to 9, wherein the receiving source data to be processed comprises: receiving source data to be processed by using a message queue, a Remote Procedure Call (RPC) or an incremental data synchronization tool;
the destination information includes: sending the target data to a message queue, sending the target data through RPC, and/or executing a preset calculation logic.
11. A data processing platform, comprising:
the source data receiving unit is used for receiving source data to be processed;
the target scene matching unit is used for matching the source data with a plurality of preset target scenes to obtain at least one target scene corresponding to the source data; wherein, each target scene comprises: identification information of required data of the target scene and destination information used for representing a data output mode of the target scene;
the source tracing and output unit is used for: for each target scene corresponding to the source data, acquiring the required data of the target scene according to the identification information and/or the source data in the target scene, and outputting the target data generated based on the required data by using the destination information in the target scene.
12. The data processing platform of claim 11, wherein the data processing platform further comprises:
a parsing and deserializing unit, used for: after receiving source data to be processed, parsing and deserializing the source data to form a data transmission object (DTO);
the data association unit is used for matching the DTO with a plurality of pre-configured data association relations after the DTO is formed, obtaining the data association relation corresponding to the DTO, and binding the DTO with the obtained data association relation to form a business data object BO;
the tracing and output unit is further used for: judging whether the required data of the target scene exists in the DTO in the BO according to the identification information of the required data in the target scene: if yes, acquiring the required data from the DTO; after acquiring the required data from the DTO, if the data required by the target scene which is not acquired still exists, acquiring the data required by the target scene which is not acquired according to the identification information, the preconfigured data association relation and the preconfigured tracing information in the target scene.
13. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data processing method as claimed in any one of claims 1-10.
14. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the data processing method of any one of claims 1 to 10.
CN202110156169.6A 2021-02-04 2021-02-04 Data processing method and data processing platform Pending CN113760987A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110156169.6A CN113760987A (en) 2021-02-04 2021-02-04 Data processing method and data processing platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110156169.6A CN113760987A (en) 2021-02-04 2021-02-04 Data processing method and data processing platform

Publications (1)

Publication Number Publication Date
CN113760987A true CN113760987A (en) 2021-12-07

Family

ID=78786545

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110156169.6A Pending CN113760987A (en) 2021-02-04 2021-02-04 Data processing method and data processing platform

Country Status (1)

Country Link
CN (1) CN113760987A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114780584A (en) * 2022-06-22 2022-07-22 云账户技术(天津)有限公司 Multi-scene streaming data processing method, system, network equipment and storage medium
CN117033449A (en) * 2023-10-09 2023-11-10 北京中科闻歌科技股份有限公司 Data processing method based on kafka stream, electronic equipment and storage medium
CN117033449B (en) * 2023-10-09 2023-12-15 北京中科闻歌科技股份有限公司 Data processing method based on kafka stream, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
ES2765415T3 (en) Microservices-based data processing apparatus, method and program
US9442822B2 (en) Providing a visual representation of a sub-set of a visual program
CN106897153B (en) Method and system for calling application programming interface
CN113760987A (en) Data processing method and data processing platform
CN110555030A (en) SQL statement processing method and device
CN111666293A (en) Database access method and device
US11263542B2 (en) Technologies for auto discover and connect to a rest interface
CN110858202A (en) Method and device for generating where clause in database query statement
US8782470B2 (en) Generation of test data for web service-based test automation and semi-automated test data categorization
CN113760948A (en) Data query method and device
US7237222B1 (en) Protocol for controlling an execution process on a destination computer from a source computer
US11552868B1 (en) Collect and forward
CN113760961A (en) Data query method and device
CN113536748A (en) Method and device for generating chart data
US8549090B2 (en) Messaging tracking system and method
US20170139758A1 (en) Nondeterministic Operation Execution Environment Utilizing Resource Registry
US20140215011A1 (en) Message exchange via generic tlv generator and parser
CN114996554A (en) Database query method and device, storage medium and electronic equipment
CN113946816A (en) Cloud service-based authentication method and device, electronic equipment and storage medium
CN113157722A (en) Data processing method, device, server, system and storage medium
CN112306984A (en) Data source routing method and device
CN113032004A (en) Method, apparatus and program product for managing development jobs in a development environment
WO2004104865A2 (en) Methods and systems for intellectual capital sharing and control
EP4105782A1 (en) Hardware accelerator service discovery
CN117648212B (en) RPC-based database calling method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination