CN118260351A - Data processing method, device, equipment and storage medium based on stream processing - Google Patents

Publication number
CN118260351A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410435679.0A
Other languages
Chinese (zh)
Inventor
谢建波
陈琳
陈帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Application filed by China Construction Bank Corp and CCB Finetech Co Ltd
Priority to CN202410435679.0A
Publication of CN118260351A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a data processing method, apparatus, device and storage medium based on stream processing, and relates to the technical field of big data. The method comprises: obtaining to-be-processed streaming data generated when a target user completes the process of a target service; invoking at least one pre-configured rule flow corresponding to the target service; splicing the to-be-processed streaming data with the at least one rule flow to obtain a to-be-processed task, and broadcasting the to-be-processed task to all parallel threads so that the task is processed by a target parallel thread, the target parallel thread being one of the parallel threads; and determining standard-reaching information of the target service based on the obtained processing result. For large volumes of to-be-processed streaming data, the invention improves the efficiency of data processing.

Description

Data processing method, device, equipment and storage medium based on stream processing
Technical Field
The present invention relates to the field of big data technologies, and in particular, to a data processing method, apparatus, device, and storage medium based on stream processing.
Background
Large commercial groups are steadily enriching and expanding their internet terminals, and each terminal may run multiple marketing campaigns in which users can participate.
Data processing for marketing campaigns currently relies mainly on batch processing and real-time processing. Batch jobs typically run with a T+2 delay and therefore lag behind events. Real-time processing is more timely than batch processing, but both approaches suffer from low efficiency when marketing campaigns generate very large data volumes.
Disclosure of Invention
The invention provides a data processing method, a device, equipment and a storage medium based on stream processing, which achieve the technical effect of improving the data processing efficiency.
In a first aspect, the present invention provides a data processing method based on stream processing, including:
obtaining to-be-processed streaming data generated when a target user completes the process of a target service;
invoking at least one pre-configured rule flow corresponding to the target service;
splicing the to-be-processed streaming data with the at least one rule flow to obtain a to-be-processed task, and broadcasting the to-be-processed task to all parallel threads so that the task is processed by a target parallel thread, the target parallel thread being one of the parallel threads;
and determining standard-reaching information of the target service based on the obtained processing result.
In a second aspect, the present invention provides a data processing apparatus based on stream processing, comprising:
a data acquisition module, configured to obtain to-be-processed streaming data generated when a target user completes the process of a target service;
a rule acquisition module, configured to invoke at least one pre-configured rule flow corresponding to the target service;
a data processing module, configured to splice the to-be-processed streaming data with the at least one rule flow to obtain a to-be-processed task, and to broadcast the to-be-processed task to all parallel threads so that the task is processed by a target parallel thread, the target parallel thread being one of the parallel threads;
and a standard-reaching information determining module, configured to determine standard-reaching information of the target service based on the obtained processing result.
In a third aspect, an embodiment of the present invention further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements a data processing method based on stream processing according to any one of the embodiments of the present invention when the processor executes the program.
In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements a data processing method based on stream processing according to any one of the embodiments of the present invention.
In a fifth aspect, embodiments of the present invention further provide a computer program product comprising a computer program which, when executed by a processor, implements a data processing method based on stream processing according to any of the embodiments of the present invention.
The data processing method, apparatus, device and storage medium based on stream processing provided by this embodiment obtain to-be-processed streaming data generated when a target user completes the process of a target service; invoke at least one pre-configured rule flow corresponding to the target service; splice the to-be-processed streaming data with the at least one rule flow to obtain a to-be-processed task, and broadcast the to-be-processed task to all parallel threads so that the task is processed by a target parallel thread, the target parallel thread being one of the parallel threads; and determine standard-reaching information of the target service based on the obtained processing result. According to the technical solution of the embodiment of the invention, the to-be-processed task corresponding to the streaming data is determined, broadcast to all parallel threads, and processed by the determined target parallel thread, which improves the efficiency of processing the large volume of to-be-processed streaming data generated when target users complete the process of the target service.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an overall architecture of a data processing method based on stream processing according to an embodiment of the present invention;
FIG. 2 is a flowchart of a data processing method based on stream processing according to an embodiment of the present invention;
FIG. 3 is a flowchart of a data processing method based on stream processing according to an embodiment of the present invention;
FIG. 4 is a flowchart of a data processing method based on stream processing according to an embodiment of the present invention;
FIG. 5 is a flowchart of a data processing method based on stream processing according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating a data processing method based on stream processing according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a data processing apparatus based on stream processing according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
It should be noted that like reference numerals and letters denote like items in the following figures, so once an item is defined in one figure, it need not be defined or explained again in subsequent figures. In the description of the present application, the terms "first", "second", and the like are used only to distinguish between descriptions and are not to be construed as indicating or implying relative importance. The acquisition, storage, use and processing of data in the technical solution of the present application all comply with the relevant provisions of national laws and regulations. It should also be noted that the embodiments of the present application may mention existing industry solutions such as software, organizations and models; these should be regarded only as examples illustrating the feasibility of implementing the technical solution, and do not imply that the applicant has used or must use them.
Fig. 1 is a schematic diagram of an overall architecture of a data processing method based on stream processing according to an embodiment of the present invention. As shown in fig. 1, the overall architecture may include: a data lake, Kafka, a real-time stream processing engine, a second database, and a third database.
The data lake may include all data generated by users when completing business processes on the target platform, and provides the required data to the real-time stream processing engine. Kafka may be understood as messaging middleware for transmitting and/or storing data. The real-time stream processing engine may be understood as an engine for real-time data processing. It may include Flink, which may be understood as a stream processing framework, and Guava, which may be understood as a local database; that is, the invention performs real-time data processing with a real-time stream processing engine that combines Flink and Guava. The second database and the third database may be understood as storage databases with different roles, used for user tracing and engine calling respectively. In the invention, the second database is optionally HBase and lets users trace data processing results, while the third database is Kafka and notifies the downstream task center component of the standard-reaching results.
The open-source technologies adopted by the invention may include underlying components such as Kafka, Flink, HBase and Guava. Specifically, the standardized data subscribed from the data lake is sent to Kafka, processed in real time by the real-time stream processing engine, and the processing results are finally delivered in real time to the second database and the third database.
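The data path described above can be pictured with a small in-memory sketch. It is only an illustration under stated assumptions: the deque, dict and list stand in for the Kafka input topic, HBase and the Kafka output topic, and the names `run_pipeline`, `user_id` and `amount` are hypothetical and do not come from the patent.

```python
from collections import deque


def run_pipeline(lake_records, process):
    """Move records from a stand-in input topic through `process` to two sinks."""
    kafka_in = deque(lake_records)   # stand-in for the Kafka input topic
    hbase = {}                       # stand-in for HBase (user tracing)
    kafka_out = []                   # stand-in for the Kafka output topic
    while kafka_in:
        record = kafka_in.popleft()
        result = process(record)     # real-time engine step (Flink in the text)
        hbase[record["user_id"]] = result
        kafka_out.append({"user_id": record["user_id"], "result": result})
    return hbase, kafka_out


# Hypothetical rule: a payment of at least 100 reaches the standard.
traced, notified = run_pipeline(
    [{"user_id": "u1", "amount": 120}, {"user_id": "u2", "amount": 30}],
    lambda r: 1 if r["amount"] >= 100 else 0,
)
```

Each result is written to both sinks: one copy for later tracing by the user and one for notifying the downstream component.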
Based on the above overall architecture, fig. 2 is a flowchart of a data processing method based on stream processing according to an embodiment of the present invention. This embodiment is applicable to the case where real-time data generated when a user handles a service is processed according to pre-configured rules. The method may be performed by a data processing apparatus based on stream processing, which may be implemented in hardware and/or software. As shown in fig. 2, the data processing method includes:
S110, obtaining to-be-processed streaming data generated when a target user completes the process of the target service.
Wherein the target user may be understood as a user handling the target service.
A target service may be understood as a service that is handled and completed by a target user. In the embodiment of the present invention, the target service may be preset according to the scene requirement, which is not specifically limited herein. Optionally, the target service may comprise a marketing campaign standard-reaching service on the target platform. The marketing campaign standard-reaching service may include a submitting service, a trigger service, and the like. The submitting service may be a service that submits a specific object to the target platform; illustratively, the specific object may be a financial product, and the submitting service may be submitting that financial product to the target platform. The trigger service may be a service that triggers a page of the target platform; illustratively, the trigger service may be clicking into the participation page of a marketing campaign.
The to-be-processed streaming data may be understood as streaming data generated when the process of the target service is completed. In the embodiment of the present invention, the to-be-processed streaming data may optionally be streaming data acquired in real time through Kafka; specifically, data in the data lake is transmitted to Kafka, and the to-be-processed streaming data is then acquired from Kafka.
S120, at least one rule flow which is pre-configured and corresponds to the target service is invoked.
The rule flow may be understood as a processing rule corresponding to the target service. Optionally, the rule flow may be a business rule, acquired through Kafka, that corresponds to the current target service. In the embodiment of the present invention, the target user may complete multiple target services, for example, service 1, service 2, service 3, etc., and the to-be-processed streaming data generated while completing a target service (for example, service 1) may carry the service identifier corresponding to that target service (for example, identifier 1 corresponding to service 1).
In the context of service 1, identifier 1 characterizes the streaming data to be processed as the data stream generated during the completion of service 1.
In the embodiment of the present invention, a target service may correspond to one rule flow, or may correspond to a plurality of rule flows, and each rule flow may also be attached with a rule identifier associated with a service identifier of the target service (it may be understood that, in the preconfiguring process, the service identifier of the target service may be associated with the rule identifier of the rule flow), where the rule identifier characterizes which service the rule flow is preconfigured for.
For example, the rule flow includes rule 1, rule 2, rule 3, and rule 4, where rule 1 and rule 2 are preconfigured for service 1 and have identifier 4 and identifier 5, respectively, and rule 3 is preconfigured for service 2 and has identifier 6. (It will be appreciated that during the preconfiguration, identifier 1 has been associated with identifier 4 and identifier 5, and identifier 2 has been associated with identifier 6.)
Specifically, in the scenario where the target service is service 1, identifier 4 and identifier 5, which are associated with identifier 1 of service 1, are determined, and the two pre-configured rule flows (i.e., rule 1 and rule 2) corresponding to the target service (i.e., service 1) are invoked based on identifier 4 and identifier 5.
In the scenario where the target service is service 2, an identifier 6 associated with identifier 2 of service 2 is determined, and a pre-configured rule flow (i.e., rule 3) corresponding to the target service (i.e., service 2) is invoked based on the identifier 6.
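The identifier-based lookup in this example can be sketched as two tables. This is a minimal sketch: the dictionary layout and the function name `invoke_rule_flows` are assumptions made for illustration, and only the identifier and rule numbering is taken from the example above.

```python
# Association built during pre-configuration: service identifier -> rule identifiers.
SERVICE_TO_RULE_IDS = {
    "identifier 1": ["identifier 4", "identifier 5"],  # service 1
    "identifier 2": ["identifier 6"],                  # service 2
}

# Rule identifier -> rule flow (plain labels stand in for real rule flows).
RULE_FLOWS = {
    "identifier 4": "rule 1",
    "identifier 5": "rule 2",
    "identifier 6": "rule 3",
}


def invoke_rule_flows(service_identifier):
    """Return the pre-configured rule flows associated with a service identifier."""
    rule_ids = SERVICE_TO_RULE_IDS.get(service_identifier, [])
    return [RULE_FLOWS[rule_id] for rule_id in rule_ids]


flows_for_service_1 = invoke_rule_flows("identifier 1")
flows_for_service_2 = invoke_rule_flows("identifier 2")
```

A service with no associated rule identifiers simply yields an empty list in this sketch; how the actual system treats that case is not stated in the text.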
For the case where the target service is a marketing campaign standard-reaching service, the invention also discloses how the rule flow is generated and how the rule logic of the generated rule flow relates to the activity information of the marketing campaign standard-reaching service.
On the basis of the above embodiment, the at least one rule flow is generated based on configuration rules under at least one rule logic, and the at least one rule logic matches the preset activity information of the marketing campaign standard-reaching service.
Rule logic may be understood as the logic by which business rules process the to-be-processed streaming data. In embodiments of the present invention, rule logic may be coded by the development side. That the at least one rule logic matches the preset activity information of the marketing campaign standard-reaching service means that, when the marketing campaign standard-reaching service is developed, the rule logic is set to match its activity information. Different marketing campaign standard-reaching services may correspond to different rule logic, and during data processing the rule flow whose rule logic matches the marketing campaign standard-reaching service being processed is invoked.
In the embodiment of the invention, the configuration rule can be understood as a business rule configured by a business person on a business rule configuration interface generated based on rule logic. Optionally, the configuration rule may include a decision threshold of the service to be processed, an effective processing time limit of the service to be processed, an effective processing period of the service to be processed, and the like.
In the embodiment of the invention, adding a service to be processed under existing rule logic, and modifying the business rule (such as the name, threshold or period) of a service to be processed, can be done directly by business personnel in the visual business rule configuration interface. No code needs to be modified, which makes it more convenient to add and modify services under existing rule logic.
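One way to picture a configuration rule is as a small immutable record whose fields are the values a business user would edit. The sketch below is illustrative only: the class `ConfigRule`, its field names and the "amount at least threshold within a period" logic are assumptions, not the patent's data model.

```python
from dataclasses import dataclass, replace
from datetime import datetime


@dataclass(frozen=True)
class ConfigRule:
    """Hypothetical configuration rule: a threshold plus an effective period."""
    name: str
    threshold: float
    valid_from: datetime
    valid_to: datetime

    def is_satisfied(self, amount, event_time):
        # The event must fall inside the effective period and meet the threshold.
        in_period = self.valid_from <= event_time <= self.valid_to
        return in_period and amount >= self.threshold


rule = ConfigRule("spring campaign", 100.0,
                  datetime(2024, 4, 1), datetime(2024, 4, 30))

# Changing the threshold is a configuration change, not a change to the rule logic.
stricter_rule = replace(rule, threshold=200.0)

passes_original = rule.is_satisfied(150.0, datetime(2024, 4, 10))
passes_stricter = stricter_rule.is_satisfied(150.0, datetime(2024, 4, 10))
```

The rule logic (`is_satisfied`) stays fixed in code, while the name, threshold and period vary per configured rule, which mirrors the split between developer-coded logic and business-configured values described above.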
S130, splicing the to-be-processed streaming data with the at least one rule flow to obtain a to-be-processed task, and broadcasting the to-be-processed task to all parallel threads so that the task is processed by a target parallel thread, wherein the target parallel thread is one of the parallel threads.
The to-be-processed task may be understood as a task that processes the to-be-processed streaming data according to the business rules of a rule flow. In the embodiment of the invention, the to-be-processed task may be a standard-reaching judging task, that is, a task that judges, based on the rule flow corresponding to the standard-reaching judging rule, whether the target service completed by the target user corresponding to the to-be-processed streaming data reaches the standard.
In an embodiment of the present invention, the parallel threads may be parallel threads in the Flink framework. Specifically, after the to-be-processed task is determined, it is broadcast to all parallel threads, a target parallel thread that is not currently processing a task is identified among them, and the task is handed to that target parallel thread for processing. This technical solution can effectively improve task processing efficiency.
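The splice-and-dispatch step can be approximated with a shared queue and a pool of worker threads, where whichever worker is idle picks up the next task. This is a simplified stand-in under stated assumptions: it is not Flink's broadcast mechanism, the names `splice` and `process_tasks` are hypothetical, and combining several rules with `all` is an assumption, since the text does not say how multiple rule flows are combined.

```python
import queue
import threading


def splice(event, rule_flows):
    """Combine one event with its rule flows into a single to-be-processed task."""
    return {"event": event, "rules": list(rule_flows)}


def process_tasks(tasks, num_threads=4):
    """Offer tasks to all workers through a shared queue; an idle worker takes each one."""
    task_queue = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            task = task_queue.get()
            if task is None:          # sentinel: no more tasks for this worker
                break
            event = task["event"]
            ok = all(rule(event) for rule in task["rules"])  # assumed combination
            with lock:
                results[event["id"]] = 1 if ok else 0

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for task in tasks:
        task_queue.put(task)
    for _ in threads:
        task_queue.put(None)          # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

For example, `process_tasks([splice({"id": 1, "amount": 120}, [lambda e: e["amount"] >= 100])])` yields a result of 1 for event 1. The shared queue is what makes every task visible to every worker; which worker ends up processing a given task depends only on which one is free.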
S140, determining the standard-reaching information of the target service based on the obtained processing result.
The processing result may be the result of the target parallel thread processing the to-be-processed task. Optionally, the processing result may indicate whether the to-be-processed streaming data in the task satisfies the business rule of the rule flow. Illustratively, the processing result may be the value 1, indicating that the to-be-processed streaming data satisfies the business rule of the rule flow, or the value 0, indicating that it does not.
The standard-reaching information may be understood as information indicating whether the target service reaches the standard. Optionally, the standard-reaching information may be that the target service reaches the standard or that the target service does not reach the standard.
Specifically, when the processing result is 1, the standard-reaching information of the target service is determined to be "standard reached"; when the processing result is 0, it is determined to be "standard not reached".
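The mapping from processing result to standard-reaching information is small enough to state directly. In the sketch below, the function name and the returned strings are illustrative assumptions; only the 1/0 convention comes from the text.

```python
def standard_reaching_info(processing_result):
    """Translate a processing result (1 or 0) into standard-reaching information."""
    if processing_result == 1:
        return "standard reached"
    if processing_result == 0:
        return "standard not reached"
    # The text defines only 1 and 0, so anything else is rejected here.
    raise ValueError(f"unexpected processing result: {processing_result!r}")
```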
The data processing method, apparatus, device and storage medium based on stream processing provided by this embodiment obtain to-be-processed streaming data generated when a target user completes the process of a target service; invoke at least one pre-configured rule flow corresponding to the target service; splice the to-be-processed streaming data with the at least one rule flow to obtain a to-be-processed task, and broadcast the to-be-processed task to all parallel threads so that the task is processed by a target parallel thread, the target parallel thread being one of the parallel threads; and determine standard-reaching information of the target service based on the obtained processing result. According to the technical solution of the embodiment of the invention, the to-be-processed task corresponding to the streaming data is determined, broadcast to all parallel threads, and processed by the determined target parallel thread, which improves the efficiency of processing the large volume of to-be-processed streaming data generated when target users complete the process of the target service.
On the basis of the above embodiment, original streaming data corresponding to at least one service to be processed can be obtained from the data lake based on at least one pre-subscribed topic, the target service being one of the services to be processed; when the original streaming data is detected to meet a preset condition, it is taken as the to-be-processed streaming data. This realizes classified acquisition and normalization of the streaming data. Fig. 3 is a flowchart of a data processing method based on stream processing according to an embodiment of the present invention. As shown in fig. 3, the data processing method based on stream processing includes the following steps:
S210, acquiring original streaming data corresponding to at least one service to be processed from a data lake based on at least one topic subscribed in advance; wherein the target service is a service in the service to be processed.
The data lake can be used to store all data generated by users' processing operations on services to be processed on the target platform. In the embodiment of the present invention, the target platform may be set according to the scene requirement, which is not specifically limited herein. A target platform may include a plurality of services to be processed, and each service to be processed may produce one class of streaming data.
The service to be processed may be a service completing a specific event. The specific event may include an event that submits a specific object (e.g., completes a specific payment) or an event that triggers a page (e.g., clicks on a participation interface of a marketing campaign), etc.
Wherein the raw streaming data may be understood as event data generated based on the operational behaviour of the service to be processed.
In the embodiment of the invention, the one class of streaming data generated by a service to be processed corresponds to one topic.
For example, the services to be processed include service A, service B, service C, etc. The streaming data generated by service A may be class A streaming data, that generated by service B may be class B streaming data, and that generated by service C may be class C streaming data. The topics may include topic A, topic B, topic C, etc., where class A streaming data corresponds to topic A, class B streaming data corresponds to topic B, and class C streaming data corresponds to topic C.
In this scenario, when the original streaming data of service A (i.e., class A streaming data) needs to be acquired, topic A must be subscribed to in advance.
Further, the original streaming data of the pre-subscribed topic can be acquired through Kafka so that the service to be processed can be handled.
As noted above, the original streaming data is event data generated by the operation behaviour of the service to be processed, obtained from the data lake.
Based on the technical solution of this embodiment, streaming data of a single class can be acquired in a targeted manner for each service to be processed through the pre-subscribed topic, so that the streaming data of each service is processed by class and data processing efficiency is effectively improved.
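The one-topic-per-service arrangement can be sketched with a tiny in-memory publish/subscribe bus. This is not the Kafka client API; the class `TopicBus` and its methods are hypothetical stand-ins used only to show that a subscriber to topic A receives class A streaming data and nothing else.

```python
class TopicBus:
    """In-memory stand-in for topic-based delivery (not the Kafka API)."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, topic, handler):
        # Register a handler that will receive every record published to `topic`.
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic, record):
        # Deliver the record only to handlers subscribed to this topic.
        for handler in self._subscribers.get(topic, []):
            handler(record)


bus = TopicBus()
received_a = []
bus.subscribe("topic A", received_a.append)          # pre-subscribe to topic A

bus.publish("topic A", {"service": "A", "event": "payment"})
bus.publish("topic B", {"service": "B", "event": "page click"})
```

After these calls, `received_a` holds only the record published to topic A; the topic B record is not delivered because nothing subscribed to it.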
S220, when the original streaming data is detected to meet the preset condition, taking the original streaming data as the to-be-processed streaming data.
In the embodiment of the invention, the streaming data is required to be converted into the target format which can be identified, and then the data processing is performed. The original format corresponding to the obtained original streaming data may be the same as or different from the target format. Therefore, the acquired original streaming data needs to be detected, and when the original streaming data meets a preset condition, the original streaming data is used as streaming data to be processed so as to further process the data; and when the original streaming data does not meet the preset condition, format conversion can be performed on the original streaming data.
The preset condition may include a condition for determining whether the original streaming data is in the target format.
The original format of the original streaming data may be JSON or another format. The target format may be a JavaBean entity.
Based on the technical scheme of the embodiment, the original streaming data in various unrecognizable data formats is converted into the streaming data to be processed in the identifiable unified data format.
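The format check and conversion can be sketched as follows. The patent names a JavaBean entity as the target format; this Python sketch uses a dataclass as an analogous typed entity, and the class `ServiceEvent`, its fields and the function `to_pending` are assumptions made for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class ServiceEvent:
    """Hypothetical unified target format for to-be-processed streaming data."""
    user_id: str
    service_id: str
    amount: float


def to_pending(raw):
    """Return `raw` unchanged if it is already in the target format, else convert it."""
    if isinstance(raw, ServiceEvent):      # preset condition met: already the target format
        return raw
    if isinstance(raw, (str, bytes)):      # JSON text: parse it first
        raw = json.loads(raw)
    return ServiceEvent(
        user_id=str(raw["user_id"]),
        service_id=str(raw["service_id"]),
        amount=float(raw["amount"]),
    )


event = to_pending('{"user_id": "u1", "service_id": "service 1", "amount": 120}')
```

Records that already satisfy the preset condition pass through untouched, while JSON strings and plain dictionaries are converted into the same entity type, so downstream rule evaluation sees one format.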
S230, obtaining stream data to be processed, which is generated when the target user completes the process of the target service.
S240, at least one rule flow which is pre-configured and corresponds to the target service is invoked.
S250, splicing the to-be-processed streaming data with the at least one rule flow to obtain a to-be-processed task, and broadcasting the to-be-processed task to all parallel threads so that the task is processed by a target parallel thread, wherein the target parallel thread is one of the parallel threads.
The invention also discloses a method for determining the service rule to apply when an operation that adds a new service and/or changes the service update rule of a service to be processed is detected. Optionally, on the basis of the foregoing embodiment, the data processing method further includes:
When an operation of adding a new service and/or changing the service update rule corresponding to a service to be processed is detected, determining an existing service, and its corresponding service rule, that corresponds to the new service and/or the service update rule, so as to determine the service rule to be used;
Determining, based on the service rule to be used, the service rule corresponding to the new service and/or the service update rule.
The operation of adding a service may be understood as an operation of adding one or more services to be processed.
The operation of changing the service update rule may be understood as an operation of changing a service rule corresponding to the service to be processed, where the service rule corresponds to a rule flow.
The business rule to be used is understood as a pre-packaged business rule. In the embodiment of the invention, the service rule to be used can be directly obtained and used based on the operation of adding the service and/or changing the service updating rule corresponding to the service to be processed.
In the embodiment of the invention, the determined service rule to be used can be a service rule whose service processing logic is similar to that required by the operation, so that a new service to be processed can be added, or a service rule updated, simply by adjusting the relevant configuration file on the basis of the acquired service rule to be used.
In addition, the invention exposes a custom function interface, Function<OUT>, as a convenient extension point. Implementations of the interface are loaded automatically through Java SPI, and a developer can implement the interface to provide a custom function, which makes service development more convenient.
Based on the technical scheme of the embodiment, the new addition of various services to be processed and the update of the service rules can be realized, and the configuration file is simply changed without modifying any judging engine code, so that the service development efficiency is effectively improved.
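How such an extension point could be loaded through Java SPI can be sketched as follows. The source only names the interface Function<OUT>; the interface shape (`name`, `apply`), the name `CustomFunction` and the class `FunctionRegistry` are assumptions made for illustration. `java.util.ServiceLoader` is the standard JDK mechanism behind Java SPI.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.ServiceLoader;

// Hypothetical shape of the custom function interface.
interface CustomFunction<OUT> {
    String name();

    OUT apply(Map<String, Object> event);
}

final class FunctionRegistry {
    private final Map<String, CustomFunction<?>> functions = new LinkedHashMap<>();

    // Discovers implementations listed in META-INF/services/CustomFunction on the classpath.
    // With no provider file present, the loader simply yields nothing.
    FunctionRegistry() {
        for (CustomFunction<?> f : ServiceLoader.load(CustomFunction.class)) {
            register(f);
        }
    }

    // Manual registration, useful for tests or built-in functions.
    void register(CustomFunction<?> f) {
        functions.put(f.name(), f);
    }

    Object call(String name, Map<String, Object> event) {
        CustomFunction<?> f = functions.get(name);
        if (f == null) {
            throw new IllegalArgumentException("unknown function: " + name);
        }
        return f.apply(event);
    }
}
```

A developer would ship a class implementing `CustomFunction` together with a provider file naming that class; the engine code itself does not change.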
S260, determining the standard reaching information of the target service based on the obtained processing result.
The data processing method, the device, the equipment and the storage medium based on the stream processing provided by the embodiment acquire original stream data corresponding to at least one service to be processed from a data lake based on at least one topic subscribed in advance; wherein the target service is a service in the service to be processed; when the condition that the original streaming data meets the preset condition is detected, the original streaming data is obtained as the streaming data to be processed, the problems that the unified processing efficiency of the multi-class streaming data is low, the data format of the obtained original streaming data is not unified and can not be identified are solved, the classification processing of the streaming data of the service is achieved, the data processing efficiency is effectively improved, the data format of the streaming data to be processed is effectively unified, and the technical effect of the identifiability of the streaming data to be processed is ensured.
Based on the above embodiment, before data processing, the streaming data to be processed may be deserialized to obtain a first entity object, and according to a data type corresponding to at least one rule stream, an object to be spliced corresponding to at least one rule stream is determined, and based on the first entity object and the object to be spliced, a task to be processed is determined, so as to implement the processing of the task to be processed, and accordingly, the present invention proposes the following embodiments: fig. 4 is a flowchart of a data processing method based on stream processing according to an embodiment of the present invention. As shown in fig. 4, the data processing method based on stream processing includes the following steps:
S310, obtaining stream data to be processed, which is generated when a target user completes the process of the target service.
S320, at least one rule flow which is pre-configured and corresponds to the target service is invoked.
Typically, both a deployed service and the service rules configured for it are time-limited, i.e., the service has an end time. Thus, when a service reaches its end time, the rule flow configured for it is regarded as an expired rule flow. The invention also discloses a method for cleaning up expired rule flows. Optionally, on the basis of the foregoing embodiment, the data processing method further includes:
Cleaning a preset expired rule stream corresponding to a service to be selected based on a deployed objective function, and storing the cleaned expired rule stream into a permanent cache database;
Forming a data snapshot after the expired rule stream is cleaned, and storing the data snapshot to a distributed file system.
The data snapshot is the content that remains after the expired rule stream has been cleaned. Optionally, the data snapshot is saved in a distributed file system (e.g., hdfs), which can be accessed much like the file system of a single host; the stored data snapshot can be read to restore the rule operation when the job crashes and restarts.
In an embodiment of the present invention, all business rules (including expired rule streams that have been cleaned) are maintained in a separate persistent cache database (e.g., hbase) for traceability.
The objective function may be understood as a function having a function of cleaning the expired rule stream at regular time. In the embodiment of the present invention, the objective function may be preset according to the scene requirement, which is not specifically limited herein. Specifically, flush messages are triggered periodically based on the deployed objective function to clean up the expired rule stream periodically.
The service to be selected can be understood as a service that can be selected and processed on the target platform. In the embodiment of the invention, the service to be selected can comprise a plurality of services. Optionally, the services to be selected may be a plurality of marketing campaign compliance services. By way of example, the services to be selected may include a submitting service and/or a triggering service, etc. The submitting service may be a service in which a specific object is submitted to the target platform; illustratively, the specific object may be a financial product, and the submitting service may be the submission of that financial product to the target platform. The triggering service may be a service in which a page of the target platform is triggered; illustratively, the triggering service may be clicking into the participation interface of a marketing campaign.
An expired rule flow may be understood as a rule flow whose processing time limit has passed. In the embodiment of the present invention, each rule flow may include a processing time limit, which may be understood as the time limit for processing the corresponding service to be processed based on that rule flow. When the processing time limit of the rule flow expires, the rule flow is regarded as an expired rule flow corresponding to the service to be processed.
Based on the technical scheme of the embodiment, automatic timed cleaning of the expired rule flow can be realized, the cleaned expired rule flow is stored in the cache database, the user or manager can trace the expired rule flow, the data snapshot formed after the expired rule flow is cleaned is cached in the distributed file system, and the rule operation can be read and restored when the operation crashes and restarts.
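The cleanup described above can be sketched as follows. This is a simplified, in-memory illustration: `RuleFlow`, `RuleStore` and `cleanExpired` are hypothetical names, the `archive` list stands in for the persistent cache database, and the returned map stands in for the data snapshot that a real job would write to the distributed file system. In a deployment, a timer would call `cleanExpired` periodically.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

final class RuleFlow {
    final String id;
    final long expiresAtMillis; // end of the processing time limit

    RuleFlow(String id, long expiresAtMillis) {
        this.id = id;
        this.expiresAtMillis = expiresAtMillis;
    }
}

final class RuleStore {
    private final Map<String, RuleFlow> active = new HashMap<>();
    // Stand-in for the persistent cache database that keeps expired rules traceable.
    final List<RuleFlow> archive = new ArrayList<>();

    void put(RuleFlow rule) {
        active.put(rule.id, rule);
    }

    // Moves expired rule flows to the archive and returns a copy of what remains,
    // which plays the role of the data snapshot.
    Map<String, RuleFlow> cleanExpired(long nowMillis) {
        Iterator<RuleFlow> it = active.values().iterator();
        while (it.hasNext()) {
            RuleFlow rule = it.next();
            if (rule.expiresAtMillis <= nowMillis) {
                archive.add(rule);
                it.remove();
            }
        }
        return new HashMap<>(active);
    }
}
```

Passing the current time in as a parameter keeps the sketch deterministic; the snapshot is a copy, so later changes to the active rules do not alter a snapshot that has already been taken.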
S330, performing deserialization processing on the streaming data to be processed to obtain a first entity object, and determining an object to be spliced corresponding to the at least one rule stream according to the data type corresponding to the at least one rule stream.
The deserialization processing may be understood as processing that converts the streaming data to be processed into streaming data in a format that can be recognized during data processing.
The first entity object may be understood as the entity object obtained by deserializing the streaming data to be processed. In the embodiment of the present invention, the first entity object may be a java bean entity.
The data type may be understood as the data type of the rule stream as initially acquired. Optionally, the data type may be a domain-specific language (DSL) description type and/or a non-DSL description type.
The object to be spliced can be understood as an object to be spliced corresponding to the rule flow.
The step of determining the object to be spliced corresponding to the at least one rule flow according to the data type corresponding to the at least one rule flow is further refined. On the basis of the foregoing embodiment, optionally, the determining, according to the data type corresponding to the at least one rule flow, the object to be spliced corresponding to the at least one rule flow includes:
Converting at least one rule stream in a first preset format into a second entity object;
When the at least one rule flow has judgment logic, generating a judgment rule logic tree based on the second entity object and taking the judgment rule logic tree as the object to be spliced;
Wherein the decision rule logic tree includes at least one root node and at least one leaf node associated with the respective root node, the root node corresponding to a relational operator, the leaf node corresponding to an operation condition or operand.
The first preset format may be json format.
The second entity object may be understood as the entity object obtained by deserializing the rule flow.
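A decision rule logic tree of the kind described above can be sketched as follows. The names `RuleNode`, `AtLeast` and `Combine` are hypothetical, and two simplifying assumptions are made: inner nodes combine their children with AND or OR, and the only leaf condition shown is a numeric "at least" comparison. The source does not specify the operator set.

```java
import java.util.Map;

// A node of the decision rule logic tree.
interface RuleNode {
    boolean evaluate(Map<String, Object> event);
}

// Leaf: one operation condition, here "numeric field >= threshold".
final class AtLeast implements RuleNode {
    private final String field;
    private final double threshold;

    AtLeast(String field, double threshold) {
        this.field = field;
        this.threshold = threshold;
    }

    public boolean evaluate(Map<String, Object> event) {
        Object value = event.get(field);
        return value instanceof Number && ((Number) value).doubleValue() >= threshold;
    }
}

// Inner node: an operator that combines exactly two children.
final class Combine implements RuleNode {
    enum Op { AND, OR }

    private final Op op;
    private final RuleNode left;
    private final RuleNode right;

    Combine(Op op, RuleNode left, RuleNode right) {
        this.op = op;
        this.left = left;
        this.right = right;
    }

    // Evaluating the root traverses the tree recursively.
    public boolean evaluate(Map<String, Object> event) {
        return op == Op.AND
                ? left.evaluate(event) && right.evaluate(event)
                : left.evaluate(event) || right.evaluate(event);
    }
}
```

For instance, `new Combine(Combine.Op.AND, new AtLeast("amount", 100), new AtLeast("logins", 3))` reaches the standard only for events whose `amount` is at least 100 and whose `logins` is at least 3.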
S340, determining the task to be processed based on the first entity object and the object to be spliced.
Specifically, the first entity object corresponding to the streaming data to be processed and the object to be spliced corresponding to the rule stream are spliced to obtain the task to be processed.
S350, broadcasting the task to be processed to all parallel threads so as to process the task to be processed based on the target parallel threads.
S360, determining standard reaching information of the target service based on the obtained processing result.
According to the data processing method, device, equipment and storage medium based on stream processing provided by this embodiment, the streaming data to be processed is deserialized to obtain the first entity object, the object to be spliced corresponding to the at least one rule flow is determined according to the data type corresponding to the at least one rule flow, and the task to be processed is determined based on the first entity object and the object to be spliced. The technical solution of this embodiment avoids the situation in which data processing is interrupted because an object required for task processing is missing from the task to be processed: the entity object is determined first, the decision rule logic tree is then determined as the object to be spliced, and the two are spliced into the task to be processed, so that the resulting task can be processed.
On the basis of the foregoing embodiment, when the processing result is a marketing campaign standard-reaching result, the data to be processed corresponding to the marketing campaign standard-reaching result can be stored in a first database, and the marketing campaign standard-reaching result can be stored in a second database and a fourth database; when the data to be processed is received again, the corresponding standard-reaching result is searched for in the fourth database according to the data identifier of the data to be processed, and if it exists there, it is returned directly from the fourth database; if it does not exist in the fourth database, it is searched for in the second database, and if it exists there, it is synchronized to the fourth database and returned from the fourth database; if it does not exist in the second database either, the data to be processed is determined to obtain its marketing campaign standard-reaching result, which is stored in the second database and the fourth database. In this way, when the data to be processed is received again and the second or fourth database holds the standard-reaching result, the result can be returned directly without a new determination. Accordingly, the invention provides the following embodiment: Fig. 5 is a flowchart of a data processing method based on stream processing according to an embodiment of the present invention. As shown in fig. 5, the data processing method based on stream processing includes the following steps:
S410, obtaining stream data to be processed, which is generated when a target user completes the process of the target service.
S420, at least one rule flow which is pre-configured and corresponds to the target service is invoked.
S430, after splicing the streaming data to be processed and the at least one rule stream, obtaining a task to be processed, and broadcasting the task to be processed to all parallel threads so as to process the task to be processed based on a target parallel thread; wherein the target parallel thread is a thread of all parallel threads.
On the basis of the foregoing embodiment, the processing of the task to be processed based on the target parallel thread is further refined.
Optionally, in a scenario where the target service is a marketing campaign compliance service, the processing of the task to be processed based on the target parallel thread includes:
When a historical standard-reaching result associated with the task to be processed is detected and the historical standard-reaching result is a preset result, traversing, based on the target parallel thread, the streaming data to be processed in the task to be processed through the decision rule logic tree corresponding to the at least one rule flow, so as to obtain the marketing campaign standard-reaching result in the processing result.
The historical standard reaching result can be understood as the stored historical standard reaching result of the current task to be processed. Alternatively, the preset results may include up to standard and/or not up to standard.
A decision rule logic tree may be understood as a logic tree comprising the decision rules corresponding to a rule flow. In the embodiment of the invention, a historical standard-reaching result of "not reaching the standard" associated with the task to be processed can be understood to mean that the target user has processed the service before and did not reach the standard at that time. In that case, i.e., when the preset result is "not reaching the standard", the decision rule logic tree corresponding to the rule flow needs to be traversed again in full to determine the final marketing campaign standard-reaching result. This avoids the situation in which, when a task that previously did not reach the standard is processed a second time, some of the decision rules already traversed in the history are skipped. For example, suppose the task to be processed must reach the standard a preset number of times in succession (for example, 6 times) before the marketing campaign standard-reaching result of the target service is confirmed as reaching the standard. If the target user completes and reaches the standard on the current task but one of the previous 5 tasks did not reach the standard, the system traverses the previous five tasks again, and because one of them did not reach the standard, the final marketing campaign standard-reaching result is not confirmed as reaching the standard.
According to the technical scheme of the embodiment of the invention, the situation that the determined marketing campaign standard reaching result has errors under the condition that the target user does not process the target service based on the standard flow is avoided, and the accuracy of the determined marketing campaign standard reaching result is ensured.
S440, determining standard reaching information of the target service based on the obtained processing result.
S450, when the processing result is the marketing campaign standard reaching result, storing the data to be processed corresponding to the marketing campaign standard reaching result into a first database, and storing the marketing campaign standard reaching result into a second database and a fourth database.
The marketing campaign reaching the standard results can be understood as the processing results corresponding to the marketing campaign reaching the standard tasks. Alternatively, the marketing campaign compliance results may include compliance and/or non-compliance.
The data to be processed may be understood as related data corresponding to the task to be processed. Alternatively, the data to be processed may include an identification of the associated task to be processed, an identification of the target user, an identification of the marketing campaign compliance service, a marketing campaign compliance result, and the like.
The first database may be understood as a database storing the details of the data to be processed. In an embodiment of the present invention, the first database may be an hbase table.
S460, when the data to be processed is received again, searching a corresponding standard reaching result from the fourth database according to the data identification of the data to be processed, and directly returning the corresponding standard reaching result of the data to be processed based on the fourth database under the condition that the corresponding standard reaching result exists in the fourth database.
The fourth database may be understood as a local database. Optionally, the local database may be guava. In the embodiment of the invention, if the rule flow has the caching function enabled, a guava local in-memory database is used to cache the marketing campaign standard-reaching result. Specifically, once the first determination for the target user has yielded a result of reaching the standard, a second determination of the same marketing campaign standard-reaching task queries the guava cache directly; if the cached result reaches the standard, no further determination is made and the marketing campaign standard-reaching result is directly determined as reaching the standard.
In the embodiment of the invention, the corresponding standard reaching result can be understood as a standard reaching result.
S470, searching for the corresponding standard reaching result in the second database when the fourth database does not hold it; when the second database holds the corresponding standard reaching result, synchronizing the result to the fourth database and returning the corresponding standard reaching result of the data to be processed based on the fourth database.
In an embodiment of the present invention, the second database may be hbase. Based on the technical scheme of the embodiment, the corresponding standard reaching results stored in the second database and the corresponding standard reaching results stored in the fourth database can be synchronized.
S480, when the corresponding standard reaching result does not exist in the second database, determining the data to be processed to obtain the marketing campaign standard reaching result of the data to be processed, and storing the marketing campaign standard reaching result into the second database and the fourth database.
Based on the technical scheme of the embodiment of the invention, the corresponding standard reaching results of the first judged data to be processed are stored in the second database and the fourth database, and when the data to be processed is judged again, the data to be processed can be directly inquired and returned in the second database and the fourth database, so that the data processing efficiency can be obviously improved under the condition of simultaneously processing a large number of marketing activities to reach the standard.
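The lookup order of steps S460 to S480 can be sketched as follows. This is an in-memory illustration only: `ResultLookup`, `resolve` and the two maps are hypothetical names, `localCache` stands in for the fourth (local) database, `store` stands in for the second database, and the `judge` function stands in for the full rule determination.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class ResultLookup {
    // Stand-in for the fourth (local) database.
    final Map<String, Boolean> localCache = new HashMap<>();
    // Stand-in for the second database.
    final Map<String, Boolean> store = new HashMap<>();
    private final Function<String, Boolean> judge;

    ResultLookup(Function<String, Boolean> judge) {
        this.judge = judge;
    }

    boolean resolve(String dataId) {
        // S460: return directly from the local database when the result is there.
        Boolean cached = localCache.get(dataId);
        if (cached != null) {
            return cached;
        }
        // S470: fall back to the second database and synchronize to the local one.
        Boolean stored = store.get(dataId);
        if (stored != null) {
            localCache.put(dataId, stored);
            return stored;
        }
        // S480: no stored result, so run the determination and store it in both places.
        boolean result = judge.apply(dataId);
        store.put(dataId, result);
        localCache.put(dataId, result);
        return result;
    }
}
```

With this ordering the comparatively expensive determination runs at most once per data identifier; every later request is answered from one of the two stores.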
Optionally, the data processing method based on stream processing further includes:
storing the marketing campaign achievement result in a third database, and sending a notification of data update of the third database to the downstream component;
so that the downstream component receives the notification, updates and displays the up-to-standard state of the data to be processed based on the third database.
In an embodiment of the present invention, the third database may be kafka. Specifically, when the marketing campaign standard reaching result is stored in the third database, a notification representing the data update of the third database is generated, the notification is sent to the downstream component, and under the condition that the downstream component receives the notification, the standard reaching state of the data to be processed is updated in real time based on the updated data in the third database and is displayed to the user.
Based on the technical scheme of the embodiment, the marketing campaign standard reaching result is stored in the third database, and the notification of the data update of the third database is sent to the downstream component, so that the standard reaching state of the data to be processed can be updated in real time and displayed to the user in real time.
Alternatively, the target compliance information may include compliance and/or non-compliance.
On the basis of the above embodiment, the method further includes: and determining standard reaching judgment information based on the target standard reaching information and storing the standard reaching judgment information.
The standard-reaching judgment information can be standard-reaching information cached in hbase and kafka. Optionally, the judgment information corresponding to each marketing campaign compliance service may include a plurality of pieces. Each piece of judgment information may include an identification of the target user, an identification of the marketing campaign compliance service, a marketing campaign compliance result, and the like. Each piece characterizes whether the marketing campaign compliance service completed by the target user reaches the standard.
Based on the technical scheme of the embodiment, the query of the user and the manager on the required standard reaching information can be supported based on the stored standard reaching judging information.
In the embodiment of the invention, the standard-reaching judgment information can be stored to hbase and kafka in succession: it is first cached to hbase, and only after it has been cached to hbase is it written to kafka.
In the embodiment of the invention, the information in hbase supports user queries, and the data in kafka supports data calls during data processing. Based on the technical solution of this embodiment, the situation in which data has been processed but cannot yet be queried by the user is avoided.
In order to better return the processing progress of the marketing campaign to the user, the invention discloses a progress display method based on the embodiment. Optionally, the data processing method further includes:
And generating a marketing campaign progress bar for display based on the at least one marketing campaign compliance result when the target compliance information comprises the at least one marketing campaign compliance result.
The embodiment of the invention realizes the staged progress display of the standard reaching result of the marketing campaign under the condition that the target standard reaching information comprises at least one standard reaching result of the marketing campaign.
According to the data processing method, the device, the equipment and the storage medium based on the stream processing, when the processing result is the marketing campaign standard reaching result, the marketing campaign standard reaching result and the data to be processed are correspondingly stored in the first database, so that when the data to be processed is received again, the corresponding marketing campaign standard reaching result is called from the first database according to the data identification of the data to be processed; and determining target standard information corresponding to the standard-reaching business of the marketing campaign according to the data to be processed and the corresponding standard-reaching result of the marketing campaign. The invention aims to solve the technical problem that the efficiency is lower when a large amount of data of the marketing campaign standard service are processed based on the streaming processing, and the invention achieves the effects of improving the efficiency of processing the data of the marketing campaign standard service based on the streaming processing and improving the user experience.
Fig. 6 is an overall flowchart of a data processing method based on stream processing according to an embodiment of the present invention.
As shown in fig. 6, the overall flow of the data processing method based on the stream processing is as follows:
A kafka data stream (i.e., streaming data to be processed) is acquired. The kafka data stream may include event data generated by a business system, and may be in json format.
A kafka rule stream (i.e., rule stream) is obtained. The kafka rule flow may include a marketing campaign decision rule corresponding to a marketing campaign decision service, and the kafka rule flow may be a json decision rule described in DSL, where the logic may be constructed by a service person through a service management platform, or may be constructed by a professional developer writing a script.
Deserialization processing is performed. The kafka data stream in json format is constructed as a java bean entity. If the decision rule is described in DSL, a decision rule logic tree is constructed; the tree comprises a plurality of atomic logics and the relations among them, and each atomic logic comprises a plurality of conditions. The decision rule logic tree is a binary tree formed after SQL parsing, in which the root node is a relational operator and the child nodes are operation conditions or operands. When the standard is determined, the final marketing campaign standard-reaching result can be determined by traversing the whole decision rule logic tree. The atomic logic of rule determination may include the following cases:
Single-record determination: determining whether the current piece of streaming data to be processed reaches the specified standard.
Real-time status summary mode: accumulating the number of records in real time, for real-time statistics.
Fixed cycle mode: determining whether the standard is reached when the service occurrence time must fall within a specific time range.
Activity period mode: determining whether the standard is reached when the service occurrence time falls within a time range designated by the service.
The data stream and the rule stream are connected to obtain the task to be processed. In the embodiment of the present invention, flink includes a plurality of parallel threads that can process the task to be processed in parallel; specifically, the task to be processed obtained by splicing the data stream and the rule stream can be broadcast to each parallel thread.
AlertProcessFunction (data processing). In the embodiment of the present invention, this custom process function in flink is the core logic encapsulation point of the real-time stream processing engine. Specifically, the rule flow is first received in the processBroadcastElement function of the process function and stored in BroadcastState; at the same time, expired rule flows are cleaned periodically and the state content is periodically snapshotted to hdfs, so that the state content can be recovered when the job fails or restarts. Then, the service data stream is received and processed in the processElement function: all rule flows stored in BroadcastState are obtained through the context and iterated over for determination. If the service date of the data is not within the active period of a rule flow, the determination for that rule flow is skipped. If the rule flow has the caching function enabled, a guava local in-memory database is used to cache the standard-reaching result; after the first determination for a user, a second determination looks up the cache directly, and if the cached result reaches the standard, no further determination is made and the standard is directly determined as reached. This improves data processing efficiency, and the effect is especially pronounced in very large data processing tasks.
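The broadcast pattern described above can be illustrated with a plain-Java simulation that has no Flink dependency. All names here (`ActiveRule`, `Worker`, `BroadcastDemo`, `onRule`, `onEvent`, `minAmount`) are hypothetical: `onRule` plays the role that processBroadcastElement plays in the text, `onEvent` plays the role of processElement, and the per-worker map plays the role of BroadcastState. The amount threshold is only a placeholder for the real decision rule.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ActiveRule {
    final String id;
    final int startDay;  // first day of the active period (inclusive)
    final int endDay;    // last day of the active period (inclusive)
    final int minAmount; // placeholder standard: amount must be at least this value

    ActiveRule(String id, int startDay, int endDay, int minAmount) {
        this.id = id;
        this.startDay = startDay;
        this.endDay = endDay;
        this.minAmount = minAmount;
    }
}

final class Worker {
    // Per-worker copy of the broadcast rules.
    private final Map<String, ActiveRule> rules = new HashMap<>();

    // Receives a rule from the rule stream and stores it.
    void onRule(ActiveRule rule) {
        rules.put(rule.id, rule);
    }

    // Checks one event against every stored rule and returns the ids of the rules it satisfies.
    List<String> onEvent(int serviceDay, int amount) {
        List<String> reached = new ArrayList<>();
        for (ActiveRule rule : rules.values()) {
            if (serviceDay < rule.startDay || serviceDay > rule.endDay) {
                continue; // outside the active period: skip the determination
            }
            if (amount >= rule.minAmount) {
                reached.add(rule.id);
            }
        }
        return reached;
    }
}

final class BroadcastDemo {
    static List<Worker> workers(int parallelism) {
        List<Worker> result = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            result.add(new Worker());
        }
        return result;
    }

    // Broadcasting means every parallel worker receives the same rule.
    static void broadcast(List<Worker> workers, ActiveRule rule) {
        for (Worker worker : workers) {
            worker.onRule(rule);
        }
    }
}
```

Because every worker holds the full rule set, an event can be routed to any worker and still be checked against all rules.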
The marketing campaign standard-reaching result is written to hbase first and then to kafka. The purpose is to ensure that, by the time the real-time stream processing engine processes a piece of data from kafka, that piece of data can already be queried in hbase. For example, when a piece of streaming data arrives from kafka, all historical streaming data of the user in hbase that is associated with it and meets the conditions within the current month is queried at the same time, and a count operation is then performed on the result set in node memory.
In the embodiment of the invention, the data processing supports the following aggregations over data streams: counting (COUNT), summation (SUM), AVG (average), MAX (maximum), MIN (minimum), continue_count (the number of days on which the user logged in to the target platform consecutively), no_continue_count (the total number of days of non-consecutive login to the target platform), and date_min (the earliest login time). The condition used to query hbase is an hbase filter generated by combining the conditions of the atomic logics when the marketing campaign DSL rule is turned into a decision rule logic tree.
In the embodiment of the invention, a custom function interface, Function<OUT>, is exposed as a convenient extension point; implementations of the interface are loaded automatically through Java SPI, and a developer can implement the interface to provide a custom function.
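Login-day aggregations of this kind can be sketched as follows. The exact semantics of continue_count, no_continue_count and date_min are not defined in the source, so the interpretations below are assumptions: `longestConsecutiveDays` returns the longest run of consecutive calendar days, `distinctDays` returns the number of distinct login days regardless of gaps, and `earliest` returns the earliest login date. The class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.util.Collection;
import java.util.TreeSet;

final class LoginStats {
    // Longest run of consecutive calendar days present in the input (assumed meaning of continue_count).
    static int longestConsecutiveDays(Collection<LocalDate> loginDays) {
        int best = 0;
        int run = 0;
        LocalDate previous = null;
        // TreeSet sorts the days and removes duplicates.
        for (LocalDate day : new TreeSet<>(loginDays)) {
            run = (previous != null && previous.plusDays(1).equals(day)) ? run + 1 : 1;
            best = Math.max(best, run);
            previous = day;
        }
        return best;
    }

    // Number of distinct login days, consecutive or not.
    static int distinctDays(Collection<LocalDate> loginDays) {
        return new TreeSet<>(loginDays).size();
    }

    // Earliest login day, or null when there are no logins (assumed meaning of date_min).
    static LocalDate earliest(Collection<LocalDate> loginDays) {
        TreeSet<LocalDate> sorted = new TreeSet<>(loginDays);
        return sorted.isEmpty() ? null : sorted.first();
    }
}
```

In the described system the input would be the result set returned by the hbase query; here it is simply a collection held in memory.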
In the embodiment of the invention, progress display for tasks to be processed is supported. For example, if a target user must reach the standard 6 times to complete a task, (1/6) is displayed after the first time, (2/6) after the second, and so on until all 6 are met. This is implemented as follows: each time streaming data is judged, the number of times the standard has been reached between the start date of the marketing campaign judgment service and the current time is also produced, and the result is output together with the flag indicating whether the standard has been reached.
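A minimal sketch of the progress output, assuming the up-to-standard events are available as a list of dates; the function name progress_output and the output keys are hypothetical.

```python
from datetime import date
from typing import Dict, List, Union


def progress_output(met_dates: List[date], start: date, now: date,
                    required: int) -> Dict[str, Union[str, bool]]:
    """Count up-to-standard events from the campaign start to now and
    return them together with the overall up-to-standard flag."""
    count = sum(1 for d in met_dates if start <= d <= now)
    shown = min(count, required)  # never display more than required
    return {"progress": f"{shown}/{required}", "met": count >= required}


met = [date(2024, 3, 30), date(2024, 4, 2), date(2024, 4, 5)]
status = progress_output(met, start=date(2024, 4, 1),
                         now=date(2024, 4, 11), required=6)
```

Here the event dated before the campaign start is ignored, so the sketch reports 2 of 6 and a flag of False.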
In the embodiment of the invention, for a business comprising multiple processing links, a rule stream can be established for each link. Specifically, the task of the first link is first judged for whether it is up to standard and an up-to-standard flag is returned; the historical data of the first link is obtained, and the task state is modified based on the historical data and the flag, changing from not completed to in progress. If the second link is up to standard, the receiver is notified to change the state from in progress to completed; if the target user performs a cancel operation, the receiver is notified to change the state from in progress to not completed.
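The state changes described above amount to a small state machine, sketched here under the assumption of exactly two links. The event names ("first_link_met", "second_link_met", "user_cancelled") are hypothetical labels, not terms from the embodiment.

```python
from enum import Enum


class TaskState(Enum):
    NOT_COMPLETED = "not completed"
    IN_PROGRESS = "in progress"
    COMPLETED = "completed"


# (current state, event) -> next state; event names are illustrative only.
TRANSITIONS = {
    (TaskState.NOT_COMPLETED, "first_link_met"): TaskState.IN_PROGRESS,
    (TaskState.IN_PROGRESS, "second_link_met"): TaskState.COMPLETED,
    (TaskState.IN_PROGRESS, "user_cancelled"): TaskState.NOT_COMPLETED,
}


def next_state(state: TaskState, event: str) -> TaskState:
    """Return the next state, or the unchanged state if the event does
    not apply in the current state."""
    return TRANSITIONS.get((state, event), state)


state = TaskState.NOT_COMPLETED
state = next_state(state, "first_link_met")
after_cancel = next_state(state, "user_cancelled")
after_second = next_state(state, "second_link_met")
```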
In the embodiment of the invention, by default the count and sum values are calculated automatically and returned together when an aggregation is performed, and the up-to-standard results are output to multiple data warehouses for report statistics.
In the embodiment of the invention, multiple tasks can be judged simultaneously against a data stream with the same schema. The judgment flow is as follows: when a service to be processed is received, all rule logic in the decision rule logic tree is traversed to perform the up-to-standard judgment.
In the embodiment of the invention, different business rules can be supported seamlessly by modifying only the field names and types in the configuration file, without modifying any Java code; that is, only the configuration file is adjusted and a different schema is configured.
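The idea of driving parsing from configuration alone can be sketched as follows. The configuration layout (a mapping from field name to type name), the type names, and the function parse_record are assumptions made for illustration; the embodiment does not specify the configuration file format.

```python
from datetime import date
from typing import Any, Callable, Dict

# Hypothetical schema configuration: field name -> type name.
SCHEMA_CONFIG: Dict[str, str] = {
    "user_id": "string",
    "amount": "double",
    "biz_date": "date",
}

# Converters keyed by the type names used in the configuration.
CASTERS: Dict[str, Callable[[Any], Any]] = {
    "string": str,
    "int": int,
    "double": float,
    "date": date.fromisoformat,
}


def parse_record(raw: Dict[str, Any], schema: Dict[str, str]) -> Dict[str, Any]:
    """Convert a raw record field by field according to the configured schema.
    Supporting a new business only requires a different schema mapping."""
    return {name: CASTERS[type_name](raw[name])
            for name, type_name in schema.items()}


record = parse_record(
    {"user_id": 42, "amount": "12.5", "biz_date": "2024-04-11"},
    SCHEMA_CONFIG,
)
```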
Based on the technical scheme of the embodiment of the invention, the following effects can be achieved. For the marketing campaign up-to-standard service, there is no need to write a separate script program for each job as is conventional; tasks of the same class that use the same data set are integrated to share one job. Rich custom functions are integrated and an open extension interface is provided, so that developers can implement their own custom functions. Business rules can be added and modified dynamically without restarting the job. Display of in-progress business status is supported. Judgments that require multiple task-processing steps to reach the standard are supported. Using Guava reduces database input and output and improves data processing efficiency. Up-to-standard judgment for target users of a specific customer group is supported. The up-to-standard results of the marketing campaign up-to-standard service are output and stored.
Fig. 7 is a schematic structural diagram of a data processing device based on stream processing according to an embodiment of the present invention. As shown in fig. 7, the apparatus includes: a pending data acquisition module 510, a rule flow acquisition module 520, a data processing module 530, and a compliance information determination module 540.
The to-be-processed data obtaining module 510 is configured to obtain to-be-processed streaming data generated when the target user completes the process of the target service; a rule flow obtaining module 520, configured to invoke at least one rule flow configured in advance and corresponding to the target service; the data processing module 530 is configured to obtain a task to be processed after the streaming data to be processed and the at least one rule stream are spliced, and broadcast the task to be processed to all parallel threads, so that the task to be processed is processed based on a target parallel thread; the target parallel threads are threads in all parallel threads; and the standard reaching information determining module 540 is configured to determine standard reaching information of the target service based on the obtained processing result.
The stream-processing-based data processing method, device, equipment, and storage medium provided by this embodiment obtain to-be-processed streaming data generated when a target user completes the process of a target service; invoke at least one preconfigured rule stream corresponding to the target service; splice the to-be-processed streaming data with the at least one rule stream to obtain a task to be processed, and broadcast the task to all parallel threads so that it is processed by a target parallel thread, the target parallel thread being one of the parallel threads; and determine the standard reaching information of the target service based on the obtained processing result. By determining the task to be processed that corresponds to the to-be-processed streaming data, broadcasting it to all parallel threads, and processing it on the determined target parallel thread, the technical scheme of the embodiment of the invention improves the efficiency of processing the large amount of to-be-processed streaming data generated when target users complete the process of the target service.
On the basis of the device, the device can also comprise an original data acquisition module and a to-be-processed data determination module.
The original data acquisition module is used for acquiring, before the to-be-processed streaming data generated when the target user completes the process of the target service is obtained, original streaming data corresponding to at least one service to be processed from a data lake based on at least one pre-subscribed topic; wherein the target service is one of the services to be processed;
The to-be-processed data determining module is used for acquiring the original streaming data as the to-be-processed streaming data when the fact that the preset condition is met is detected.
On the basis of the device, optionally, the service to be processed is a service for completing a specific event, and the original streaming data is event data generated based on the operation behavior of the service to be processed.
Based on the above apparatus, optionally, the data processing module 530 includes: a deserialization unit and a to-be-processed task determining unit.
The deserialization unit is configured to perform deserialization processing on the to-be-processed streaming data to obtain a first entity object, and determine, according to a data type corresponding to the at least one rule stream, an object to be spliced corresponding to the at least one rule stream;
The to-be-processed task determining unit is configured to determine the task to be processed based on the first entity object and the object to be spliced.
On the basis of the device, optionally, the deserialization unit is specifically configured to:
Converting at least one rule stream in a first preset format into a second entity object;
When the at least one rule flow has judgment logic, generating a judgment rule logic tree based on the second entity object and taking the judgment rule logic tree as the object to be spliced;
Wherein the decision rule logic tree includes at least one root node and at least one leaf node associated with a respective root node, the root node corresponding to a relational operator, the leaf node corresponding to an operation condition or operand.
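The decision rule logic tree described above can be sketched as a small recursive structure. This sketch assumes that the operator nodes carry AND/OR and that each leaf is a (field, comparison, value) condition; the class names OperatorNode and Condition and the sample fields are hypothetical, and the embodiment's actual node layout may differ.

```python
import operator
from dataclasses import dataclass
from typing import Any, Dict, List, Union

COMPARATORS = {
    "==": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass
class Condition:
    """Leaf node: an operation condition with its operand."""
    field_name: str
    op: str
    value: Any

    def evaluate(self, record: Dict[str, Any]) -> bool:
        return COMPARATORS[self.op](record[self.field_name], self.value)


@dataclass
class OperatorNode:
    """Operator node: combines its children with AND or OR (assumed operators)."""
    op: str
    children: List[Union["OperatorNode", Condition]]

    def evaluate(self, record: Dict[str, Any]) -> bool:
        results = (child.evaluate(record) for child in self.children)
        return all(results) if self.op == "AND" else any(results)


# channel == "app" AND (amount >= 100 OR login_days >= 7)
tree = OperatorNode("AND", [
    Condition("channel", "==", "app"),
    OperatorNode("OR", [
        Condition("amount", ">=", 100),
        Condition("login_days", ">=", 7),
    ]),
])

met = tree.evaluate({"channel": "app", "amount": 50, "login_days": 7})
not_met = tree.evaluate({"channel": "web", "amount": 500, "login_days": 30})
```

Traversing the tree for each record is what allows several rule conditions to be judged in a single pass over the stream.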
Optionally, on the basis of the above device, the device further includes: a rule flow cleaning module and a snapshot storage module.
The rule flow cleaning module is used for cleaning a preset expired rule flow corresponding to the service to be selected based on the deployed objective function, and storing the cleaned expired rule flow into the cache database;
and the snapshot storage module is used for forming a data snapshot after the expired rule stream is cleaned, and storing the data snapshot to a distributed file system.
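Cleaning followed by snapshotting can be sketched as below. This is an illustration only: a JSON file in a temporary directory stands in for the distributed file system, and the rule layout (a dictionary with an ISO-formatted "end" date) and the function names are assumptions.

```python
import json
import tempfile
from datetime import date
from pathlib import Path
from typing import Dict


def clean_expired(rules: Dict[str, dict], today: date) -> Dict[str, dict]:
    """Keep only rules whose active period has not ended."""
    return {rid: r for rid, r in rules.items()
            if date.fromisoformat(r["end"]) >= today}


def snapshot(rules: Dict[str, dict], path: Path) -> None:
    """Persist the cleaned state; a local file stands in for the
    distributed file system."""
    path.write_text(json.dumps(rules, sort_keys=True))


def restore(path: Path) -> Dict[str, dict]:
    """Recover the state from the latest snapshot."""
    return json.loads(path.read_text())


rules = {
    "r1": {"end": "2024-03-31"},  # expired on 2024-04-11
    "r2": {"end": "2024-04-30"},
}
with tempfile.TemporaryDirectory() as tmp:
    snap_path = Path(tmp) / "rules.json"
    snapshot(clean_expired(rules, date(2024, 4, 11)), snap_path)
    restored = restore(snap_path)
```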
Optionally, on the basis of the above device, the device further includes: an operation processing module and a business rule determining module.
The operation processing module is used for determining, when an operation of adding a new service and/or changing the service update rule corresponding to a service to be processed is detected, the existing service and corresponding service rule that correspond to the new service and/or the service update rule, so as to determine the service rule to be used;
The business rule determining module is configured to determine, based on the business rule to be used, a business rule corresponding to the newly added business and/or the business update rule.
Based on the above apparatus, optionally, the data processing module 530 is specifically configured to:
And traversing the to-be-processed flow data in the to-be-processed task through a decision rule logic tree corresponding to the at least one rule flow based on the target parallel thread when the historical standard reaching result associated with the to-be-processed task is detected, and the historical standard reaching result is a preset result, so as to obtain the marketing activity standard reaching result in the processing result.
On the basis of the device, optionally, the at least one rule flow is generated based on configuration rules under at least one rule logic, and the at least one rule logic is matched with preset activity information of the marketing activity up-to-standard service.
On the basis of the device, optionally, the target service is a marketing campaign standard reaching service, and the device further comprises: a standard reaching result storage module, a fourth database searching module, a second database searching module, and a to-be-processed data judging module.
The marketing activity standard reaching result storage module is used for storing the data to be processed corresponding to the marketing activity standard reaching result into a first database and storing the marketing activity standard reaching result into a second database and a fourth database when the processing result is the marketing activity standard reaching result;
The fourth database searching module is used for searching corresponding standard reaching results from the fourth database according to the data identification of the data to be processed when the data to be processed is received again, and directly returning the corresponding standard reaching results of the data to be processed based on the fourth database under the condition that the corresponding standard reaching results exist in the fourth database;
The second database searching module is used for searching corresponding standard reaching results in the second database under the condition that the fourth database does not have the corresponding standard reaching results, synchronizing the standard reaching results to the fourth database under the condition that the second database has the corresponding standard reaching results, and returning the corresponding standard reaching results of the data to be processed based on the fourth database;
And the to-be-processed data judging module is used for judging the to-be-processed data under the condition that the corresponding standard reaching result does not exist in the second database, obtaining the marketing activity standard reaching result of the to-be-processed data, and storing the marketing activity standard reaching result into the second database and the fourth database.
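The lookup order across the fourth and second databases can be sketched as a two-tier cache. Dictionaries stand in for both databases here, and the class name TieredResultStore, the judge callback, and the judge_calls counter are hypothetical additions for illustration.

```python
from typing import Any, Callable, Dict


class TieredResultStore:
    """Look up a result in the fourth database first, then the second
    database, and only judge the data when both miss."""

    def __init__(self, judge: Callable[[Dict[str, Any]], bool]) -> None:
        self.fourth_db: Dict[str, bool] = {}  # stand-in for the fast lookup store
        self.second_db: Dict[str, bool] = {}  # stand-in for the backing store
        self.judge = judge
        self.judge_calls = 0                  # for illustration only

    def lookup(self, data_id: str, record: Dict[str, Any]) -> bool:
        if data_id in self.fourth_db:
            return self.fourth_db[data_id]    # direct return
        if data_id in self.second_db:
            # Synchronize to the fourth database, then return from it.
            self.fourth_db[data_id] = self.second_db[data_id]
            return self.fourth_db[data_id]
        # Both tiers missed: judge and store the result in both.
        self.judge_calls += 1
        result = self.judge(record)
        self.second_db[data_id] = result
        self.fourth_db[data_id] = result
        return result


store = TieredResultStore(judge=lambda rec: rec["amount"] >= 100)
r1 = store.lookup("d1", {"amount": 120})  # judged, stored in both tiers
r2 = store.lookup("d1", {"amount": 120})  # served from the fourth database
store.fourth_db.clear()                   # simulate loss of the fast tier
r3 = store.lookup("d1", {"amount": 120})  # refilled from the second database
```

In this run the judgment executes only once; the later lookups are answered from the stores.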
Optionally, on the basis of the above device, the device further includes: a third database storage module and a downstream notification module; wherein,
The third database storage module is used for storing the marketing campaign achievement to a third database and sending a notification of data update of the third database to a downstream component;
And the downstream notification module is used for enabling the downstream component to receive the notification, updating and displaying the standard reaching state of the data to be processed based on the third database.
Optionally, on the basis of the above device, the device further includes: a progress display module, used for generating and displaying a marketing campaign progress bar based on the at least one marketing campaign standard reaching result when the target standard reaching information comprises at least one marketing campaign standard reaching result.
The data processing device based on the stream processing provided by the embodiment of the invention can execute the data processing method based on the stream processing provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. Fig. 8 shows a block diagram of an exemplary electronic device 50 suitable for use in implementing the embodiments of the present invention. The electronic device 50 shown in fig. 8 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 8, the electronic device 50 is in the form of a general purpose computing device. Components of electronic device 50 may include, but are not limited to: one or more processors or processing units 501, a system memory 502, and a bus 503 that connects the various system components (including the system memory 502 and processing units 501).
Bus 503 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Electronic device 50 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by electronic device 50 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 502 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 504 and/or cache memory 505. Electronic device 50 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 506 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 8, commonly referred to as a "hard disk drive"). Although not shown in fig. 8, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 503 through one or more data medium interfaces. Memory 502 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the invention.
A program/utility 508 having a set (at least one) of program modules 507 may be stored, for example, in memory 502, such program modules 507 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 507 typically perform the functions and/or methods of the described embodiments of the invention.
The electronic device 50 may also communicate with one or more external devices 509 (e.g., keyboard, pointing device, display 510, etc.), one or more devices that enable a user to interact with the electronic device 50, and/or any device (e.g., network card, modem, etc.) that enables the electronic device 50 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 511. Also, the electronic device 50 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through a network adapter 512. As shown, the network adapter 512 communicates with other modules of the electronic device 50 over the bus 503. It should be appreciated that although not shown in fig. 8, other hardware and/or software modules may be used in connection with electronic device 50, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 501 executes various functional applications and data processing based on streaming processing by running a program stored in the system memory 502, for example, implementing the data processing method based on streaming processing provided by the embodiment of the present invention.
The embodiments of the present invention also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are for performing a data processing method based on streaming, the method comprising:
Obtaining at least one streaming data in a target message system, wherein the streaming data comprises product basic data of at least one product to be selected under at least one basic field and floating data under a floating field with changeable data;
For the streaming data, determining a target product corresponding to the streaming data, and reading a target configuration rule corresponding to the target product from a configuration center;
reading a field to be processed in the target configuration rule based on an objective function, and acquiring target data content in the streaming data according to the field to be processed;
And determining a target processing result of the streaming data according to the target data content and the target configuration rule.
The computer storage media of embodiments of the invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for embodiments of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Embodiments of the present invention also provide a computer program product comprising a computer program which, when executed by a processor, implements a data processing method based on streaming as provided by any of the embodiments of the present invention.
In implementing the computer program product, the computer program code for carrying out operations of the present invention may be written in one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims (16)

1. A data processing method based on stream processing, comprising:
Obtaining stream data to be processed generated when a target user completes the process of a target service;
Invoking at least one rule flow which is preset and corresponds to the target service;
The to-be-processed streaming data and the at least one rule stream are spliced to obtain to-be-processed tasks, and the to-be-processed tasks are broadcasted to all parallel threads so as to process the to-be-processed tasks based on target parallel threads; the target parallel threads are threads in all parallel threads;
and determining the standard reaching information of the target service based on the obtained processing result.
2. The method of claim 1, wherein prior to the obtaining the streaming data to be processed generated by the target user in completing the target service, the method further comprises:
acquiring original streaming data corresponding to at least one service to be processed from a data lake based on at least one pre-subscribed topic; wherein the target service is a service in the service to be processed;
and when the condition that the preset condition is met is detected, acquiring the original streaming data as the streaming data to be processed.
3. The method of claim 2, wherein the service to be processed is a service completing a specific event, and the raw streaming data is event data generated based on an operation behavior of the service to be processed.
4. The method according to claim 1, wherein the obtaining the task to be processed by splicing the streaming data to be processed and the at least one rule stream includes:
Performing deserialization processing on the streaming data to be processed to obtain a first entity object, and determining an object to be spliced corresponding to the at least one rule stream according to the data type corresponding to the at least one rule stream;
And determining the task to be processed based on the first entity object and the object to be spliced.
5. The method according to claim 4, wherein determining the object to be spliced corresponding to the at least one rule stream according to the data type corresponding to the at least one rule stream comprises:
Converting at least one rule stream in a first preset format into a second entity object;
When the at least one rule flow has judgment logic, generating a judgment rule logic tree based on the second entity object and taking the judgment rule logic tree as the object to be spliced;
Wherein the decision rule logic tree includes at least one root node and at least one leaf node associated with a respective root node, the root node corresponding to a relational operator, the leaf node corresponding to an operation condition or operand.
6. The method according to claim 1, wherein the method further comprises:
Cleaning a preset expired rule flow corresponding to a service to be selected based on a deployed objective function, and storing the cleaned expired rule flow into a permanent cache database;
And cleaning the expired rule stream to form a data snapshot, and storing the data snapshot to a distributed file system.
7. The method according to claim 1, wherein the method further comprises:
When detecting the operation of the new service and/or changing the service updating rule corresponding to the service to be processed, determining the existing service and the corresponding service rule corresponding to the new service and/or the service updating rule so as to determine the service rule to be used;
And determining the service rule corresponding to the newly added service and/or the service updating rule based on the service rule to be used.
8. The method of claim 1, wherein the target business is a marketing campaign compliance business, the processing the task to be processed based on a target parallel thread comprises:
And traversing the to-be-processed flow data in the to-be-processed task through a decision rule logic tree corresponding to the at least one rule flow based on the target parallel thread when the historical standard reaching result associated with the to-be-processed task is detected, and the historical standard reaching result is a preset result, so as to obtain the marketing activity standard reaching result in the processing result.
9. The method of claim 1 or 8, wherein the at least one rule stream is generated based on configuration rules under at least one rule logic that matches activity information of a pre-defined marketing campaign compliance service.
10. The method of claim 1, wherein the target business is a marketing campaign compliance business, the method further comprising:
When the processing result is a marketing campaign standard reaching result, storing the data to be processed corresponding to the marketing campaign standard reaching result into a first database, and storing the marketing campaign standard reaching result into a second database and a fourth database;
When the data to be processed is received again, searching a corresponding standard reaching result from the fourth database according to the data identifier of the data to be processed, and directly returning the corresponding standard reaching result of the data to be processed based on the fourth database under the condition that the corresponding standard reaching result exists in the fourth database;
searching a corresponding standard-reaching result in the second database under the condition that the fourth database does not have the corresponding standard-reaching result, synchronizing the standard-reaching result to the fourth database under the condition that the second database has the corresponding standard-reaching result, and returning the corresponding standard-reaching result of the data to be processed based on the fourth database;
And under the condition that the corresponding standard reaching result does not exist in the second database, judging the data to be processed to obtain the marketing activity standard reaching result of the data to be processed, and storing the marketing activity standard reaching result into the second database and the fourth database.
11. The method according to claim 10, wherein the method further comprises:
Storing the marketing campaign achievement result to a third database, and sending a notification of data update of the third database to a downstream component;
And enabling the downstream component to receive the notification, updating and displaying the standard reaching state of the data to be processed based on the third database.
12. The method according to claim 10, wherein the method further comprises:
And generating a marketing campaign progress bar for display based on the at least one marketing campaign compliance result when the target compliance information comprises the at least one marketing campaign compliance result.
13. A data processing apparatus based on stream processing, comprising:
the to-be-processed data acquisition module is used for acquiring to-be-processed streaming data generated when a target user completes the process of a target service;
the rule flow acquisition module is used for calling at least one rule flow which is preset and corresponds to the target service;
The data processing module is used for obtaining a task to be processed after splicing the streaming data to be processed and the at least one rule stream, and broadcasting the task to be processed to all parallel threads so as to process the task to be processed based on a target parallel thread; the target parallel threads are threads in all parallel threads;
and the standard reaching information determining module is used for determining standard reaching information of the target service based on the obtained processing result.
14. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
The memory stores computer-executable instructions;
The processor executes computer-executable instructions stored in the memory to implement the stream processing-based data processing method as recited in any one of claims 1-12.
15. A computer readable storage medium, wherein computer executable instructions are stored in the computer readable storage medium, which when executed by a processor are adapted to implement a data processing method based on stream processing as claimed in any one of claims 1-12.
16. A computer program product comprising a computer program which, when executed by a processor, implements the stream processing based data processing method of any one of claims 1 to 12.
CN202410435679.0A 2024-04-11 2024-04-11 Data processing method, device, equipment and storage medium based on stream processing Pending CN118260351A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410435679.0A CN118260351A (en) 2024-04-11 2024-04-11 Data processing method, device, equipment and storage medium based on stream processing

Publications (1)

Publication Number Publication Date
CN118260351A true CN118260351A (en) 2024-06-28

Family

ID=91610803

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410435679.0A Pending CN118260351A (en) 2024-04-11 2024-04-11 Data processing method, device, equipment and storage medium based on stream processing

Country Status (1)

Country Link
CN (1) CN118260351A (en)

Similar Documents

Publication Publication Date Title
CN107506451B (en) Abnormal information monitoring method and device for data interaction
US20200004854A1 (en) Sending notifications in a multi-client database environment
CN107908637B (en) Entity updating method and system based on knowledge base
US20170206249A1 (en) Systems and methods for implementing urban voices
CN109190025B (en) Information monitoring method, device, system and computer readable storage medium
CN105512910A (en) Target user screening method and apparatus
CN113988954A (en) Financing product marketing method and device
CN112181678A (en) Service data processing method, device and system, storage medium and electronic device
CN111917814B (en) Data publishing method, data subscribing method, data publishing device, data subscribing system and readable storage medium
CN113760242B (en) Data processing method, device, server and medium
CN109743248B (en) Content distribution method, device, terminal, server and storage medium
CN107480189A (en) A kind of various dimensions real-time analyzer and method
CN111381976B (en) Method and device for updating message prompt data, storage medium and computer equipment
CN118260351A (en) Data processing method, device, equipment and storage medium based on stream processing
CN112148762A (en) Statistical method and device for real-time data stream
CN114723397A (en) Flow execution method and device
CN114756301A (en) Log processing method, device and system
CN114841648B (en) Material distribution method, device, electronic equipment and medium
US20240193012A1 (en) Correlation and policy engine system and method of operation
CN116915870B (en) Task creation request processing method, device, electronic equipment and readable medium
CN111625524B (en) Data processing method, device, equipment and storage medium
CN117009632A (en) Data pulling method, device, computer equipment, storage medium and program product
CN115809083A (en) Automatic injection business alarming method and device, electronic equipment and storage medium
US10489219B2 (en) Transitioning between data stream processors in a publish-subscribe system
CN117390112A (en) Data synchronization method, device, storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination