CN114580536A - Data stream processing method, system, device and readable storage medium - Google Patents

Publication number
CN114580536A
Authority
CN
China
Prior art keywords
data
service
pipeline
corrected
model
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210210419.4A
Other languages
Chinese (zh)
Inventor
孙畅
吴谦
陈亮亮
吴康子
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Youzan Technology Co ltd
Original Assignee
Hangzhou Youzan Technology Co ltd
Application filed by Hangzhou Youzan Technology Co ltd
Priority to CN202210210419.4A
Publication of CN114580536A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/251 Fusion techniques of input or preprocessed data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24317 Piecewise classification, i.e. whereby each classification requires several discriminant rules

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method, a system and a device for processing data streams. The method comprises the following steps: acquiring and analyzing initial file information to generate initial service data; correcting the initial service data based on preset configuration information to obtain corrected service mode data; performing pipeline construction processing on the corrected service mode data based on standardized historical samples to obtain a multi-level pipeline model; acquiring a data stream processing task request to obtain a first user service identity; identifying and matching the first user service identity based on the multi-level pipeline model to obtain an optimal service pipeline; and executing service customization flow processing based on the optimal service pipeline to obtain a unified data model. Once the unified data model has been obtained, merchants can connect and fuse data from different channels. With the data connected, closed-loop analysis can be carried out, the various data of a merchant can be linked together, and all-round analysis can follow.

Description

Data stream processing method, system, device and readable storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method, a system, an apparatus, and a readable storage medium for processing a data stream.
Background
In recent years, with the rise of SaaS, more and more companies have adopted SaaS services, so the number of SaaS providers and the volume of their business keep growing, and large-scale data import and export has become routine. In an e-commerce scenario, a merchant may need to import and export large amounts of business data or import customer information such as basic profiles, points, grades, and membership cards. In addition, many merchants hold a growing amount of offline data that must be connected with many channels, so different industry information is inevitably produced in different channels. Because the current business model spans many channels and industries, the data is larger in volume and more complex than in a single-channel model, and every merchant needs to synchronize data across channels. As a result, the amount of multi-channel data to be processed in a short time is extremely large.
In the prior art, connecting multi-channel data involves a large amount of development work, so most existing schemes focus only on rule matching and on chaining flows together; none provides large-scale concurrent acceleration or rapid adaptation to various fields and scenarios. In practice, customized import by specified channel and specific file format is not possible, and substantial manpower is needed for later maintenance and development. Once the import flow fails, disaster recovery is difficult, data cannot be analyzed automatically after it has flowed in, and closing the data loop is not considered.
In summary, the following problems arise:
first, differences in required file formats, data formats, and the like cannot be adapted quickly to different channel scenarios through interfaces and configuration, and parameter changes cannot take effect dynamically in real time;
second, data flow processing is single and fixed, and business process orchestration cannot be realized for special business scenarios;
third, in a distributed environment, network jitter and other causes lead to task interruption and failure, so each business party must handle task reliability on its own, which raises the cost of business development;
fourth, closed-loop data processing such as data analysis, reports, and suggestions is lacking.
In short, the existing schemes cover only a narrow part of the process, require a large amount of code development, and cannot meet the SaaS industry's need to support customized industry adaptation quickly; they also leave unsolved the technical problems of disaster tolerance, recovery, automatic data embedding, and core data closed loop under large data volumes.
Disclosure of Invention
To address the defects in the prior art, the invention provides a data stream processing method, a system, a device, and a readable storage medium.
To solve the above technical problems, the invention adopts the following technical solution:
a data stream processing method, comprising the steps of:
acquiring initial file information containing any external data, analyzing the initial file information and generating initial service data;
correcting the initial service data based on preset configuration information to obtain corrected service mode data;
performing pipeline construction processing on the corrected service mode data based on standardized historical samples in a standardized historical sample library to obtain a multi-level pipeline model based on the corrected service mode data, wherein the multi-level pipeline model comprises a plurality of service pipelines;
acquiring a data stream processing task request, and performing initialization processing on the data stream processing task request by combining preset configuration information to obtain a first user service identity;
based on the multi-level pipeline model, identifying and matching the first user service identity to obtain an optimal service pipeline corresponding to the first user service identity;
and executing service customization flow processing based on the optimal service pipeline to obtain a uniform data model.
As an implementation manner, the acquiring and analyzing initial file information including any external data to generate initial service data includes the following steps:
acquiring and analyzing initial file information to obtain service characteristics of an original file;
identifying the initial file information based on the service characteristics to obtain file data, wherein the file data comprises channel data, service scene data and industry information data;
and arranging and combining the file data to obtain initial service data.
As an implementation manner, the method for correcting the initial service data based on the preset configuration information to obtain corrected service mode data includes the following steps:
acquiring preset configuration information, wherein the preset configuration information at least comprises channel information, service scene information and industry information;
and correcting the initial service data based on the preset configuration information to obtain corrected service mode data.
As an implementable manner, the pipeline construction processing is performed on the corrected service pattern data based on the standardized historical samples in the standardized historical sample library to obtain a multi-level pipeline model based on the corrected service pattern data, and the method includes the following steps:
classifying the standardized historical sample data in the standardized historical samples to obtain a multi-level service pipeline sample containing a plurality of user service identities;
and performing pipeline construction processing on the corrected service mode data based on the multi-level service pipeline sample to obtain a multi-level pipeline model comprising a plurality of service pipelines, wherein the plurality of user service identities and the plurality of service pipelines have one-to-one correspondence.
As another implementable manner, the pipeline construction processing is performed on the corrected service pattern data based on the standardized historical samples in the standardized historical sample library to obtain a multi-level pipeline model based on the corrected service pattern data, and the method includes the following steps:
acquiring standardized historical samples in a standardized historical sample library, and extracting standardized historical sample information, wherein the standardized historical sample information comprises a plurality of user service identities;
classifying the standardized historical sample information by combining preset configuration information to obtain a multi-level service pipeline sample containing the plurality of user service identities;
carrying out multi-level classification processing on the corrected service mode data through preset configuration information to obtain multi-level corrected service mode data;
and constructing the multi-level correction service mode data step by step through the multi-level service pipeline sample to obtain a multi-level pipeline model comprising a plurality of service pipelines.
As an implementable manner, the acquiring a data stream processing task request and performing initialization processing on the data stream processing task request in combination with preset configuration information to obtain a first user service identity specifically includes:
loading file information contained in the data stream processing task request based on the preset configuration information, and initializing the file information according to the preset configuration information to obtain first initialized file information;
and analyzing the first initialization file information step by step to obtain a first user service identity.
As an implementation manner, the identifying and matching the first user service identity based on the multi-level pipeline model to obtain an optimal service pipeline corresponding to the first user service identity includes the following steps:
and matching the multi-level pipeline model based on the first user service identity, and selecting a service pipeline corresponding to the user service identity with the highest matching degree value, wherein the service pipeline corresponding to the user service identity with the highest matching degree value is an optimal service pipeline corresponding to the first user service identity.
As an implementation manner, the business customization flow process includes the following steps:
analyzing the optimal service pipeline, and loading field data corresponding to the project according to the configured field mapping rule to obtain analyzed data;
classifying the analyzed data into structured index data and unstructured data, and storing the two classes in a MySQL database and an HBase database respectively;
scanning the data, checking the type and the service, and after the check is passed, performing aggregation and recombination on the related associated data to obtain recombined data;
and converting the recombined data to obtain a uniform data model.
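The four steps of the customization flow can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the field names, the `FIELD_MAPPING` table, and the rule that points must be integers are all hypothetical, and two in-memory lists stand in for the MySQL and HBase stores rather than calling either database. Rows that fail the check are simply skipped here, which is one possible reading of "after the check is passed".

```python
# Hypothetical field mapping rule: source column name -> unified field name.
FIELD_MAPPING = {"member_no": "member_id", "pts": "points", "remark": "note"}
# Hypothetical split: these unified fields count as structured index data.
STRUCTURED_FIELDS = {"member_id", "points"}


def run_flow(rows):
    """Map fields, split by data class, check, aggregate, and convert."""
    structured_store = []    # stand-in for the MySQL database
    unstructured_store = []  # stand-in for the HBase database

    for row in rows:
        # Load field data according to the configured field mapping rule.
        parsed = {FIELD_MAPPING.get(key, key): value for key, value in row.items()}
        # Classify into structured index data and unstructured data.
        structured_store.append(
            {k: v for k, v in parsed.items() if k in STRUCTURED_FIELDS})
        unstructured_store.append(
            {k: v for k, v in parsed.items() if k not in STRUCTURED_FIELDS})

    # Type check (points must be an int) and service check (member id present).
    valid = [r for r in structured_store
             if isinstance(r.get("points"), int) and r.get("member_id")]

    # Aggregate and recombine associated rows: sum points per member.
    totals = {}
    for record in valid:
        totals[record["member_id"]] = totals.get(record["member_id"], 0) + record["points"]

    # Convert the recombined data into one unified record shape.
    unified = [{"member_id": m, "points": p} for m, p in sorted(totals.items())]
    return unified, unstructured_store


rows = [
    {"member_no": "m1", "pts": 10, "remark": "vip"},
    {"member_no": "m1", "pts": 5, "remark": ""},
    {"member_no": "m2", "pts": "bad", "remark": "x"},
]
unified, notes = run_flow(rows)
print(unified)
```

In this sketch the row for `m2` is dropped because its points value is not an integer, and the two `m1` rows are merged into one unified record.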
As an implementation manner, the method further comprises the following steps of polling the business customization flow process:
detecting whether the service customizing flow processing process is finished or not based on preset reasonable scheduling time delay;
if yes, re-calling to continue executing the service customization flow processing process;
if the process of executing the business customizing flow fails, automatically recording the fault position, establishing a special identifier, and then performing the fault transfer recovery process.
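One way to read the polling steps is as a scheduler tick that advances a staged flow and records where it broke. The Python sketch below follows that reading and is an assumption, not the patent's implementation: the stage functions, the `state` dictionary, and the `recover` helper are hypothetical, the "special identifier" is modelled as a `failed` status, and the scheduling delay between ticks is left out so the example runs instantly.

```python
def poll_flow(stages, state):
    """Run one polling tick: continue with the next stage or record a fault."""
    if state["status"] != "running":
        return state
    index = state["next_stage"]
    if index >= len(stages):
        state["status"] = "done"
        return state
    try:
        stages[index](state["data"])
        state["next_stage"] = index + 1  # previous stage finished, move on
    except Exception as exc:
        # Record the fault position and mark the task for recovery.
        state["status"] = "failed"
        state["fault"] = {"stage": index, "error": str(exc)}
    return state


def recover(state):
    """Failover recovery: resume from the recorded fault position."""
    if state["status"] == "failed":
        state["status"] = "running"  # next_stage still points at the failed stage
    return state


attempts = {"count": 0}


def load(data):
    data.append("loaded")


def flaky_check(data):
    # Fails once to imitate a transient error such as network jitter.
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("network jitter")
    data.append("checked")


def convert(data):
    data.append("converted")


stages = [load, flaky_check, convert]
state = {"status": "running", "next_stage": 0, "data": [], "fault": None}
first_fault = None
while state["status"] != "done":
    state = poll_flow(stages, state)
    if state["status"] == "failed":
        first_fault = dict(state["fault"])
        state = recover(state)
print(state["data"], first_fault)
```

Because the fault position is stored with the task, recovery re-runs only the failed stage instead of restarting the whole flow.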
A data flow processing system comprises an acquisition and analysis module, a data correction module, a model construction module, an initialization analysis module, an identification and matching module and a customized flow processing module;
the acquisition and analysis module is used for acquiring and analyzing initial file information containing any external data to generate initial service data;
the data correction module is used for correcting the initial service data based on preset configuration information to obtain corrected service mode data;
the model construction module is used for carrying out pipeline construction processing on the corrected service mode data based on standardized historical samples in a standardized historical sample library to obtain a multi-level pipeline model based on the corrected service mode data, wherein the multi-level pipeline model comprises a plurality of service pipelines;
the initialization analysis module is used for acquiring a data stream processing task request, and performing initialization processing on the data stream processing task request by combining preset configuration information to obtain a first user service identity;
the identification matching module is used for carrying out identification matching on the first user service identity based on the multi-level pipeline model to obtain an optimal service pipeline corresponding to the first user service identity;
and the customization flow processing module executes service customization flow processing based on the optimal service pipeline to obtain a uniform data model.
A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method steps of:
acquiring initial file information containing any external data, analyzing the initial file information and generating initial service data;
correcting the initial service data based on preset configuration information to obtain corrected service mode data;
performing pipeline construction processing on the corrected service mode data based on standardized historical samples in a standardized historical sample library to obtain a multi-level pipeline model based on the corrected service mode data, wherein the multi-level pipeline model comprises a plurality of service pipelines;
acquiring a data stream processing task request, and performing initialization processing on the data stream processing task request by combining preset configuration information to obtain a first user service identity;
based on the multi-level pipeline model, identifying and matching the first user service identity to obtain an optimal service pipeline corresponding to the first user service identity;
and executing service customization flow processing based on the optimal service pipeline to obtain a uniform data model.
An electronic device comprising a memory, a processor and a computer program stored in the memory and running on the processor, the processor implementing the method steps when executing the computer program as follows:
acquiring initial file information containing any external data, analyzing the initial file information and generating initial service data;
correcting the initial service data based on preset configuration information to obtain corrected service mode data;
performing pipeline construction processing on the corrected service mode data based on standardized historical samples in a standardized historical sample library to obtain a multi-level pipeline model based on the corrected service mode data, wherein the multi-level pipeline model comprises a plurality of service pipelines;
acquiring a data stream processing task request, and performing initialization processing on the data stream processing task request by combining preset configuration information to obtain a first user service identity;
based on the multi-level pipeline model, identifying and matching the first user service identity to obtain an optimal service pipeline corresponding to the first user service identity;
and executing service customization flow processing based on the optimal service pipeline to obtain a uniform data model.
Due to the adoption of the technical scheme, the invention has the remarkable technical effects that:
by the method, the system and the device, the unified data model can be obtained through training, and after the unified data model is obtained, merchants can communicate and fuse data of different channels. On the basis of data communication, closed-loop analysis of data can be carried out, various data of merchants can be communicated, all-around analysis can be carried out subsequently, more favorable operation guidance opinions, various marketing reports and the like are provided for the merchants, and the merchants can manage the data more conveniently.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic overall flow diagram of the process of the present invention;
FIG. 2 is a schematic flow chart of generating initial service data;
FIG. 3 is a schematic flow chart of obtaining corrected service mode data;
FIG. 4 is a schematic flow diagram of one embodiment of a multi-level pipeline model;
FIG. 5 is a schematic flow diagram of another embodiment of a multi-level pipeline model;
FIG. 6 is a schematic flow chart of obtaining a first user service identity;
FIG. 7 is a flow diagram of a customization flow process;
FIG. 8 is a flow diagram illustrating inspection of a business customization flow process;
FIG. 9 is a schematic diagram of the overall structure of the system of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples, which are illustrative of the present invention and are not to be construed as being limited thereto.
With the development of science and technology, e-commerce has undoubtedly become mainstream, and e-commerce platforms have appeared in every channel. A merchant may therefore operate on multiple channels and must import and export large amounts of data or import customer information such as basic profiles, points, grades, and membership cards. In addition, many merchants hold a growing amount of offline data that must be connected with many channels, and different industry information is generated in different channels. Because the business model spans many channels and industries, the data is larger in volume and more complex than in a single-channel model, and every merchant needs to synchronize data across channels. The amount of multi-channel data to be processed in a short time is therefore extremely large, and handling it under the current mode of operation is particularly troublesome. The invention addresses this as follows:
Example 1:
A data stream processing method, as shown in FIG. 1, comprising the steps of:
S100, acquiring initial file information containing any external data, analyzing and generating initial service data;
S200, correcting the initial service data based on preset configuration information to obtain corrected service mode data;
S300, performing pipeline construction processing on the corrected service mode data based on standardized historical samples in a standardized historical sample library to obtain a multi-level pipeline model based on the corrected service mode data, wherein the multi-level pipeline model comprises a plurality of service pipelines;
S400, acquiring a data stream processing task request, and performing initialization processing on the data stream processing task request by combining preset configuration information to obtain a first user service identity;
S500, based on the multi-level pipeline model, identifying and matching the first user service identity to obtain an optimal service pipeline corresponding to the first user service identity;
S600, executing service customization flow processing based on the optimal service pipeline to obtain a uniform data model.
The invention can in effect be regarded as a process of model training followed by model application.
The model training process: initial service data is obtained from any external data; its channel information, service scene information, and industry information are then corrected layer by layer using preset configuration information, which is obtained by identifying the standardized historical sample library, to yield corrected service mode data; the corrected service mode data is then trained in combination with the standardized historical sample library to obtain a multi-level pipeline model. In other words, any external data that is acquired is first analyzed and then corrected. During correction, the channel information, service scene information, and industry information derived from the standardized historical sample library are used to obtain corrected service mode data whose levels and types are more accurate. Model training of a plurality of service pipelines is then carried out in combination with the standardized historical sample library, and the multi-level pipeline model is finally obtained.
The model application process: a data stream processing task request is acquired and analyzed to obtain a first user service identity. The multi-level pipeline model comprises a plurality of service pipelines, and the user service identities and the service pipelines are in one-to-one correspondence, so identifying and matching the first user service identity against the multi-level pipeline model yields the optimal service pipeline for that identity. Service customization flow processing is executed based on the optimal service pipeline to obtain a unified data model, which is the data model used in subsequent processing and which can later flow back into the standardized historical sample library. With this method a unified data model can be obtained through training, after which merchants can connect and fuse data from different channels. With the data connected, closed-loop analysis can be carried out, the various data of a merchant can be linked together, and all-round analysis can follow, so that merchants receive more useful operating guidance, various marketing reports, and the like, and can manage their data more conveniently.
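The matching step of model application can be sketched in Python as picking the pipeline whose identity scores highest against the requested identity. The source does not define how the matching degree is computed, so the scoring rule below (count of leading levels that agree, in the order channel, scene, industry) is purely an assumption, as are the pipeline names.

```python
def matching_degree(identity, candidate):
    """Count how many leading levels (channel, scene, industry) agree."""
    score = 0
    for wanted, offered in zip(identity, candidate):
        if wanted != offered:
            break
        score += 1
    return score


def select_pipeline(identity, pipelines):
    """Return the candidate identity with the highest degree and its pipeline."""
    best = max(pipelines, key=lambda candidate: matching_degree(identity, candidate))
    return best, pipelines[best]


# Hypothetical model: user service identity -> service pipeline name.
pipelines = {
    ("taobao", "order", "fashion"): "taobao-order-fashion",
    ("xiaohongshu", "member", "bakery"): "xhs-member-bakery",
    ("xiaohongshu", "points", "bakery"): "xhs-points-bakery",
}

request_identity = ("xiaohongshu", "points", "hotel_travel")
best_identity, best_pipeline = select_pipeline(request_identity, pipelines)
print(best_identity, best_pipeline)
```

Here no pipeline matches all three levels, so the one agreeing on channel and scene is chosen; when several candidates tie, `max` keeps the first one it encounters.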
In step S100, acquiring and analyzing initial file information containing any external data to generate initial service data comprises the following steps, as shown in FIG. 2:
S110, acquiring and analyzing initial file information to obtain service characteristics of an original file;
S120, identifying the initial file information based on the service characteristics to obtain file data, wherein the file data comprises channel data, service scene data and industry information data;
S130, arranging and combining the file data to obtain initial service data.
Here, any external data may be the data contained in initial files of various formats generated by different merchants or ERP systems, for example data carried in Excel, csv, or txt files. It may include channel data, service scene data, industry information data, and the like, but it is not presented in a uniform format or mode. After the initial file information containing any external data is obtained, it therefore needs to be analyzed to obtain the initial service data used by the invention.
The service characteristics can be understood as the business characteristics embodied by the initial file information. They can be obtained by having a business expert or an expert system analyze the existing business, or by drawing on historical experience. Identifying the initial file information in combination with the service characteristics yields the channel data, service scene data, and industry information data of the initial file. For example, if platform-dimension, ERP-dimension, and CRM-vendor-dimension channels exist, there is channel data corresponding to those channels, such as Xiaohongshu, Alipay, Taobao, Douyin, handsome, win-win, Lijing, friend, postscript, and the like; if a service scene is identified, there is corresponding service scene data, such as members, stored value, orders, points, and merchandise; if industry information is identified, there is corresponding industry information data, such as hotel and travel, fashion, and baking.
The permutation and combination in effect yields fine-grained service modes by combining multiple dimensions. For example, after certain initial file information is analyzed and then permuted and combined, the result is: Xiaohongshu, points, baking, and information about baking members.
Put another way, this step splits the initial file information into several different kinds of data and then combines the file data along different dimensions or according to different rules to obtain the initial service data.
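Steps S110 to S130 can be illustrated with a short Python sketch that reads a csv file, maps its column names to the three dimensions, and combines them. The `FEATURES` keyword table and all column names are hypothetical; in the source such service characteristics come from business experts, an expert system, or historical experience rather than from a hard-coded table.

```python
import csv
import io
from itertools import product

# Hypothetical feature table: column name -> value, per dimension.
FEATURES = {
    "channel": {"xiaohongshu_id": "xiaohongshu", "taobao_id": "taobao"},
    "scene": {"member_no": "member", "points": "points", "order_no": "order"},
    "industry": {"cake_type": "bakery", "room_type": "hotel_travel"},
}


def parse_initial_file(text):
    """S110: read the raw file and return its header names as service features."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return [name.strip().lower() for name in header]


def identify_file_data(features):
    """S120: map header names to channel, scene, and industry values."""
    file_data = {dimension: [] for dimension in FEATURES}
    for dimension, table in FEATURES.items():
        for name in features:
            if name in table and table[name] not in file_data[dimension]:
                file_data[dimension].append(table[name])
    return file_data


def combine(file_data):
    """S130: combine the dimensions into fine-grained service modes."""
    return list(product(file_data["channel"], file_data["scene"], file_data["industry"]))


sample = "xiaohongshu_id,member_no,points,cake_type\nu1,m1,120,sponge\n"
initial_service_data = combine(identify_file_data(parse_initial_file(sample)))
print(initial_service_data)
```

Each tuple in `initial_service_data` is one candidate service mode, for example the Xiaohongshu channel, the points scene, and the bakery industry.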
In step S200, the initial service data is corrected based on the preset configuration information to obtain corrected service mode data, which, as shown in FIG. 3, comprises the following steps:
S210, acquiring preset configuration information, wherein the preset configuration information at least comprises channel information, service scene information and industry information;
S220, correcting the initial service data based on the preset configuration information to obtain corrected service mode data.
Here, the preset configuration information may be obtained by acquiring the standardized historical sample library and identifying the standardized historical sample information in it; it may of course be obtained in other ways, which are not described here.
The initial service data may not meet current requirements, so it can be corrected according to the preset configuration information. For example, the file formats or data formats that can be processed, the mapping rules for data fields, the verification rules for fields, data application, closed-loop operations (data analysis, reports, suggestions), and the like are adjusted according to the channel information, service scene information, and industry information, which yields the corrected service mode data. For instance, one bakery merchant may identify a member by member number alone, while other merchants identify a member by member number plus identity card number. Or one merchant's member information may consist of mobile phone number and name, while another merchant's consists of mobile phone number, name, and gender. The initial service data therefore needs to be corrected according to the preset configuration information to obtain corrected service mode data that follows consistent rules.
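The member-information example can be made concrete with a small Python sketch of configuration-driven correction. Everything named here is an assumption for illustration: the `PRESET_CONFIG` layout, the field names, and the choice to fill a missing gender with `"unknown"` are not specified in the source.

```python
# Hypothetical preset configuration keyed by (channel, scene, industry).
PRESET_CONFIG = {
    ("xiaohongshu", "member", "bakery"): {
        "field_mapping": {"mobile": "phone", "tel": "phone", "full_name": "name"},
        "required": ["phone", "name"],
        "defaults": {"gender": "unknown"},
    }
}


def correct(record, mode):
    """Rename fields, fill defaults, and verify required fields for one record."""
    config = PRESET_CONFIG[mode]
    corrected = {}
    for key, value in record.items():
        # Apply the mapping rule for data fields; unmapped names pass through.
        corrected[config["field_mapping"].get(key, key)] = value
    for key, value in config["defaults"].items():
        corrected.setdefault(key, value)
    # Apply the verification rule: every required field must be non-empty.
    missing = [key for key in config["required"] if not corrected.get(key)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return corrected


mode = ("xiaohongshu", "member", "bakery")
# One merchant supplies phone + name, another supplies phone + name + gender.
merchant_a = correct({"mobile": "13800000000", "full_name": "Li"}, mode)
merchant_b = correct({"tel": "13900000000", "name": "Wang", "gender": "F"}, mode)
print(merchant_a)
print(merchant_b)
```

After correction both merchants' records share the same field set, which is the "consistent rules" property the paragraph above describes.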
In step S300, the pipeline construction processing is performed on the corrected service pattern data based on the standardized historical samples in the standardized historical sample library to obtain a multi-level pipeline model based on the corrected service pattern data, which, as shown in FIG. 4, comprises the following steps:
S310a, classifying the standardized historical sample data in the standardized historical samples to obtain a multi-level service pipeline sample containing a plurality of user service identities;
S310b, performing pipeline construction processing on the corrected service pattern data based on the multi-level service pipeline sample to obtain a multi-level pipeline model including a plurality of service pipelines, where the plurality of user service identities and the plurality of service pipelines have a one-to-one correspondence.
Different merchants have a plurality of user service identities, and when a multi-level pipeline model is established, a plurality of service pipelines are established according to the plurality of user service identities, so that the multi-level pipeline model comprising the plurality of service pipelines is obtained.
In step S300, the pipeline construction processing is performed on the corrected service pattern data based on the standardized historical samples in the standardized historical sample library to obtain a multi-level pipeline model based on the corrected service pattern data, as shown in fig. 5, including the following steps:
S310, acquiring standardized historical samples in a standardized historical sample library, and extracting standardized historical sample information, wherein the standardized historical sample information comprises a plurality of user service identities;
S320, classifying the standardized historical sample information in combination with preset configuration information to obtain a multi-level service pipeline sample containing the plurality of user service identities;
S330, performing multi-level classification processing on the corrected service mode data through preset configuration information to obtain multi-level corrected service mode data;
S340, constructing the multi-level corrected service mode data step by step through the multi-level service pipeline sample to obtain a multi-level pipeline model comprising a plurality of service pipelines.
During pipeline construction, the standardized historical sample information is classified to obtain a multi-level service pipeline sample containing a plurality of user service identities, where the user service identities and the service pipelines are in one-to-one correspondence. Meanwhile, the corrected service mode data is classified (specifically, a classification model may be used) to obtain multi-level corrected service mode data. Finally, the multi-level corrected service mode data is matched and constructed step by step using the multi-level service pipeline sample to form the multi-level pipeline model. Because the method splits user service identities at fine granularity and builds a service pipeline for the service data, it can improve the generality and matching accuracy of the data model.
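As a rough sketch of steps S310 to S340, the following assumes a two-level hierarchy (channel, then service scene) and plain dictionary lookups in place of the classification model mentioned above. The sample tuples, `build_pipeline_samples` and `build_pipeline_model` are hypothetical names chosen for illustration only.

```python
from collections import defaultdict

# Hypothetical standardized historical samples:
# (level-1 channel, level-2 service scene, user service identity).
HISTORICAL_SAMPLES = [
    ("online", "membership", "online_member"),
    ("offline", "membership", "store_member"),
    ("online", "goods", "online_goods_owner"),
]


def build_pipeline_samples(samples):
    """Classify samples into a channel -> scene -> identity hierarchy."""
    tree = defaultdict(dict)
    for channel, scene, identity in samples:
        tree[channel][scene] = identity
    return dict(tree)


def build_pipeline_model(pipeline_samples, corrected_data):
    """Create exactly one service pipeline per user service identity."""
    model = {}
    for channel, scenes in pipeline_samples.items():
        for scene, identity in scenes.items():
            records = [
                row for row in corrected_data
                if row["channel"] == channel and row["scene"] == scene
            ]
            model[identity] = {"levels": (channel, scene), "records": records}
    return model


corrected_data = [
    {"channel": "online", "scene": "membership", "phone": "13800000000"},
    {"channel": "offline", "scene": "membership", "phone": "13900000000"},
]
model = build_pipeline_model(build_pipeline_samples(HISTORICAL_SAMPLES), corrected_data)
```

Keying the model by user service identity makes the one-to-one correspondence between identities and service pipelines explicit.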
In step S400, acquiring a data stream processing task request and performing initialization processing on the data stream processing task request in combination with preset configuration information to obtain a first user service identity, as shown in fig. 6, specifically includes:
S410, loading file information contained in the data stream processing task request based on the preset configuration information, and initializing the file information according to the preset configuration information to obtain first initialization file information;
and S420, analyzing the first initialization file information step by step to obtain a first user service identity.
When a merchant or user initiates a data stream processing task request, some core parameter configurations and the file information of the flow need to be loaded, so the data stream processing task must be initialized: the corresponding preset configuration information is loaded, the file information is split into sufficiently small execution units according to a certain rule, and the file information and the dynamic task variables are then stored. The initialization process preprocesses the contained file information according to the preset configuration information and can be understood as 'classification or identification'; for example, certain commodity or member information of a certain merchant on a certain e-commerce platform is identified and then stored. The first initialization file information is therefore data in a uniform format, loaded according to the preset configuration information.
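The initialization described above might look like the following sketch, which assumes the request carries its rows in memory and that the preset configuration names the fields to keep and the execution unit size. `initialize_request`, `fields` and `unit_size` are hypothetical and not defined by the patent.

```python
def initialize_request(request, config):
    """Load the request rows in a uniform format and split them into execution units."""
    fields = config["fields"]
    unit_size = config["unit_size"]
    # Keep only the configured fields so every row has the same shape.
    rows = [{field: row.get(field) for field in fields} for row in request["rows"]]
    # Split the file information into small execution units.
    units = [rows[i:i + unit_size] for i in range(0, len(rows), unit_size)]
    # Dynamic task variables are stored alongside the units.
    task_variables = {"merchant": request["merchant"], "total": len(rows)}
    return {"units": units, "task_variables": task_variables}


request = {
    "merchant": "m1",
    "rows": [{"phone": str(i), "name": f"n{i}", "extra": 1} for i in range(5)],
}
initialized = initialize_request(request, {"fields": ["phone", "name"], "unit_size": 2})
```

Five rows with a unit size of two give three execution units, the last one holding the single remaining row.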
In step S500, the identifying and matching the first user service identity based on the multi-level pipeline model to obtain an optimal service pipeline corresponding to the first user service identity includes the following steps:
and matching the multi-level pipeline model based on the first user service identity, and selecting a service pipeline corresponding to the user service identity with the highest matching degree value, wherein the service pipeline corresponding to the user service identity with the highest matching degree value is the optimal service pipeline corresponding to the first user service identity.
Because the multi-level pipeline model comprises a plurality of service pipelines that correspond one-to-one to the plurality of user service identities, matching the first user service identity against the service pipelines in the model can produce multiple matching results. Each result can be expressed as a matching degree value; the service pipeline with the highest matching degree value is selected as the optimal service pipeline.
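The patent does not say how the matching degree value is computed. The sketch below assumes, purely for illustration, that each identity is a set of attribute names and that the matching degree is the Jaccard similarity of two such sets; `matching_degree`, `select_optimal_pipeline` and the pipeline names are hypothetical.

```python
def matching_degree(first, second):
    """Jaccard similarity of two attribute sets (an assumed metric)."""
    union = first | second
    return len(first & second) / len(union) if union else 0.0


def select_optimal_pipeline(first_identity, model):
    """Return the pipeline with the highest matching degree and its score."""
    scores = {
        name: matching_degree(first_identity, attributes)
        for name, attributes in model.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical model: pipeline name -> attributes of its user service identity.
MODEL = {
    "phone_member_pipeline": {"phone", "name"},
    "id_card_member_pipeline": {"phone", "name", "id_card"},
    "goods_pipeline": {"sku", "price"},
}
best, score = select_optimal_pipeline({"phone", "name", "gender"}, MODEL)
```

Here the first identity shares two of three attributes with `phone_member_pipeline`, which scores higher than the other two candidates and is therefore selected.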
In step S600, as shown in fig. 7, the service customization flow process includes the following steps:
S610, analyzing the optimal service pipeline, and loading field data corresponding to the project according to the configured field mapping rule to obtain analyzed data;
S620, classifying the analyzed data into structured index data and unstructured data, and storing them in a MySQL database and an HBase database respectively;
S630, scanning the data and performing type verification and service verification, and after the verification passes, aggregating and recombining the associated data to obtain recombined data;
and S640, converting the recombined data to obtain a unified data model.
The business customization flow processing converts arbitrary external data into a unified data model, while matching the optimal service pipeline is a process of data recombination and matching identification. Because the original file data obtained is voluminous and heterogeneous, the data is recombined so that it can be analyzed further.
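Steps S610 to S640 can be sketched in memory as follows. Two plain lists stand in for the MySQL and HBase stores, and the business check (a phone number must be a string of digits) is an assumption made for this example; `run_custom_flow` and all field names are hypothetical.

```python
def run_custom_flow(rows, field_mapping, structured_fields):
    """Map fields, split structured/unstructured data, validate, then merge by phone."""
    structured_store, unstructured_store = [], []  # stand-ins for MySQL and HBase
    for row in rows:
        # S610: load field data according to the configured field mapping rule.
        parsed = {field_mapping[k]: v for k, v in row.items() if k in field_mapping}
        # S620: separate structured index data from unstructured data.
        structured_store.append(
            {k: v for k, v in parsed.items() if k in structured_fields}
        )
        unstructured_store.append(
            {k: v for k, v in parsed.items() if k not in structured_fields}
        )

    merged = {}
    for structured, unstructured in zip(structured_store, unstructured_store):
        phone = structured.get("phone")
        # S630: type check and (assumed) business check; invalid rows are skipped.
        if not isinstance(phone, str) or not phone.isdigit():
            continue
        # S630/S640: aggregate associated rows into one unified member record.
        member = merged.setdefault(phone, {"phone": phone, "name": None, "notes": []})
        member["name"] = structured.get("name") or member["name"]
        if unstructured.get("note"):
            member["notes"].append(unstructured["note"])
    return list(merged.values())


rows = [
    {"mobile": "13800000000", "member_name": "Li", "remark": "online signup"},
    {"mobile": "13800000000", "member_name": "Li", "remark": "in-store signup"},
    {"mobile": None, "member_name": "Bad", "remark": "x"},
]
mapping = {"mobile": "phone", "member_name": "name", "remark": "note"}
unified = run_custom_flow(rows, mapping, {"phone", "name"})
```

The two rows sharing a phone number collapse into one unified member record, and the row that fails verification is dropped.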
After obtaining the unified data model, the merchant can connect and fuse the data of different channels. On the basis of this data connection, closed-loop analysis of the data can be performed, the merchant's various types of data can be linked, and comprehensive analysis can be carried out subsequently.
Specifically, as shown in fig. 8, the method further includes a step of polling the business customization process:
S650, detecting, after a delay based on the preset reasonable scheduling time, whether the service customization flow processing has finished;
S660, if it has timed out, calling the service customization flow processing again to continue execution;
and S670, if execution of the service customization flow fails, automatically recording the fault position, establishing a special identifier, and then performing failover recovery processing.
The service customization flow is polled in real time, and failover recovery is performed automatically when task execution fails: the fault position of the current failed execution is recorded and a special identifier is established, and the marked service mode identification result and workflow are then loaded and executed until the data stream processing task is successfully executed or the upper limit on the number of executions is reached.
When the data stream processing task starts to execute, a reasonable scheduling time is created according to the data volume of the task. Specifically, a recommended split size for each task is generated from the throughput, the computing performance and the experience of historical imports; for example, a total of 8 million records is split into 16 subtasks of 500,000 records each. After the preset reasonable scheduling time has elapsed, the system checks whether the service customization flow processing has finished; if it has timed out, the task is called again to continue execution. The delayed detection in effect predicts the execution time of the main task from the data volume and the execution speed of a subtask. Because executing the subtasks takes a certain amount of time, a reasonable scheduling time must be set for the delayed detection of whether the service customization flow processing is complete.
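The split and the delay estimate can be worked through with a short sketch. The split size comes from the example in the text; the measured subtask duration (60 seconds) and the parallelism (4) are invented inputs, and `plan_subtasks` and `estimate_delay_seconds` are hypothetical helper names.

```python
import math


def plan_subtasks(total_records, records_per_subtask):
    """Return half-open (start, end) record ranges, one per subtask."""
    count = math.ceil(total_records / records_per_subtask)
    return [
        (i * records_per_subtask, min((i + 1) * records_per_subtask, total_records))
        for i in range(count)
    ]


def estimate_delay_seconds(total_records, sample_records, sample_seconds, parallelism=1):
    """Extrapolate the main task's duration from one measured subtask."""
    records_per_second = sample_records / sample_seconds
    return total_records / records_per_second / parallelism


ranges = plan_subtasks(8_000_000, 500_000)
delay = estimate_delay_seconds(8_000_000, 500_000, 60.0, parallelism=4)
```

With these inputs the plan contains 16 ranges and the estimated delay is roughly 240 seconds, which would be the point at which the scheduler first checks whether the flow has finished.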
The special identifier is used during failover recovery processing: when the data stream processing task is restarted, the position of the special identifier is looked up, and once it is found, execution continues from that position with the remaining tasks until the data stream processing task is successfully executed or the upper limit on the number of executions is reached.
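A minimal sketch of this recovery loop is given below, assuming the identifier is simply the index of the subtask that was running when the failure occurred. `run_with_failover`, `max_attempts` and the simulated failure are hypothetical and only illustrate the resume-from-marker idea.

```python
def run_with_failover(subtasks, execute, max_attempts=3):
    """Run subtasks in order, resuming from the recorded fault position on failure."""
    checkpoint = 0  # the identifier: index of the first subtask not yet completed
    for attempt in range(1, max_attempts + 1):
        try:
            for index in range(checkpoint, len(subtasks)):
                checkpoint = index  # record the position before executing it
                execute(subtasks[index])
            return True, attempt
        except Exception:
            continue  # restart from the recorded position on the next attempt
    return False, max_attempts


calls = []
failed_once = set()


def flaky(task):
    """Simulated executor that fails the first time it sees subtask 'c'."""
    if task == "c" and task not in failed_once:
        failed_once.add(task)
        raise RuntimeError("simulated failure")
    calls.append(task)


ok, attempts = run_with_failover(["a", "b", "c", "d"], flaky)
```

Subtasks "a" and "b" run only once: the second attempt starts at the recorded position of "c" rather than from the beginning.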
A concrete case is as follows:
A certain merchant has an offline member store and an online member store, and the online and offline member data now needs to be connected, that is, the members of different channels of the same merchant are unified. This facilitates the merchant's management and makes it convenient to run online and offline promotions or issue coupons in a unified way.
The information of all online and offline original files is acquired, where the original files include data of this merchant and data of other merchants. The online and offline data is processed and analyzed to obtain initial service data, and the initial service data is fine-tuned in combination with the preset configuration information to obtain corrected service mode data.
Pipeline construction is performed on the corrected service mode data to obtain a multi-level pipeline model.
A data stream processing task request of the merchant is acquired, initialized and analyzed to obtain the merchant's various user service identities. The most appropriate service pipeline is obtained for these user service identities according to the multi-level pipeline model, and customized flow processing, that is, a series of processing operations on the data, is then performed to obtain a unified data model. The data model is a trained data connection model that can be used by the technicians of the department and by the corresponding merchants. Such models are also added to the standardized historical sample library as samples.
It should be understood that although the steps in the flow charts of figs. 1-8 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and described and may be performed in other orders. Moreover, at least some of the steps in figs. 1-8 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and these sub-steps or stages are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Example 2:
a data stream processing system, as shown in fig. 9, includes an acquisition and analysis module 100, a data correction module 200, a model construction module 300, an initialization and analysis module 400, an identification and matching module 500, and a customized flow processing module 600;
the acquiring and analyzing module 100 is configured to acquire and analyze initial file information including any external data, and generate initial service data;
the data correction module 200 corrects the initial service data based on preset configuration information to obtain corrected service mode data;
the model building module 300 is configured to perform pipeline building processing on the corrected service pattern data based on a standardized historical sample in a standardized historical sample library to obtain a multi-level pipeline model based on the corrected service pattern data, where the multi-level pipeline model includes a plurality of service pipelines;
the initialization analysis module 400 is configured to obtain a data stream processing task request, and perform initialization processing on the data stream processing task request in combination with preset configuration information to obtain a first user service identity;
the recognition matching module 500 performs recognition matching on the first user service identity based on the multi-level pipeline model to obtain an optimal service pipeline corresponding to the first user service identity;
the customized flow processing module 600 executes the service customized flow processing based on the optimal service pipeline to obtain a unified data model.
Example 3:
a computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method steps of:
S100, acquiring initial file information containing any external data, analyzing and generating initial service data;
S200, correcting the initial service data based on preset configuration information to obtain corrected service mode data;
S300, performing pipeline construction processing on the corrected service mode data based on standardized historical samples in a standardized historical sample library to obtain a multi-level pipeline model based on the corrected service mode data, wherein the multi-level pipeline model comprises a plurality of service pipelines;
S400, acquiring a data stream processing task request, and performing initialization processing on the data stream processing task request by combining preset configuration information to obtain a first user service identity;
S500, based on the multi-level pipeline model, identifying and matching the first user service identity to obtain an optimal service pipeline corresponding to the first user service identity;
S600, executing service customization flow processing based on the optimal service pipeline to obtain a uniform data model.
The various embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that:
reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Thus, the appearances of the phrase "one embodiment" or "an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment.
In addition, it should be noted that the specific embodiments described in the present specification may differ in the shape of the components, the names of the components, and the like. All equivalent or simple changes of the structure, the characteristics and the principle of the invention which are described in the patent conception of the invention are included in the protection scope of the patent of the invention. Various modifications, additions and substitutions for the specific embodiments described may be made by those skilled in the art without departing from the scope of the invention as defined in the accompanying claims.

Claims (12)

1. A method for processing a data stream, comprising the steps of:
acquiring initial file information containing any external data, analyzing the initial file information and generating initial service data;
correcting the initial service data based on preset configuration information to obtain corrected service mode data;
performing pipeline construction processing on the corrected service mode data based on standardized historical samples in a standardized historical sample library to obtain a multi-level pipeline model based on the corrected service mode data, wherein the multi-level pipeline model comprises a plurality of service pipelines;
acquiring a data stream processing task request, and performing initialization processing on the data stream processing task request by combining preset configuration information to obtain a first user service identity;
based on the multi-level pipeline model, the first user service identity is identified and matched to obtain an optimal service pipeline corresponding to the first user service identity;
and executing service customization flow processing based on the optimal service pipeline to obtain a uniform data model.
2. The data stream processing method according to claim 1, wherein the step of acquiring and analyzing initial file information including any external data to generate initial service data comprises the following steps:
acquiring and analyzing initial file information to obtain service characteristics of an original file;
identifying the initial file information based on the service characteristics to obtain file data, wherein the file data comprises channel data, service scene data and industry information data;
and arranging and combining the file data to obtain initial service data.
3. The data stream processing method according to claim 2, wherein the initial service data is corrected based on preset configuration information to obtain corrected service mode data, comprising the following steps:
acquiring preset configuration information, wherein the preset configuration information at least comprises channel information, service scene information and industry information;
and correcting the initial service data based on the preset configuration information to obtain corrected service mode data.
4. The data stream processing method according to claim 1, wherein the pipeline construction processing is performed on the corrected service pattern data based on the standardized historical samples in the standardized historical sample library to obtain a multi-level pipeline model based on the corrected service pattern data, and the method comprises the following steps:
classifying the standardized historical sample data in the standardized historical samples to obtain a multi-level service pipeline sample containing a plurality of user service identities;
and performing pipeline construction processing on the corrected service mode data based on the multi-level service pipeline sample to obtain a multi-level pipeline model comprising a plurality of service pipelines, wherein the plurality of user service identities and the plurality of service pipelines have one-to-one correspondence relationship.
5. The data stream processing method according to claim 4, wherein the pipeline construction processing is performed on the corrected service pattern data based on the standardized historical samples in the standardized historical sample library to obtain a multi-level pipeline model based on the corrected service pattern data, and the method comprises the following steps:
acquiring standardized historical samples in a standardized historical sample library, and extracting standardized historical sample information, wherein the standardized historical sample information comprises a plurality of user service identities;
classifying the standardized historical sample information by combining preset configuration information to obtain a multi-level service pipeline sample containing the plurality of user service identities;
carrying out multi-level classification processing on the corrected service mode data through preset configuration information to obtain multi-level corrected service mode data;
and constructing the multi-level correction service mode data step by step through the multi-level service pipeline sample to obtain a multi-level pipeline model comprising a plurality of service pipelines.
6. The data stream processing method according to claim 1, wherein acquiring the data stream processing task request and performing initialization processing on the data stream processing task request in combination with preset configuration information to obtain a first user service identity specifically comprises:
loading file information contained in the data stream processing task request based on the preset configuration information, and initializing the file information according to the preset configuration information to obtain first initialized file information;
and analyzing the first initialization file information step by step to obtain a first user service identity.
7. The data stream processing method according to claim 4 or 5, wherein the identifying and matching the first user service identity based on the multi-level pipeline model to obtain an optimal service pipeline corresponding to the first user service identity comprises the following steps:
and matching the multi-level pipeline model based on the first user service identity, and selecting a service pipeline corresponding to the user service identity with the highest matching degree value, wherein the service pipeline corresponding to the user service identity with the highest matching degree value is the optimal service pipeline corresponding to the first user service identity.
8. The data stream processing method according to claim 1, wherein the business customization flow process comprises the following steps:
analyzing the optimal service pipeline, and loading field data corresponding to the project according to the configured field mapping rule to obtain analyzed data;
classifying the analyzed data according to structured index data and unstructured data, and respectively storing the classified data in a MySQL database and an HBase database;
scanning the data, checking the type and the service, and after the check is passed, performing aggregation and recombination on the related associated data to obtain recombined data;
and converting the recombined data to obtain a unified data model.
9. The data stream processing method according to claim 1 or 8, further comprising a step of polling the business customization process:
detecting, after a delay based on a preset reasonable scheduling time, whether the service customization flow processing has finished;
if it has timed out, calling the service customization flow processing again to continue execution;
if the process of executing the business customizing flow fails, automatically recording the fault position, establishing a special identifier, and then performing the fault transfer recovery process.
10. A data flow processing system is characterized by comprising an acquisition and analysis module, a data correction module, a model construction module, an initialization and analysis module, an identification and matching module and a customized flow processing module;
the acquisition and analysis module is used for acquiring and analyzing initial file information containing any external data to generate initial service data;
the data correction module is used for correcting the initial service data based on preset configuration information to obtain corrected service mode data;
the model construction module is used for carrying out pipeline construction processing on the corrected service mode data based on standardized historical samples in a standardized historical sample library to obtain a multi-level pipeline model based on the corrected service mode data, wherein the multi-level pipeline model comprises a plurality of service pipelines;
the initialization analysis module is used for acquiring a data stream processing task request, and performing initialization processing on the data stream processing task request by combining preset configuration information to obtain a first user service identity;
the identification matching module is used for identifying and matching the first user service identity based on the multi-level pipeline model to obtain an optimal service pipeline corresponding to the first user service identity;
and the customization flow processing module executes service customization flow processing based on the optimal service pipeline to obtain a uniform data model.
11. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the data stream processing method according to any one of claims 1 to 9.
12. An electronic device comprising a memory, a processor and a computer program stored in the memory and run on the processor, characterized in that the processor realizes the steps of the data stream processing method according to any one of claims 1 to 9 when executing the computer program.
CN202210210419.4A 2022-03-04 2022-03-04 Data stream processing method, system, device and readable storage medium Pending CN114580536A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210210419.4A CN114580536A (en) 2022-03-04 2022-03-04 Data stream processing method, system, device and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210210419.4A CN114580536A (en) 2022-03-04 2022-03-04 Data stream processing method, system, device and readable storage medium

Publications (1)

Publication Number Publication Date
CN114580536A true CN114580536A (en) 2022-06-03

Family

ID=81774204

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210210419.4A Pending CN114580536A (en) 2022-03-04 2022-03-04 Data stream processing method, system, device and readable storage medium

Country Status (1)

Country Link
CN (1) CN114580536A (en)

Similar Documents

Publication Publication Date Title
CN110992167B (en) Bank customer business intention recognition method and device
CN108595157B (en) Block chain data processing method, device, equipment and storage medium
CN108805091B (en) Method and apparatus for generating a model
US11790676B2 (en) Artificial intelligence assisted warranty verification
US8799854B2 (en) Reusing software development assets
JP2023540150A (en) Systems and methods for automated application programming interface evaluation and migration
CN110442737A (en) The twin method and system of number based on chart database
US20100121668A1 (en) Automated compliance checking for process instance migration
CN111083013B (en) Test method and device based on flow playback, electronic equipment and storage medium
US10956914B2 (en) System and method for mapping a customer journey to a category
CN112286790A (en) Full link test method, device, equipment and storage medium
CN108075911B (en) Service testing method and device
CN112765014B (en) Automatic test system for multi-user simultaneous operation and working method
CN117874118A (en) Feature data conversion method, device, electronic equipment and readable storage medium
CN110177006B (en) Node testing method and device based on interface prediction model
CN114580536A (en) Data stream processing method, system, device and readable storage medium
CN113869989B (en) Information processing method and device
CN115439247A (en) Transaction data processing method and device
CN112419052B (en) Transaction testing method, device, electronic equipment and readable storage medium
CN110648219B (en) Method and device for standardizing input area of bank transaction system
CN114493850A (en) Artificial intelligence-based online notarization method, system and storage medium
CN112817574A (en) Variable data processing method, variable data processing device, electronic device, and storage medium
CN112799797A (en) Task management method and device
CN110852799A (en) User screening method and device based on intention label, electronic equipment and medium
CN113590488B (en) System test method and test platform for simulating financial data support

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination