CN113360558A - Data processing method, data processing device, electronic device, and storage medium - Google Patents

Data processing method, data processing device, electronic device, and storage medium

Info

Publication number
CN113360558A
Authority
CN
China
Prior art keywords
data
service
strategy
cluster
data extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110623555.1A
Other languages
Chinese (zh)
Other versions
CN113360558B (en)
Inventor
普辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Zhenshi Information Technology Co Ltd
Original Assignee
Beijing Jingdong Zhenshi Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Zhenshi Information Technology Co Ltd
Priority to CN202110623555.1A
Publication of CN113360558A
Application granted
Publication of CN113360558B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083 Shipping
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The disclosure provides a data processing method, a data processing device, an electronic device, and a computer-readable storage medium, and belongs to the technical field of data processing. The method comprises: acquiring a data extraction strategy pre-configured for each service unit in a distributed service system, wherein the distributed service system comprises a plurality of clusters, and each service type in each cluster corresponds to one service unit; extracting, according to the data extraction strategy of each service unit, service data from the cluster and the service type corresponding to that data extraction strategy; and summarizing the extracted service data to a back-end data system. The method makes data processing more targeted and effective, and is highly configurable.

Description

Data processing method, data processing device, electronic device, and storage medium
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a data processing method, a data processing apparatus, an electronic device, and a computer-readable storage medium.
Background
With the development of big data technology, more and more big data clusters are used to process the business data of enterprises. For a system deployed across multiple clusters, when a back-end data system needs service data held in those clusters, the prior art typically extracts the service data from the clusters and pushes it to the back-end data system. In this approach, however, the extraction flow is fixed: whenever new service logic must be added or the data needs special processing, new code has to be written separately. Configuration flexibility is therefore poor, and it is difficult to run a simple and effective service processing flow that matches actual needs.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure provides a data processing method, a data processing apparatus, an electronic device, and a computer-readable storage medium, thereby overcoming, at least to some extent, poor configurability of data processing in the prior art.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of the present disclosure, there is provided a data processing method including: acquiring a data extraction strategy pre-configured for each service unit in a distributed service system, wherein the distributed service system comprises a plurality of clusters, and each service type in each cluster corresponds to one service unit; according to the data extraction strategy of each service unit, respectively extracting service data from the cluster and the service type corresponding to the data extraction strategy; and summarizing the extracted business data to a back-end data system.
In an exemplary embodiment of the present disclosure, the extracting, according to the data extraction policy of each service unit, service data from a cluster and a service type corresponding to the data extraction policy respectively includes: and respectively extracting the service data from each cluster and each service type by executing the first timing task configured with the data extraction strategy.
In an exemplary embodiment of the present disclosure, the first timing task includes a plurality of subtasks, and each subtask has a data extraction policy configured therein; the extracting the service data from each cluster and each service type respectively by executing the first timing task configured with the data extraction strategy includes: and extracting the service data from the cluster and the service type corresponding to the data extraction strategy in each subtask by executing each subtask.
In an exemplary embodiment of the present disclosure, the extracted business data is stored in an intermediate data system; the summarizing the extracted service data to a back-end data system comprises: and pushing the service data in the intermediate data system to a back-end data system.
In an exemplary embodiment of the disclosure, the pushing the service data in the intermediate data system to a back-end data system includes: and pushing the service data in the intermediate data system to a back-end data system by executing a second timing task configured with a uniform data pushing strategy.
In an exemplary embodiment of the present disclosure, the extracting, according to the data extraction policy of each service unit, service data from a cluster and a service type corresponding to the data extraction policy respectively includes: when the data extraction strategy only comprises basic strategy information, extracting all service data in a period to be extracted from the cluster and the service type corresponding to the data extraction strategy; and when the data extraction strategy comprises basic strategy information and supplementary strategy information, extracting service data from the cluster and the service type corresponding to the data extraction strategy according to the supplementary strategy information.
In an exemplary embodiment of the present disclosure, the supplementary policy information includes any one or more of: time configuration information, trigger type configuration information, field configuration information, log management configuration information, scripting language configuration information, data source configuration information, and state configuration information.
According to an aspect of the present disclosure, there is provided a data processing apparatus including: a strategy acquisition module, configured to acquire a data extraction strategy pre-configured for each service unit in a distributed service system, wherein the distributed service system comprises a plurality of clusters, and each service type in each cluster corresponds to one service unit; a data extraction module, configured to extract, according to the data extraction strategy of each service unit, service data from the cluster and the service type corresponding to the data extraction strategy; and a data summarization module, configured to summarize the extracted service data to a back-end data system.
In an exemplary embodiment of the present disclosure, the data extraction module includes: and the first timing task execution unit is used for respectively extracting the service data from each cluster and each service type by executing the first timing task configured with the data extraction strategy.
In an exemplary embodiment of the present disclosure, the first timing task performing unit includes: and the subtask execution subunit is used for executing each subtask and extracting the service data from the cluster and the service type corresponding to the data extraction strategy in the subtask.
In an exemplary embodiment of the present disclosure, the extracted business data is stored in an intermediate data system; the data summarization module comprises: and the data summarization unit is used for pushing the service data in the intermediate data system to a back-end data system.
In an exemplary embodiment of the present disclosure, the data summarization unit includes: and the second timing task execution unit is used for pushing the service data in the intermediate data system to the back-end data system by executing the second timing task configured with the uniform data pushing strategy.
In an exemplary embodiment of the present disclosure, the data extraction module includes: the first extraction unit is used for extracting all service data in a period to be extracted from the cluster and the service type corresponding to the data extraction strategy when the data extraction strategy only comprises basic strategy information; and the second extraction unit is used for extracting the service data from the cluster and the service type corresponding to the data extraction strategy according to the supplementary strategy information when the data extraction strategy comprises the basic strategy information and the supplementary strategy information.
In an exemplary embodiment of the present disclosure, the supplementary policy information includes any one or more of: time configuration information, trigger type configuration information, field configuration information, log management configuration information, scripting language configuration information, data source configuration information, and state configuration information.
According to an aspect of the present disclosure, there is provided an electronic device including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the method of any one of the above via execution of the executable instructions.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any one of the above.
Exemplary embodiments of the present disclosure have the following advantageous effects:
acquiring a data extraction strategy pre-configured for each service unit in a distributed service system, wherein the distributed service system comprises a plurality of clusters, and each service type in each cluster corresponds to one service unit; according to the data extraction strategy of each service unit, extracting service data from the cluster and the service type corresponding to the data extraction strategy respectively; and summarizing the extracted business data to a back-end data system. On one hand, the exemplary embodiment provides a new data processing method, which constructs a service unit through a cluster and a service type, extracts service data from the cluster and the service type based on different service units, can make clear the cluster source and the service type of the service data extraction on the basis of not increasing the complexity of a service data extraction process, and improves the pertinence and the effectiveness of the service data extraction; on the other hand, in the exemplary embodiment, a data extraction policy is configured for each service unit, and when data extraction is performed, a service data extraction process can be performed based on the data extraction policy of each service unit, so that required service data can be quickly and accurately extracted from different clusters and service types according to actual extraction requirements, personalization of service data extraction is improved, and configurability is strong.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
Fig. 1 schematically shows a flowchart of a data processing method in the related art;
Fig. 2 schematically shows a flowchart of another data processing method in the related art;
Fig. 3 schematically shows an operational architecture diagram of the present exemplary embodiment;
Fig. 4 schematically shows a flowchart of a data processing method in the present exemplary embodiment;
Fig. 5 schematically shows another operational architecture diagram of the present exemplary embodiment;
Fig. 6 schematically shows a flowchart of another data processing method in the present exemplary embodiment;
Fig. 7 schematically shows a diagram of a data extraction strategy in the present exemplary embodiment;
Fig. 8 schematically shows a block diagram of the structure of a data processing apparatus in the present exemplary embodiment;
Fig. 9 schematically shows an electronic device for implementing the above method in the present exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In one data processing method of the related art, when data in a plurality of clusters 110 needs to be pushed to an upstream system, as shown in Fig. 1, each cluster usually has to be configured with its own timing task 120, which extracts summarized data from that cluster at a fixed time and then pushes it to the upstream system 130 through a message queue or an interface call. In this approach, however, each cluster must separately maintain and configure its own timing task application, so the maintenance workload is large; moreover, because each cluster pushes its own summarized data, checking the consistency of the pushed data becomes more complex.
In another data processing method of the related art, when data in a plurality of clusters 210 needs to be pushed to an upstream system, as shown in Fig. 2, a first timing task 220 may be started first to collect the data of each cluster into an intermediate database 230; after the collection is completed, a second timing task 240 is started to scan the intermediate database 230 and push the data in it to the upstream system 250. Although each cluster then no longer needs to maintain its own timing task, the summarization flow is fixed, and code must be added separately whenever new service logic is required or the data needs special processing. For example, if data push is initially scheduled for two o'clock and later needs to run at four or five o'clock, the code related to push time in the data push rule has to be changed. Likewise, if data is initially acquired from certain clusters or data sources and later only the data of some of those clusters is wanted, the data source code also has to be modified. Configuration is therefore inflexible and configurability is poor.
In view of the above, exemplary embodiments of the present disclosure first provide a data processing method.
Fig. 3 is a system architecture diagram of the operating environment of the exemplary embodiment, and referring to fig. 3, the system 300 may include a cluster 310, an intermediate node 320, and a back-end data system 330. The cluster 310 may be a terminal or a server storing data in different regions or areas, and the intermediate node 320 is configured to extract and store service data from the cluster 310, and then push the service data to the backend data system 330.
It should be understood that the number of each of the devices shown in Fig. 3 is merely exemplary, and any number of clusters or back-end data systems may be provided, as desired.
Based on the above description, the method in the present exemplary embodiment may be applied to the intermediate node 320 shown in fig. 3.
The present exemplary embodiment is further described with reference to fig. 4, and as shown in fig. 4, the data processing method may include the following steps S410 to S430:
step S410, a data extraction policy preconfigured for each service unit in a distributed service system is obtained, where the distributed service system includes a plurality of clusters, and each service type in each cluster corresponds to one service unit.
The distributed service system refers to a set of a plurality of clusters, where each cluster may be used to store the service data of a corresponding region or range. For example, a Beijing cluster stores part or all of the service data of the Beijing region; the service data may include data from a warehousing application scenario as well as service data from other application scenarios. Clusters can be divided by region or range: for example, a cluster may be a Beijing cluster, a Shenzhen cluster, or a Shanghai cluster, or clusters may be divided at a finer granularity, such as by district within a city. The plurality of clusters can be regarded as a plurality of nodes for storing service data; for example, the Beijing cluster stores the service data generated in the Beijing region, the Shenzhen cluster stores the service data generated in the Shenzhen region, and so on.
Each cluster may hold a large amount of data of different service types. In a warehousing application scenario, for example, the data may include service data of service types such as warehousing, classification, picking, packaging, and delivery. The exemplary embodiment may combine each cluster with a corresponding service type into a two-tuple to construct different service units; for example, a service unit may be Beijing cluster/picking data, Beijing cluster/delivery data, or Shenzhen cluster/picking data.
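The pairing of clusters and service types described above can be sketched as follows. This is a minimal illustration only: the names `ServiceUnit` and `build_service_units`, and the mapping from cluster names to service types, are assumptions made for the example and do not appear in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceUnit:
    """Hypothetical two-tuple of (cluster, service type)."""
    cluster: str
    service_type: str


def build_service_units(types_by_cluster):
    """Build one service unit per service type present in each cluster.

    `types_by_cluster` maps a cluster name to the service types it stores;
    the shape of this mapping is an assumption made for illustration.
    """
    return [
        ServiceUnit(cluster, service_type)
        for cluster, service_types in types_by_cluster.items()
        for service_type in service_types
    ]


units = build_service_units(
    {"beijing": ["picking", "delivery"], "shenzhen": ["picking"]}
)
```

Because each unit is an immutable pair, it can serve as a dictionary key when a data extraction policy is later attached to it.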
In this exemplary embodiment, to keep service data extraction flexible and individually configurable for each cluster, a data extraction policy may be preconfigured for each service unit, and service data may be extracted from each service unit according to the data extraction policy obtained for it. A data extraction policy is a data extraction rule. Because data amount, data type, data requirements, and the like differ between clusters, different service units may be configured with different data extraction policies, for example data of which time period is extracted and what requirements apply during extraction. Specifically, a data extraction policy may include the time, the trigger, the fields, and the like of the service data to be extracted; these can be set by the user as needed, and this disclosure does not particularly limit them. It should be noted that, according to actual needs, the data extraction policies of service units of different service types in the same cluster, or of service units in different clusters, may be the same or different.
Step S420, according to the data extraction policy of each service unit, extracting service data from the cluster and the service type corresponding to the data extraction policy, respectively.
Further, service data may be extracted from each service unit based on its data extraction policy, so that the target service data is extracted from the specific service type in the corresponding cluster in line with the actual extraction requirement; for example, only the data of specific fields may be extracted from the Beijing cluster/picking data.
In an exemplary embodiment, the step S420 may include:
and respectively extracting the service data from each cluster and each service type by executing the first timing task configured with the data extraction strategy.
The first timing task refers to a configuration file for extracting service data from each service unit, and the configuration file can meet the requirement of periodically executing extraction of the service data. When the first timing task is triggered, the step of extracting the service data from each cluster and each service type can be executed.
In this exemplary embodiment, the intermediate node may configure the first timing task in a unified manner, and each cluster does not need to configure its own first timing task, which is beneficial to improving the maintenance efficiency of the cluster.
In an exemplary embodiment, the extracting the service data from each cluster and each service type by executing the first timing task configured with the data extraction policy may include:
and extracting the service data from the cluster and the service type corresponding to the data extraction strategy in the subtask by executing each subtask.
That is, in this exemplary embodiment, the first timing task may include subtasks for performing data extraction from different service units, each subtask may include a corresponding data extraction policy, and when the first timing task is triggered, different service units may actually execute respective subtasks according to respective corresponding data extraction policies, thereby ensuring that different clusters perform personalized configuration of service data extraction requirements.
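A first timing task made up of per-service-unit subtasks might look like the sketch below. The names `SubTask`, `run_subtask`, and `run_first_timing_task`, the in-memory `source` dictionary standing in for the clusters, and the `"fields"` policy key are all assumptions for illustration; the disclosure does not prescribe a data model.

```python
from dataclasses import dataclass, field


@dataclass
class SubTask:
    """Hypothetical subtask: one service unit plus its extraction policy."""
    cluster: str
    service_type: str
    policy: dict = field(default_factory=dict)


def run_subtask(subtask, source):
    """Extract rows for one service unit according to its own policy."""
    rows = source.get((subtask.cluster, subtask.service_type), [])
    fields = subtask.policy.get("fields")
    if fields is None:
        # No field restriction configured: return full copies of the rows.
        return [dict(row) for row in rows]
    # Keep only the configured fields that are present in each row.
    return [{k: row[k] for k in fields if k in row} for row in rows]


def run_first_timing_task(subtasks, source):
    """Run every subtask once; intended to be called by a scheduler."""
    return {
        (task.cluster, task.service_type): run_subtask(task, source)
        for task in subtasks
    }


source = {
    ("beijing", "picking"): [{"sku": "A", "qty": 3}],
    ("shenzhen", "delivery"): [{"sku": "B", "qty": 1}],
}
subtasks = [
    SubTask("beijing", "picking", {"fields": ["qty"]}),
    SubTask("shenzhen", "delivery"),
]
result = run_first_timing_task(subtasks, source)
```

Here the Beijing picking subtask keeps only the `qty` field, while the Shenzhen delivery subtask, which has no extra policy, returns whole rows.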
Step S430, summarize the extracted service data to the backend data system.
That is, after extracting the service data, the exemplary embodiment may push the extracted service data to the backend data system. The back-end data system may refer to a data summarization system or an upstream system, for example, the business data of clusters in different regions of the e-commerce platform may be summarized to a server or a cluster of the e-commerce business headquarters system.
In an exemplary embodiment, the extracted service data may be stored in an intermediate data system, and further, the step S430 may include:
and pushing the service data in the intermediate data system to a back-end data system.
In this exemplary embodiment, an intermediate data system may be disposed between the clusters and the back-end data system to temporarily store the service data. After service data is extracted from the cluster and the service type corresponding to the data extraction policy of each service unit, it may first be stored in the intermediate data system and then pushed from there to the back-end data system, which buffers the pressure of data pushing and makes the service data easier to manage. In the present exemplary embodiment, the intermediate data system may be a specific database, such as a Redis database, or Elasticsearch (ES, a search server), and the like.
In an exemplary embodiment, the pushing the service data in the intermediate data system to the back-end data system may include:
pushing the service data in the intermediate data system to the back-end data system at regular times by executing a second timing task configured with a uniform data pushing strategy.
The second timing task refers to a configuration file for pushing the service data to the back-end data system, and can meet the requirement of periodically pushing the service data to the back-end data system. The second timing task is a different timing task from the first timing task.
Fig. 5 shows a schematic architecture diagram of another data processing method in this exemplary embodiment. When service data in a plurality of clusters 510 needs to be pushed to an upstream system, a first timing task 520 may be started first; according to the data extraction policy 530 of each service unit, service data is extracted from the cluster and the service type corresponding to that policy and stored in an intermediate database 540. After the summarization is completed, a second timing task 550 is started, which scans the intermediate database 540 and pushes the service data in it to the upstream system 560.
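The two-stage flow of Fig. 5 can be sketched as below. This is an illustrative sketch under stated assumptions: `IntermediateStore` is an in-memory stand-in for an intermediate database such as Redis or Elasticsearch, the back-end data system is modelled as a plain list, and all function and key names are hypothetical.

```python
class IntermediateStore:
    """In-memory stand-in for the intermediate database (assumption)."""

    def __init__(self):
        self._rows = []

    def save(self, rows):
        self._rows.extend(rows)

    def drain(self):
        """Return all buffered rows and clear the buffer."""
        rows, self._rows = self._rows, []
        return rows


def first_timing_task(policies, clusters, store):
    """Extract rows per policy and buffer them in the intermediate store."""
    for policy in policies:
        cluster, service_type = policy["cluster"], policy["service_type"]
        rows = clusters.get(cluster, {}).get(service_type, [])
        # Tag each row with its origin so the back end can tell units apart.
        store.save(
            [{**row, "cluster": cluster, "service_type": service_type}
             for row in rows]
        )


def second_timing_task(store, backend):
    """Scan the intermediate store and push everything to the back end."""
    rows = store.drain()
    backend.extend(rows)
    return len(rows)


clusters = {
    "beijing": {"picking": [{"qty": 3}]},
    "shanghai": {"delivery": [{"qty": 5}, {"qty": 2}]},
}
policies = [
    {"cluster": "beijing", "service_type": "picking"},
    {"cluster": "shanghai", "service_type": "delivery"},
]
store, backend = IntermediateStore(), []
first_timing_task(policies, clusters, store)
pushed = second_timing_task(store, backend)
```

Separating extraction from pushing in this way means the push schedule can change without touching any per-cluster extraction policy.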
Based on the above description, in the present exemplary embodiment, a data extraction policy preconfigured for each service unit in a distributed service system is obtained, where the distributed service system includes a plurality of clusters, and each service type in each cluster corresponds to one service unit; according to the data extraction strategy of each service unit, extracting service data from the cluster and the service type corresponding to the data extraction strategy respectively; and summarizing the extracted business data to a back-end data system. On one hand, the exemplary embodiment provides a new data processing method, which constructs a service unit through a cluster and a service type, extracts service data from the cluster and the service type based on different service units, can make clear the cluster source and the service type of the service data extraction on the basis of not increasing the complexity of a service data extraction process, and improves the pertinence and the effectiveness of the service data extraction; on the other hand, in the exemplary embodiment, a data extraction policy is configured for each service unit, and when data extraction is performed, a service data extraction process can be performed based on the data extraction policy of each service unit, so that required service data can be quickly and accurately extracted from different clusters and service types according to actual extraction requirements, personalization of service data extraction is improved, and configurability is strong.
In an exemplary embodiment, the step S420 may include:
when the data extraction strategy only comprises basic strategy information, extracting all service data in a period to be extracted from the cluster and the service type corresponding to the data extraction strategy;
and when the data extraction strategy comprises basic strategy information and supplementary strategy information, extracting the service data from the cluster and the service type corresponding to the data extraction strategy according to the supplementary strategy information.
In this exemplary embodiment, the data extraction policy may include information in two dimensions: basic policy information and supplementary policy information. The basic policy information refers to cluster information and a service type, where the cluster information may be represented by an identifier, a code, a name, or the like of a cluster, and the service type may likewise be represented by a code, an identifier, or the like. The supplementary policy information refers to data extraction rules more detailed than the basic policy information, such as field settings or trigger settings for extracting service data.
The period to be extracted refers to the time period whose service data is to be extracted, and may be the time period preceding the current one, for example, the day before the current day or the week before the current week. In this way, the service data of the previous time period can be extracted during the current time period for analysis or processing. In this exemplary embodiment, when the data extraction policy includes only the basic policy information, the policy determines only from which clusters and which service types service data currently needs to be extracted; accordingly, all service data in the period to be extracted may be extracted from the clusters and service types corresponding to the data extraction policy.
When the data extraction policy includes both the basic policy information and the supplementary policy information, it determines, on the one hand, from which clusters and which service types service data currently needs to be extracted and, on the other hand, the additional, more detailed requirements that apply when the service data is extracted.
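The two branches described above can be illustrated with a minimal Python sketch. It is not taken from the disclosure: the dictionary layout of the policy, the function name extract_for_policy, and the "fields" rule used as the only supplementary setting are all assumptions made for illustration.

```python
def extract_for_policy(policy, rows, period):
    """Return the rows that one service unit's policy selects for `period`."""
    base = policy["base"]
    # Basic policy information: which cluster and which service type.
    unit_rows = [
        row for row in rows
        if row["cluster"] == base["cluster"]
        and row["service_type"] == base["service_type"]
        and row["period"] == period
    ]
    supplementary = policy.get("supplementary")
    if not supplementary:
        # Basic information only: all data in the period is extracted.
        return unit_rows
    # Supplementary information present. Only a hypothetical "fields"
    # rule is honoured here; it restricts the columns that are returned.
    fields = supplementary.get("fields")
    if fields is None:
        return unit_rows
    return [{name: row[name] for name in fields} for row in unit_rows]


rows = [
    {"cluster": "A", "service_type": "picking", "period": "2021-06-03", "order_id": 1, "qty": 5},
    {"cluster": "A", "service_type": "picking", "period": "2021-06-02", "order_id": 2, "qty": 7},
    {"cluster": "B", "service_type": "picking", "period": "2021-06-03", "order_id": 3, "qty": 9},
]
base_only = {"base": {"cluster": "A", "service_type": "picking"}}
with_fields = {
    "base": {"cluster": "A", "service_type": "picking"},
    "supplementary": {"fields": ["order_id", "qty"]},
}
```

With base_only, the whole matching row for the period is returned; with with_fields, only the listed columns are returned for the same service unit.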
In an exemplary embodiment, the supplementary policy information may include one or more of the following:
time configuration information, trigger type configuration information, field configuration information, log management configuration information, scripting language configuration information, data source configuration information, and state configuration information.
The time configuration information refers to information about time rules. It may include the start time, the end time, the timeout time, and the like of a service data extraction task. The task timeout time means that a task whose execution exceeds the configured time is automatically abandoned, which prevents the task from affecting the main logic and placing excessive pressure on the server. The start time and the end time specify the execution time window of the task; the corresponding task may be executed only in the period between the start time and the end time.
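As a rough illustration of how the start time, end time, and timeout could be checked, consider the following Python sketch. The function names and the three outcomes "skip", "run", and "abandon" are assumptions; the disclosure does not specify how the check is implemented.

```python
from datetime import datetime, timedelta


def within_window(now, start, end):
    """A task may execute only between its start time and end time."""
    return start <= now <= end


def has_timed_out(started_at, now, timeout):
    """A task that has run longer than its timeout is to be abandoned."""
    return now - started_at > timeout


def decide(now, start, end, started_at=None, timeout=None):
    """Return 'skip', 'abandon', or 'run' for a task at time `now`."""
    if not within_window(now, start, end):
        return "skip"
    if started_at is not None and timeout is not None:
        if has_timed_out(started_at, now, timeout):
            return "abandon"
    return "run"


window_start = datetime(2021, 6, 4, 1, 0)
window_end = datetime(2021, 6, 4, 5, 0)
```

A scheduler could call decide before launching a task and periodically while the task is running, abandoning the task as soon as "abandon" is returned.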
The trigger type configuration information refers to information about trigger settings. In the present exemplary embodiment, the triggers may include a data trigger and a task trigger. Each trigger is configured with a corresponding code implementation class in the application. In terms of trigger timing, the data trigger fires each time a piece of service data is summarized, whereas the task trigger fires after all data of a task has been processed. For example, suppose the back-end data system needs to collect, every day, the picking-quantity service data of all regional warehouses nationwide, that is, of all regional clusters, and that today the summary policy task corresponding to cluster A extracts 1000 pieces of data. The data trigger fires once for every piece of data that is summarized; from the implementation point of view, it is implemented by means of annotations, and after a piece of data is summarized it is passed to the corresponding implementation method. From the perspective of application scenarios, the data trigger is suited to processing each piece of data individually, for example, processing the piece of data or adding another field each time one piece of data is summarized. The task trigger fires after all 1000 pieces of data have been summarized, and can be applied to service scenarios such as modifying the task state of each cluster when that cluster finishes summarizing its data, or notifying the upstream that the cluster has finished summarizing. The exemplary embodiment may use the data trigger, the task trigger, or both, according to specific requirements.
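The difference in trigger timing can be sketched in Python as follows. The disclosure describes an annotation-based implementation; the decorators below are only an analogous mechanism chosen for this sketch, and the class name Triggers, the hook signatures, and the task_state dictionary are assumptions.

```python
class Triggers:
    """Registry of data triggers (per record) and task triggers (per task)."""

    def __init__(self):
        self.data_hooks = []
        self.task_hooks = []

    def data(self, func):
        # Decorator: register a hook that runs for every summarized record.
        self.data_hooks.append(func)
        return func

    def task(self, func):
        # Decorator: register a hook that runs once, after all records.
        self.task_hooks.append(func)
        return func

    def summarize(self, cluster, records):
        summarized = []
        for record in records:
            summarized.append(record)
            for hook in self.data_hooks:
                hook(record)  # data trigger: fires once per record
        for hook in self.task_hooks:
            hook(cluster, len(summarized))  # task trigger: fires once
        return summarized


triggers = Triggers()
task_state = {}


@triggers.data
def tag_record(record):
    # Example of per-record processing: add another field.
    record["summarized"] = True


@triggers.task
def mark_cluster_done(cluster, count):
    # Example of per-task processing: update the cluster's task state.
    task_state[cluster] = ("SUCCESS", count)


result = triggers.summarize("A", [{"id": i} for i in range(1000)])
```

In this run the data trigger executes 1000 times, once per record, while the task trigger executes a single time after the last record for cluster A has been summarized.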
The field configuration information refers to field settings, which may contain a unique primary key and a sorting field. The unique primary key marks the uniqueness of each piece of service data; if the service data needs to be processed later, it can be looked up quickly through the unique primary key. The sorting field marks the order of the service data.
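A small Python sketch of how a unique primary key and a sorting field might be used on extracted records is given below. The function name index_and_sort and the field names "id" and "seq" are hypothetical and serve only to illustrate the two roles.

```python
def index_and_sort(records, key_field, sort_field):
    """Index records by their unique primary key and order them by the sorting field."""
    by_key = {}
    for record in records:
        key = record[key_field]
        if key in by_key:
            # The primary key is meant to be unique for each piece of data.
            raise ValueError(f"duplicate primary key: {key!r}")
        by_key[key] = record
    ordered = sorted(records, key=lambda record: record[sort_field])
    return by_key, ordered


records = [{"id": "b", "seq": 2}, {"id": "a", "seq": 1}]
by_key, ordered = index_and_sort(records, key_field="id", sort_field="seq")
```

The by_key dictionary gives direct lookup of a record for later processing, and ordered lists the same records in the order given by the sorting field.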
The log management configuration information may include a system log and an error log. The system log may be used to record, when a data extraction task is executed, the number of log records, the main records, and the data extracted by the task. The error log may be used to record error information when execution of a data extraction task fails.
The scripting language configuration information may be used to store the dynamic SQL (Structured Query Language) used for extracting data. The SQL is a concrete SQL script that specifies the required data fields and may also apply conventional database functions, such as addition, subtraction, and summation, to some of the data; these scripts act on the database of the corresponding cluster. In the present exemplary embodiment, the scripting language configuration information may be stored in a JFS (Journal File System, a log file system) or another distributed file system to facilitate storage management.
The data source configuration information may be used to configure the specific database connection of a cluster; for example, to connect to cluster A, only the database connection of cluster A needs to be configured. This field may be stored as ciphertext.
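To make the idea of a stored extraction script concrete, the following Python sketch runs a parameterized SQL script against an in-memory SQLite database that stands in for a cluster database. The table name picking, its columns, and the function run_script are assumptions; the disclosure does not name a database engine or schema.

```python
import sqlite3

# A stored extraction script: selects the required fields and applies a
# summation. The period to be extracted is supplied as a named parameter.
SCRIPT = """
SELECT warehouse, SUM(qty) AS total_qty
FROM picking
WHERE day = :day
GROUP BY warehouse
ORDER BY warehouse
"""


def run_script(conn, script, day):
    """Execute a stored extraction script for one period and return its rows."""
    return conn.execute(script, {"day": day}).fetchall()


# In-memory database used here as a stand-in for one cluster's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE picking (warehouse TEXT, day TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO picking VALUES (?, ?, ?)",
    [
        ("bj", "2021-06-03", 5),
        ("bj", "2021-06-03", 3),
        ("sh", "2021-06-03", 4),
        ("sh", "2021-06-02", 9),
    ],
)
extracted = run_script(conn, SCRIPT, "2021-06-03")
```

Passing the period as a bound parameter rather than concatenating it into the script text keeps the stored script reusable across periods.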
The state configuration information refers to information about the task state of a data extraction task, and may include an initial state, an executing state, a success state, a failure state, and the like, through which the data extraction task can be managed effectively.
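The four states named above could be modelled as a small state machine, as in the Python sketch below. The allowed transitions (initial to executing, executing to success or failure) are an assumption made for illustration; the disclosure lists the states but does not define the transitions between them.

```python
from enum import Enum


class TaskState(Enum):
    INITIAL = "initial"
    EXECUTING = "executing"
    SUCCESS = "success"
    FAILURE = "failure"


# Assumed transition table: a task starts executing once and then ends
# in either success or failure.
ALLOWED = {
    TaskState.INITIAL: {TaskState.EXECUTING},
    TaskState.EXECUTING: {TaskState.SUCCESS, TaskState.FAILURE},
    TaskState.SUCCESS: set(),
    TaskState.FAILURE: set(),
}


def advance(current, target):
    """Return `target` if the transition is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target


state = advance(TaskState.INITIAL, TaskState.EXECUTING)
state = advance(state, TaskState.SUCCESS)
```

Rejecting illegal transitions makes it harder for a finished task to be marked as executing again by mistake.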
Fig. 6 shows a flowchart of another data processing method in the present exemplary embodiment, which may specifically include the following steps:
step S610, acquiring a data extraction strategy pre-configured for each service unit in a distributed service system, wherein the distributed service system comprises a plurality of clusters, and each service type in each cluster corresponds to one service unit;
step S620, when the data extraction strategy only comprises basic strategy information, all service data in the period to be extracted are extracted from the cluster and the service type corresponding to the data extraction strategy;
step S630, when the data extraction strategy includes the basic strategy information and the supplementary strategy information, extracting the service data from the cluster and the service type corresponding to the data extraction strategy according to the supplementary strategy information;
step S640, storing the extracted service data in an intermediate data system;
step S650, pushing the service data in the intermediate data system to the backend data system by executing the second timing task configured with the unified data pushing policy.
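The two-stage flow of steps S610 to S650 can be sketched in Python as follows. This is a simplified illustration: scheduling is assumed to be handled by an external timer and is omitted, plain lists stand in for the intermediate and back-end data systems, and the function names are hypothetical.

```python
def run_first_timing_task(policies, clusters, intermediate):
    """Run one subtask per policy and store the extracted data in the intermediate system."""
    for policy in policies:  # each subtask carries one data extraction policy
        cluster = policy["cluster"]
        service_type = policy["service_type"]
        for row in clusters[cluster][service_type]:
            # Record the cluster source and service type with each row.
            intermediate.append({"cluster": cluster, "service_type": service_type, **row})


def run_second_timing_task(intermediate, backend):
    """Push everything held in the intermediate system to the back-end system."""
    backend.extend(intermediate)
    intermediate.clear()


clusters = {
    "A": {"picking": [{"order_id": 1}], "packing": [{"order_id": 2}]},
    "B": {"picking": [{"order_id": 3}]},
}
policies = [
    {"cluster": "A", "service_type": "picking"},
    {"cluster": "B", "service_type": "picking"},
]
intermediate, backend = [], []
run_first_timing_task(policies, clusters, intermediate)
run_second_timing_task(intermediate, backend)
```

Separating extraction from pushing means that the per-unit extraction policies can differ while a single unified push step delivers the data to the back end.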
As shown in fig. 7, the data extraction policy 700 acquired in step S610 may include basic policy information 710 and supplementary policy information 720; the basic policy information 710 may include cluster information 711 and a service type 712, and the supplementary policy information 720 may include time configuration information 721, trigger type configuration information 722, field configuration information 723, log management configuration information 724, scripting language configuration information 725, data source configuration information 726, and state configuration information 727.
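The structure of fig. 7 maps naturally onto nested record types. The Python sketch below mirrors the reference numerals in comments; the attribute names and their types are assumptions, since the disclosure names the configuration items but not their concrete representation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BasicPolicyInfo:              # 710
    cluster: str                    # 711 cluster information
    service_type: str               # 712 service type


@dataclass
class SupplementaryPolicyInfo:      # 720, every item optional
    time: Optional[dict] = None             # 721
    trigger_type: Optional[str] = None      # 722
    fields: Optional[dict] = None           # 723
    log_management: Optional[dict] = None   # 724
    script: Optional[str] = None            # 725
    data_source: Optional[str] = None       # 726
    state: Optional[str] = None             # 727


@dataclass
class DataExtractionPolicy:         # 700
    basic: BasicPolicyInfo
    supplementary: Optional[SupplementaryPolicyInfo] = None

    @property
    def basic_only(self):
        """True when the policy carries no supplementary information."""
        return self.supplementary is None


simple = DataExtractionPolicy(BasicPolicyInfo("A", "picking"))
detailed = DataExtractionPolicy(
    BasicPolicyInfo("A", "picking"),
    SupplementaryPolicyInfo(trigger_type="data", fields={"primary_key": "order_id"}),
)
```

The basic_only property corresponds to the condition that selects between extracting all data in the period and extracting according to the supplementary policy information.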
An exemplary embodiment of the present disclosure also provides a data processing apparatus. Referring to fig. 8, the apparatus 800 may include a policy obtaining module 810, configured to obtain a data extraction policy preconfigured for each service unit in a distributed service system, where the distributed service system includes a plurality of clusters, and each service type in each cluster corresponds to one service unit; a data extraction module 820, configured to extract, according to the data extraction policy of each service unit, service data from the cluster and the service type corresponding to the data extraction policy, respectively; and a data summarization module 830 for summarizing the extracted service data to the back-end data system.
In an exemplary embodiment, the data extraction module includes a first timing task execution unit, configured to extract service data from each cluster and each service type by executing a first timing task configured with the data extraction policy.
In an exemplary embodiment, the first timing task execution unit includes a subtask execution subunit, configured to extract, by executing each subtask, service data from the cluster and the service type corresponding to the data extraction policy in that subtask.
In an exemplary embodiment, the extracted service data is stored in an intermediate data system, and the data summarization module includes a data summarization unit, configured to push the service data in the intermediate data system to the back-end data system.
In an exemplary embodiment, the data summarization unit includes a second timing task execution unit, configured to push the service data in the intermediate data system to the back-end data system by executing a second timing task configured with a unified data pushing policy.
In an exemplary embodiment, the data extraction module includes: a first extraction unit, configured to extract all service data in a period to be extracted from the cluster and the service type corresponding to the data extraction policy when the data extraction policy includes only basic policy information; and a second extraction unit, configured to extract service data from the cluster and the service type corresponding to the data extraction policy according to the supplementary policy information when the data extraction policy includes both the basic policy information and the supplementary policy information.
In an exemplary embodiment, the supplemental policy information includes any one or more of: time configuration information, trigger type configuration information, field configuration information, log management configuration information, scripting language configuration information, data source configuration information, and state configuration information.
The specific details of each module/unit in the above apparatus have been described in detail in the method embodiments; for details not disclosed here, reference may be made to the method embodiments, and they are therefore not repeated.
Exemplary embodiments of the present disclosure also provide an electronic device capable of implementing the above method.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device 900 according to such an exemplary embodiment of the present disclosure is described below with reference to fig. 9. The electronic device 900 shown in fig. 9 is only an example and should not bring any limitations to the functionality or scope of use of the embodiments of the present disclosure.
As shown in fig. 9, the electronic device 900 is embodied in the form of a general purpose computing device. Components of the electronic device 900 may include, but are not limited to: at least one processing unit 910, at least one storage unit 920, a bus 930 connecting different system components (including the storage unit 920 and the processing unit 910), and a display unit 940.
The storage unit stores program code that may be executed by the processing unit 910, causing the processing unit 910 to perform the steps according to various exemplary embodiments of the present disclosure described in the above-mentioned "exemplary methods" section of this specification. For example, the processing unit 910 may perform the steps shown in fig. 4 or fig. 6.
The storage unit 920 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 921 and/or a cache memory unit 922, and may further include a read only memory unit (ROM) 923.
Storage unit 920 may also include a program/utility 924 having a set (at least one) of program modules 925, such program modules 925 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 930 can be any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 900 may also communicate with one or more external devices 1000 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 900, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 900 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interface 950. Also, the electronic device 900 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet) via the network adapter 960. As shown, the network adapter 960 communicates with the other modules of the electronic device 900 via the bus 930. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 900, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
From the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the exemplary embodiments of the present disclosure.
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, various aspects of the disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the disclosure described in the above-mentioned "exemplary methods" section of this specification, when the program product is run on the terminal device.
Exemplary embodiments of the present disclosure also provide a program product for implementing the above method, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Furthermore, the above-described figures are merely schematic illustrations of processes included in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
It should be noted that although several modules or units of the device for performing actions are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to exemplary embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the following claims.

Claims (10)

1. A data processing method, comprising:
acquiring a data extraction strategy pre-configured for each service unit in a distributed service system, wherein the distributed service system comprises a plurality of clusters, and each service type in each cluster corresponds to one service unit;
according to the data extraction strategy of each service unit, respectively extracting service data from the cluster and the service type corresponding to the data extraction strategy;
and summarizing the extracted service data to a back-end data system.
2. The method according to claim 1, wherein said extracting, according to the data extraction policy of each service unit, service data from the cluster and the service type corresponding to the data extraction policy respectively comprises:
and respectively extracting the service data from each cluster and each service type by executing the first timing task configured with the data extraction strategy.
3. The method of claim 2, wherein the first timing task comprises a plurality of subtasks, and each subtask has a data extraction policy configured therein; the extracting the service data from each cluster and each service type respectively by executing the first timing task configured with the data extraction strategy includes:
and extracting the service data from the cluster and the service type corresponding to the data extraction strategy in each subtask by executing each subtask.
4. The method of claim 1, wherein the extracted service data is stored in an intermediate data system; the summarizing the extracted service data to a back-end data system comprises:
and pushing the service data in the intermediate data system to a back-end data system.
5. The method of claim 4, wherein pushing the service data in the intermediate data system to a back-end data system comprises:
and pushing the service data in the intermediate data system to a back-end data system by executing a second timing task configured with a uniform data pushing strategy.
6. The method according to claim 1, wherein said extracting, according to the data extraction policy of each service unit, service data from the cluster and the service type corresponding to the data extraction policy respectively comprises:
when the data extraction strategy only comprises basic strategy information, extracting all service data in a period to be extracted from the cluster and the service type corresponding to the data extraction strategy;
and when the data extraction strategy comprises basic strategy information and supplementary strategy information, extracting service data from the cluster and the service type corresponding to the data extraction strategy according to the supplementary strategy information.
7. The method of claim 6, wherein the supplemental policy information comprises any one or more of:
time configuration information, trigger type configuration information, field configuration information, log management configuration information, scripting language configuration information, data source configuration information, and state configuration information.
8. A data processing apparatus, comprising:
a strategy acquisition module, configured to acquire a data extraction strategy pre-configured for each service unit in a distributed service system, wherein the distributed service system comprises a plurality of clusters, and each service type in each cluster corresponds to one service unit;
a data extraction module, configured to extract, according to the data extraction strategy of each service unit, service data from the cluster and the service type corresponding to the data extraction strategy, respectively;
and a data summarization module, configured to summarize the extracted service data to a back-end data system.
9. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any of claims 1-7 via execution of the executable instructions.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 7.
CN202110623555.1A 2021-06-04 2021-06-04 Data processing method, data processing device, electronic equipment and storage medium Active CN113360558B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110623555.1A CN113360558B (en) 2021-06-04 2021-06-04 Data processing method, data processing device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113360558A true CN113360558A (en) 2021-09-07
CN113360558B CN113360558B (en) 2023-09-29

Family

ID=77532207

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110623555.1A Active CN113360558B (en) 2021-06-04 2021-06-04 Data processing method, data processing device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113360558B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104317928A (en) * 2014-10-31 2015-01-28 北京思特奇信息技术股份有限公司 Service ETL (extraction-transformation-loading) method and service ETL system both based on distributed database
CN104933119A (en) * 2015-06-05 2015-09-23 福建富士通信息软件有限公司 Big data management method
US20170124464A1 (en) * 2015-10-28 2017-05-04 Fractal Industries, Inc. Rapid predictive analysis of very large data sets using the distributed computational graph
CN109299177A (en) * 2018-09-30 2019-02-01 江苏满运软件科技有限公司 Data pick-up method, apparatus, storage medium and electronic equipment
CN109669989A (en) * 2018-12-29 2019-04-23 江苏满运软件科技有限公司 Data verification method, system, equipment and medium
CN109753596A (en) * 2018-12-29 2019-05-14 中国科学院计算技术研究所 Information source management and configuration method and system for the acquisition of large scale network data
CN109918437A (en) * 2019-03-08 2019-06-21 北京中油瑞飞信息技术有限责任公司 Distributed data processing method, apparatus and data assets management system
CN110889105A (en) * 2019-12-03 2020-03-17 中国工商银行股份有限公司 Data processing method, device, system and medium
CN112749219A (en) * 2021-01-04 2021-05-04 拉卡拉支付股份有限公司 Data extraction method, data extraction device, electronic equipment, storage medium and program product

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李兰友;胡诚皓;张春华;: "ETL集群优化技术研究与实现", 电脑知识与技术, no. 13 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688159A (en) * 2021-09-08 2021-11-23 京东科技控股股份有限公司 Data extraction method and device
CN113688159B (en) * 2021-09-08 2024-04-05 京东科技控股股份有限公司 Data extraction method and device

Also Published As

Publication number Publication date
CN113360558B (en) 2023-09-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant