CN113360558B - Data processing method, data processing device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113360558B
Authority
CN
China
Prior art keywords
data
service
data extraction
strategy
clusters
Legal status
Active
Application number
CN202110623555.1A
Other languages
Chinese (zh)
Other versions
CN113360558A (en)
Inventor
普辉
Current Assignee
Beijing Jingdong Zhenshi Information Technology Co Ltd
Original Assignee
Beijing Jingdong Zhenshi Information Technology Co Ltd
Application filed by Beijing Jingdong Zhenshi Information Technology Co Ltd
Priority to CN202110623555.1A
Publication of CN113360558A
Application granted
Publication of CN113360558B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083 Shipping
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Human Resources & Organizations (AREA)
  • Computing Systems (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The disclosure provides a data processing method, a data processing apparatus, an electronic device and a computer-readable storage medium, and relates to the technical field of data processing. The method includes: acquiring a data extraction policy preconfigured for each service unit in a distributed service system, where the distributed service system includes a plurality of clusters and each service type in each cluster corresponds to one service unit; extracting, according to the data extraction policy of each service unit, service data from the cluster and service type corresponding to that policy; and aggregating the extracted service data into a back-end data system. The method and apparatus make data processing more targeted and effective, and are highly configurable.

Description

Data processing method, data processing device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a data processing method, a data processing apparatus, an electronic device, and a computer readable storage medium.
Background
With the development of big data technology, more and more enterprises use big data clusters to process their service data. For a system deployed across multiple clusters, when a back-end data system needs certain service data from those clusters, the prior art usually extracts the service data from each cluster and pushes it onward. However, this extraction flow is rigid: whenever new service logic must be added or the data needs special processing, new code has to be written. Configuration is therefore inflexible, and it is difficult to adapt the service processing flow to actual needs simply and effectively.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure provides a data processing method, a data processing apparatus, an electronic device, and a computer-readable storage medium, so as to overcome the poor configurability of data processing in the prior art at least to some extent.
Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.
According to one aspect of the present disclosure, there is provided a data processing method including: acquiring a data extraction policy preconfigured for each service unit in a distributed service system, where the distributed service system includes a plurality of clusters and each service type in each cluster corresponds to one service unit; extracting, according to the data extraction policy of each service unit, service data from the cluster and service type corresponding to that policy; and aggregating the extracted service data into a back-end data system.
In an exemplary embodiment of the present disclosure, extracting, according to the data extraction policy of each service unit, service data from the cluster and service type corresponding to that policy includes: extracting service data from each cluster and each service type by executing a first timed task configured with the data extraction policies.
In an exemplary embodiment of the present disclosure, the first timed task includes a plurality of subtasks, each configured with a data extraction policy; extracting service data from each cluster and each service type by executing the first timed task includes: executing each subtask to extract service data from the cluster and service type corresponding to the data extraction policy of that subtask.
In an exemplary embodiment of the present disclosure, the extracted service data is stored in an intermediate data system, and aggregating the extracted service data into the back-end data system includes: pushing the service data in the intermediate data system to the back-end data system.
In an exemplary embodiment of the present disclosure, pushing the service data in the intermediate data system to the back-end data system includes: pushing the service data in the intermediate data system to the back-end data system by executing a second timed task configured with a unified data push policy.
In an exemplary embodiment of the present disclosure, extracting service data according to the data extraction policy of each service unit includes: when the data extraction policy includes only basic policy information, extracting all service data in the period to be extracted from the cluster and service type corresponding to the data extraction policy; and when the data extraction policy includes both basic policy information and supplementary policy information, extracting service data from the cluster and service type corresponding to the data extraction policy according to the supplementary policy information.
In an exemplary embodiment of the present disclosure, the supplementary policy information includes any one or more of: time configuration information, trigger type configuration information, field configuration information, log management configuration information, scripting language configuration information, data source configuration information, and state configuration information.
According to one aspect of the present disclosure, there is provided a data processing apparatus including: a policy acquisition module, configured to acquire a data extraction policy preconfigured for each service unit in a distributed service system, where the distributed service system includes a plurality of clusters and each service type in each cluster corresponds to one service unit; a data extraction module, configured to extract, according to the data extraction policy of each service unit, service data from the cluster and service type corresponding to that policy; and a data aggregation module, configured to aggregate the extracted service data into a back-end data system.
In an exemplary embodiment of the present disclosure, the data extraction module includes a first timed task execution unit, configured to extract service data from each cluster and each service type by executing a first timed task configured with the data extraction policies.
In an exemplary embodiment of the present disclosure, the first timed task execution unit includes a subtask execution subunit, configured to execute each subtask so as to extract service data from the cluster and service type corresponding to the data extraction policy of that subtask.
In an exemplary embodiment of the present disclosure, the extracted service data is stored in an intermediate data system, and the data aggregation module includes a data aggregation unit, configured to push the service data in the intermediate data system to the back-end data system.
In an exemplary embodiment of the present disclosure, the data aggregation unit includes a second timed task execution unit, configured to push the service data in the intermediate data system to the back-end data system by executing a second timed task configured with a unified data push policy.
In an exemplary embodiment of the present disclosure, the data extraction module includes: a first extraction unit, configured to extract all service data in the period to be extracted from the cluster and service type corresponding to the data extraction policy when the data extraction policy includes only basic policy information; and a second extraction unit, configured to extract service data from the cluster and service type corresponding to the data extraction policy according to the supplementary policy information when the data extraction policy includes both basic policy information and supplementary policy information.
In an exemplary embodiment of the present disclosure, the supplementary policy information includes any one or more of: time configuration information, trigger type configuration information, field configuration information, log management configuration information, scripting language configuration information, data source configuration information, and state configuration information.
According to one aspect of the present disclosure, there is provided an electronic device including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the method of any of the above via execution of the executable instructions.
According to one aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any one of the above.
Exemplary embodiments of the present disclosure have the following advantageous effects:
A data extraction policy preconfigured for each service unit in a distributed service system is acquired, where the distributed service system includes a plurality of clusters and each service type in each cluster corresponds to one service unit; service data is extracted, according to the data extraction policy of each service unit, from the cluster and service type corresponding to that policy; and the extracted service data is aggregated into a back-end data system. On the one hand, this exemplary embodiment proposes a new data processing method in which a service unit is constructed from a cluster and a service type and service data is extracted per service unit. Without making the extraction flow more complex, this makes explicit which cluster and which service type the extracted data comes from, so extraction is more targeted and effective. On the other hand, this exemplary embodiment configures a data extraction policy for each service unit and performs extraction according to that policy, so the required service data can be extracted quickly and accurately from different clusters and service types according to actual requirements. Extraction can thus be individualized, and the scheme is highly configurable.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.
FIG. 1 schematically shows a flowchart of a data processing method in the related art;
FIG. 2 schematically shows a flowchart of another data processing method in the related art;
FIG. 3 schematically shows an operating architecture diagram of the present exemplary embodiment;
FIG. 4 schematically shows a flowchart of a data processing method in the present exemplary embodiment;
FIG. 5 schematically shows another operating architecture diagram of the present exemplary embodiment;
FIG. 6 schematically shows a flowchart of another data processing method in the present exemplary embodiment;
FIG. 7 schematically shows a data extraction policy in the present exemplary embodiment;
FIG. 8 schematically shows a structural block diagram of a data processing apparatus in the present exemplary embodiment;
FIG. 9 schematically shows an electronic device for implementing the above method in the present exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In one related-art data processing method, when data in a plurality of clusters 110 needs to be pushed to an upstream system, as shown in FIG. 1, each cluster is usually configured with its own timed task 120, which periodically extracts summary data from that cluster and pushes it to the upstream system 130 through a message queue or an interface call. However, each cluster then has to maintain and configure its own timed-task application independently, which is a heavy maintenance burden; moreover, because every cluster pushes its own summary data, checking the consistency of the pushed data becomes more complex.
In another related-art data processing method, when data in a plurality of clusters 210 needs to be pushed to an upstream system, as shown in FIG. 2, a first timed task 220 is started to aggregate the data of each cluster into an intermediate database 230; once aggregation is complete, a second timed task 240 is started to scan the intermediate database 230 and push its data to the upstream system 250. Although the clusters no longer need to maintain their own timed tasks, the aggregation flow is rigid, and new code must be written whenever new service logic is required or the data needs special processing. For example, if data is initially pushed at 2:00 and the push time later changes to 4:00 or 5:00, the push-time code in the data push rule has to be modified; likewise, if data is initially collected from certain clusters or data sources and data from other clusters is needed later, the data source code has to be modified. This approach is inflexible and poorly configurable.
In view of the foregoing, exemplary embodiments of the present disclosure first provide a data processing method.
FIG. 3 shows a system architecture diagram of the operating environment of this exemplary embodiment. Referring to FIG. 3, the system 300 may include clusters 310, an intermediate node 320 and a back-end data system 330. Each cluster 310 may be a terminal or server that stores the data of a particular area or region; the intermediate node 320 extracts service data from the clusters 310, stores it, and pushes it to the back-end data system 330.
It should be understood that the number of each kind of device shown in FIG. 3 is merely exemplary; any number of clusters or back-end data systems may be provided as needed.
Based on the above description, the method of this exemplary embodiment may be applied to the intermediate node 320 shown in FIG. 3.
This exemplary embodiment is described below with reference to FIG. 4. As shown in FIG. 4, the data processing method may include the following steps S410 to S430:
In step S410, a data extraction policy preconfigured for each service unit in a distributed service system is acquired, where the distributed service system includes a plurality of clusters and each service type in each cluster corresponds to one service unit.
The distributed service system is a set of clusters, each of which stores the service data of a corresponding area or scope; for example, the Beijing cluster stores some or all of the service data of the Beijing area. The service data may come from a warehousing scenario or from other application scenarios. Clusters may be divided by region or scope, for example a Beijing cluster, a Shenzhen cluster or a Shanghai cluster, or, at a finer granularity, a Haidian District cluster, a Chaoyang District cluster, a Dongcheng District cluster, and so on. The clusters can thus be regarded as nodes that store service data: the Beijing cluster stores service data generated in the Beijing area, the Shenzhen cluster stores service data generated in the Shenzhen area, and so on.
Each cluster may hold large amounts of data of different service types; in a warehousing scenario, for example, the service types may include warehouse receiving, sorting, picking, packing and delivery. This exemplary embodiment pairs each cluster with each of its service types to form a 2-tuple, and each such pair constitutes a service unit, for example (Beijing cluster, picking data), (Beijing cluster, delivery data) or (Shenzhen cluster, picking data).
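A minimal Python sketch of how service units could be enumerated as (cluster, service type) pairs. The names `ServiceUnit` and `build_service_units`, and the idea of passing a mapping from each cluster to the service types it hosts, are illustrative assumptions rather than part of the disclosure.

```python
from typing import Dict, List, NamedTuple


class ServiceUnit(NamedTuple):
    """Hypothetical 2-tuple identifying one service unit."""
    cluster: str
    service_type: str


def build_service_units(types_by_cluster: Dict[str, List[str]]) -> List[ServiceUnit]:
    """Create one service unit for every service type hosted by every cluster."""
    return [
        ServiceUnit(cluster, service_type)
        for cluster, service_types in types_by_cluster.items()
        for service_type in service_types
    ]


units = build_service_units({
    "Beijing": ["picking", "delivery"],
    "Shenzhen": ["picking"],
})
print(units[2])  # ServiceUnit(cluster='Shenzhen', service_type='picking')
```

Passing a per-cluster mapping, rather than one global list of service types, allows clusters to host different service types, which the text does not rule out.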
In this exemplary embodiment, to keep service data extraction in each cluster flexible and individualized, a data extraction policy may be preconfigured for each service unit, and service data is then extracted from each service unit according to the policy acquired for it. A data extraction policy is a set of data extraction rules. Because clusters differ in data volume, data types, data requirements and so on, different service units may be configured with different policies specifying, for example, when data is extracted and what requirements the extraction must meet. Specifically, a policy may specify the time, triggers and fields of the service data to be extracted; policies can be customized as needed, and the present disclosure places no specific limitation on them. Depending on actual needs, the data extraction policies of service units with different service types in the same cluster, or of service units in different clusters, may be the same or different.
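One way to hold exactly one policy per service unit is a registry keyed by the (cluster, service type) pair. This is a sketch under assumptions: the `ExtractionPolicy` fields `fields` and `timeout_seconds` are hypothetical stand-ins for the field and time settings mentioned above, and rejecting duplicate keys is a design choice rather than something the text requires.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

Unit = Tuple[str, str]  # (cluster, service type)


@dataclass(frozen=True)
class ExtractionPolicy:
    """Hypothetical policy record; only cluster and service_type are mandatory."""
    cluster: str
    service_type: str
    fields: Optional[Tuple[str, ...]] = None  # columns to keep; None means all
    timeout_seconds: Optional[int] = None     # abandon the task after this long

    @property
    def unit(self) -> Unit:
        return (self.cluster, self.service_type)


def load_policies(policies: Iterable[ExtractionPolicy]) -> Dict[Unit, ExtractionPolicy]:
    """Index policies by service unit, allowing at most one policy per unit."""
    registry: Dict[Unit, ExtractionPolicy] = {}
    for policy in policies:
        if policy.unit in registry:
            raise ValueError(f"duplicate policy for service unit {policy.unit}")
        registry[policy.unit] = policy
    return registry


registry = load_policies([
    ExtractionPolicy("Beijing", "picking", fields=("sku", "qty")),
    ExtractionPolicy("Shenzhen", "picking"),  # same service type, different policy
    ExtractionPolicy("Beijing", "delivery", timeout_seconds=600),
])
print(registry[("Beijing", "picking")].fields)  # ('sku', 'qty')
```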
In step S420, service data is extracted, according to the data extraction policy of each service unit, from the cluster and service type corresponding to that policy.
Service data is extracted from each service unit according to that unit's data extraction policy, so that the target service data can be extracted from specific service types in the corresponding clusters as actually required, for example extracting only certain fields from the (Beijing cluster, picking data) service unit.
In an exemplary embodiment, step S420 may include:
extracting service data from each cluster and each service type by executing a first timed task configured with the data extraction policies.
The first timed task is a configuration file for extracting service data from each service unit, set up so that the extraction is executed periodically. When the first timed task is triggered, service data is extracted from each cluster and each service type.
In this exemplary embodiment, the first timed task can be configured once, centrally, on the intermediate node, instead of configuring a separate first timed task for every cluster, which makes the clusters easier to maintain.
In an exemplary embodiment, the first timed task may include a plurality of subtasks, each configured with a data extraction policy, and extracting service data from each cluster and each service type by executing the first timed task may include:
executing each subtask to extract service data from the cluster and service type corresponding to the data extraction policy of that subtask.
That is, in this exemplary embodiment the first timed task may consist of subtasks that extract data from different service units, each with its own data extraction policy. When the first timed task is triggered, each service unit in effect executes its own subtask according to its own policy, so that the extraction requirements of different clusters can be configured individually.
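The following Python sketch shows a single run of the first timed task as a loop of subtasks, one per service unit, each applying only its own policy. It assumes a policy is a plain dict whose optional `"fields"` key lists the columns to keep, and that a caller-supplied `fetch(cluster, service_type)` function reads the raw records; both are hypothetical. Periodic triggering is left to whatever scheduler runs the task.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Unit = Tuple[str, str]  # (cluster, service type)
Record = Dict[str, object]


def run_first_timed_task(
    policies: Dict[Unit, dict],
    fetch: Callable[[str, str], Iterable[Record]],
) -> Dict[Unit, List[Record]]:
    """Execute one subtask per service unit and collect each unit's results."""
    results: Dict[Unit, List[Record]] = {}
    for (cluster, service_type), policy in policies.items():
        # Subtask: read only this unit's cluster and service type.
        rows = list(fetch(cluster, service_type))
        wanted = policy.get("fields")
        if wanted:
            # Field configuration: keep only the configured columns.
            rows = [{k: row[k] for k in wanted if k in row} for row in rows]
        results[(cluster, service_type)] = rows
    return results


SOURCE = {
    ("Beijing", "picking"): [{"sku": "A1", "qty": 3, "operator": "u1"}],
    ("Shenzhen", "picking"): [{"sku": "B2", "qty": 5, "operator": "u2"}],
}
policies = {("Beijing", "picking"): {"fields": ["sku", "qty"]}, ("Shenzhen", "picking"): {}}
out = run_first_timed_task(policies, lambda c, t: SOURCE.get((c, t), []))
print(out[("Beijing", "picking")])  # [{'sku': 'A1', 'qty': 3}]
```

Because each subtask sees only its own policy, changing what is extracted for one service unit is a configuration change that leaves the other units untouched.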
In step S430, the extracted service data is aggregated into the back-end data system.
That is, after extracting the service data, this exemplary embodiment pushes it to the back-end data system. The back-end data system may be a data aggregation system or an upstream system; for example, the service data of the clusters serving different areas of an e-commerce platform may be aggregated into a server or cluster of the e-commerce headquarters system.
In an exemplary embodiment, the extracted service data may be stored in an intermediate data system, and step S430 may then include:
pushing the service data in the intermediate data system to the back-end data system.
In this embodiment, an intermediate data system is placed between the clusters and the back-end data system to hold service data temporarily. After service data has been extracted from the cluster and service type corresponding to each service unit's data extraction policy, it is stored in the intermediate data system and then pushed from there to the back-end data system. This buffers the push load and makes the service data easier to manage. In this exemplary embodiment, the intermediate data system may be a particular database, such as a Redis database, or a search server such as ES (Elasticsearch).
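A minimal sketch of the buffering role of the intermediate data system. An in-memory dictionary stands in for Redis or ES purely for illustration; a real deployment would use that system's own client. Tagging each record with its cluster and service type on the way out is an assumption about what the back end needs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Unit = Tuple[str, str]  # (cluster, service type)


class IntermediateStore:
    """In-memory stand-in for the intermediate data system."""

    def __init__(self) -> None:
        self._rows: Dict[Unit, List[dict]] = defaultdict(list)

    def save(self, unit: Unit, records: List[dict]) -> None:
        """Buffer records extracted for one service unit."""
        self._rows[unit].extend(records)

    def drain(self) -> List[dict]:
        """Remove and return all buffered records, tagged with their service unit."""
        out = [
            dict(record, cluster=unit[0], service_type=unit[1])
            for unit, records in self._rows.items()
            for record in records
        ]
        self._rows.clear()
        return out

    def __len__(self) -> int:
        return sum(len(records) for records in self._rows.values())


store = IntermediateStore()
store.save(("Beijing", "picking"), [{"sku": "A1", "qty": 3}])
store.save(("Shenzhen", "picking"), [{"sku": "B2", "qty": 5}])
print(len(store))  # 2
```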
In an exemplary embodiment, pushing the service data in the intermediate data system to the back-end data system may include:
pushing the service data in the intermediate data system to the back-end data system by executing a second timed task configured with a unified data push policy.
The second timed task is a configuration file for pushing service data to the back-end data system, set up so that the push is executed periodically. By executing the second timed task configured with the unified data push policy, the service data in the intermediate data system is pushed to the back-end data system on schedule. The second timed task is separate from the first timed task.
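The sketch below illustrates one run of the second timed task. It assumes that the unified data push policy amounts to a single batch size applied to all clusters, and that `send` is a caller-supplied function wrapping the back end's real interface (for example a message-queue producer); both are hypothetical. Records leave the pending list only after their batch has been sent, so a failed send leaves them for the next run.

```python
from typing import Callable, List, Sequence


def run_second_timed_task(
    pending: List[dict],
    send: Callable[[Sequence[dict]], None],
    batch_size: int = 500,
) -> int:
    """Push all pending records to the back end in batches; return the number of batches sent."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batches = 0
    while pending:
        batch = pending[:batch_size]
        send(batch)                # raises on failure, leaving the batch in `pending`
        del pending[: len(batch)]  # drop only records that were delivered
        batches += 1
    return batches


received: List[Sequence[dict]] = []
pending = [{"id": i} for i in range(1200)]
print(run_second_timed_task(pending, received.append))  # 3
```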
FIG. 5 shows the architecture of another data processing method in this exemplary embodiment. When service data in a plurality of clusters 510 needs to be pushed to an upstream system, a first timed task 520 is started, which extracts service data from the cluster and service type corresponding to each service unit's data extraction policy 530 and stores it in an intermediate database 540. Once aggregation is complete, a second timed task 550 is started, which scans the intermediate database 540 and pushes its service data to the upstream system 560.
Based on the above description, this exemplary embodiment acquires a data extraction policy preconfigured for each service unit in a distributed service system, where the distributed service system includes a plurality of clusters and each service type in each cluster corresponds to one service unit; extracts, according to the data extraction policy of each service unit, service data from the cluster and service type corresponding to that policy; and aggregates the extracted service data into a back-end data system. On the one hand, a service unit is constructed from a cluster and a service type and service data is extracted per service unit, so that, without making the extraction flow more complex, it is explicit which cluster and which service type the extracted data comes from, and extraction is more targeted and effective. On the other hand, a data extraction policy is configured for each service unit and extraction is performed according to that policy, so the required service data can be extracted quickly and accurately from different clusters and service types according to actual requirements. Extraction can thus be individualized, and the scheme is highly configurable.
In an exemplary embodiment, step S420 may include:
when the data extraction policy includes only basic policy information, extracting all service data in the period to be extracted from the cluster and service type corresponding to the data extraction policy;
and when the data extraction policy includes both basic policy information and supplementary policy information, extracting service data from the cluster and service type corresponding to the data extraction policy according to the supplementary policy information.
In this exemplary embodiment, a data extraction policy may include basic policy information and supplementary policy information. The basic policy information consists of the cluster information and the service type: the cluster may be represented by its identifier, code or name, and the service type may likewise be represented by a code or identifier. The supplementary policy information consists of extraction rules that are more detailed than the basic policy information, such as which fields of the service data to extract or how triggers are set.
The period to be extracted is the period whose service data is extracted, and may be the period immediately preceding the current one, for example the previous day or the previous week. The service data of the previous period can then be extracted during the current period for analysis or processing. When the data extraction policy includes only basic policy information, it determines only which clusters and which service types data must currently be extracted from, so all service data in the period to be extracted is extracted from those clusters and service types.
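As a sketch, the period to be extracted can be computed as a half-open [start, end) interval that ends at the start of the current day or week. Treating Monday as the first day of the week and using naive local datetimes are assumptions; the text specifies neither.

```python
from datetime import datetime, timedelta
from typing import Tuple

Window = Tuple[datetime, datetime]


def previous_day_window(now: datetime) -> Window:
    """[00:00 yesterday, 00:00 today)."""
    today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return today - timedelta(days=1), today


def previous_week_window(now: datetime) -> Window:
    """[Monday of last week, Monday of this week), both at 00:00."""
    today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    this_monday = today - timedelta(days=today.weekday())
    return this_monday - timedelta(weeks=1), this_monday


def in_window(ts: datetime, window: Window) -> bool:
    """True if ts falls inside the half-open window."""
    start, end = window
    return start <= ts < end


now = datetime(2021, 6, 4, 2, 0)  # a Friday
print(*previous_day_window(now))   # 2021-06-03 00:00:00 2021-06-04 00:00:00
print(*previous_week_window(now))  # 2021-05-24 00:00:00 2021-05-31 00:00:00
```

A half-open interval ensures that a record stamped exactly at midnight belongs to exactly one period, so consecutive runs neither skip nor double-count it.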
When the data extraction policy includes both basic policy information and supplementary policy information, the basic policy information determines which clusters and which service types currently require extraction, while the supplementary policy information specifies additional requirements on how the extraction is actually carried out.
In an exemplary embodiment, the supplemental policy information may include one or more of the following:
time configuration information, trigger type configuration information, field configuration information, log management configuration information, scripting language configuration information, data source configuration information, and state configuration information.
The time configuration information specifies time rules for the service data extraction task and may include the task's start time, end time, timeout, and so on. The timeout means that if a task runs longer than this duration, it is automatically abandoned, so that it neither interferes with the main business logic nor places excessive load on the server. The start time and end time specify the window in which the task may run: the task is executed only between the start time and the end time.
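The time rules can be sketched as an execution-window check plus a timeout after which the result is abandoned. The `TimeConfig` fields and both helpers are hypothetical; the sketch assumes the window does not cross midnight, and because Python threads cannot be killed, "abandoning" here means the caller stops waiting while the worker finishes in the background.

```python
import concurrent.futures
from dataclasses import dataclass
from datetime import datetime, time
from time import sleep

@dataclass
class TimeConfig:
    start: time             # earliest time of day the task may run
    end: time               # latest time of day the task may run
    timeout_seconds: float  # abandon the task if it runs longer than this

def in_window(cfg, now):
    """True if `now` falls inside the [start, end] window (same-day window assumed)."""
    return cfg.start <= now.time() <= cfg.end

def run_with_timeout(cfg, task):
    """Run `task`; return its result, or None if it exceeds the timeout."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(task)
    try:
        return future.result(timeout=cfg.timeout_seconds)
    except concurrent.futures.TimeoutError:
        return None  # abandoned: the caller moves on without the result
    finally:
        pool.shutdown(wait=False)  # do not block the main logic on a slow worker

cfg = TimeConfig(start=time(1, 0), end=time(5, 0), timeout_seconds=0.2)
print(in_window(cfg, datetime(2021, 6, 4, 2, 30)))      # True
print(run_with_timeout(cfg, lambda: 42))                # 42
print(run_with_timeout(cfg, lambda: sleep(1.0) or 1))   # None
```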
The trigger type configuration information specifies the trigger design. In this exemplary embodiment, the triggers may include data triggers and task triggers, and each trigger is configured with the implementation class in the corresponding application. In terms of timing, a data trigger fires each time a single piece of service data is summarized, whereas a task trigger fires after an entire data task has finished. For example, suppose the back-end data system needs the daily picking volume of the warehouses in every region of the country, that is, of each regional cluster, and the summarization policy task for cluster A extracts 1,000 records today. The data trigger then fires once for each record summarized. In terms of implementation, the data trigger is realized through annotations: after each record is summarized, it is passed to the method of the corresponding implementation class. In terms of application, the data trigger suits per-record processing; for example, once a record is summarized, it can be transformed or have additional fields added. The task trigger fires once after all 1,000 records have been summarized, and can be used, for example, to update a cluster's task state when that cluster finishes summarizing its data, or to notify upstream systems that the cluster's summarization is complete. Depending on the specific requirements, this exemplary embodiment may use data triggers, task triggers, or both.
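Python has no Java-style annotations, so the sketch below uses decorators to register trigger functions; this is only an analogue of the annotation-based mechanism described above. The trigger names, the record layout, and the `summarize` function are hypothetical. The data trigger runs once per record; the task trigger runs once after the whole batch, mirroring the 1,000-record example.

```python
DATA_TRIGGERS = []  # called once per summarized record
TASK_TRIGGERS = []  # called once after a whole data task finishes

def data_trigger(func):
    DATA_TRIGGERS.append(func)
    return func

def task_trigger(func):
    TASK_TRIGGERS.append(func)
    return func

def summarize(cluster, records, sink):
    """Summarize one cluster's records into `sink`, firing triggers along the way."""
    for record in records:
        for hook in DATA_TRIGGERS:
            record = hook(record)  # a data trigger may transform or enrich the record
        sink.append(record)
    for hook in TASK_TRIGGERS:
        hook(cluster, len(records))

@data_trigger
def mark_summarized(record):
    # Example per-record processing: add an extra field.
    return {**record, "summarized": True}

task_states = {}

@task_trigger
def mark_cluster_done(cluster, count):
    # Example task-level action: update the cluster's task state.
    task_states[cluster] = ("SUCCESS", count)

sink = []
summarize("cluster-A", [{"id": i} for i in range(1000)], sink)
print(len(sink), task_states["cluster-A"])  # 1000 ('SUCCESS', 1000)
```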
The field configuration information specifies field settings and may include a unique primary key and an ordering field. The unique primary key identifies each piece of service data uniquely, so that if the data needs further processing later it can be looked up quickly by that key. The ordering field marks the order of the service data.
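A small sketch of how a unique primary key and an ordering field might be applied to extracted records. The field names `order_no` and `seq` are hypothetical, and keeping the last record seen for a duplicate key is an assumed policy, not one stated in the text.

```python
def index_and_order(records, key_field, order_field):
    """Index records by their unique primary key and return them sorted by the ordering field."""
    by_key = {}
    for record in records:
        by_key[record[key_field]] = record  # a later duplicate replaces the earlier one
    ordered = sorted(by_key.values(), key=lambda r: r[order_field])
    return by_key, ordered

records = [
    {"order_no": "WH1-001", "seq": 2},
    {"order_no": "WH1-002", "seq": 1},
    {"order_no": "WH1-001", "seq": 3},  # duplicate key
]
by_key, ordered = index_and_order(records, "order_no", "seq")
print(by_key["WH1-001"]["seq"])          # 3
print([r["order_no"] for r in ordered])  # ['WH1-002', 'WH1-001']
```

The dictionary gives constant-time lookup by primary key for later processing, while the sorted list reflects the ordering field.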
The log management configuration information may include a system log and an error log. The system log records the execution of the data extraction task, chiefly the number of records the task extracted. The error log records error information when the data extraction task fails.
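The split between a system log and an error log can be approximated with two named loggers from Python's standard `logging` module. The logger names and the `run_extraction` wrapper are hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(name)s %(levelname)s %(message)s")
system_log = logging.getLogger("extraction.system")  # execution records, e.g. record counts
error_log = logging.getLogger("extraction.error")    # failures of the extraction task

def run_extraction(task_name, extract):
    """Run one extraction task, logging its record count or its error."""
    system_log.info("task %s started", task_name)
    try:
        rows = extract()
    except Exception:
        error_log.exception("task %s failed", task_name)  # includes the traceback
        return None
    system_log.info("task %s extracted %d records", task_name, len(rows))
    return rows

run_extraction("cluster-A/picking", lambda: [1, 2, 3])
run_extraction("cluster-B/picking", lambda: 1 / 0)
```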
The scripting language configuration information can be used to store the dynamic SQL (Structured Query Language) used to extract data. This is a concrete SQL script that selects the required data fields and may apply conventional database functions, such as adding, subtracting, or summing values; the script is executed against the database of the corresponding cluster. In this exemplary embodiment, storing the scripting language configuration information in jfs (JOURNAL FILE SYSTEM, a log file system) or another distributed file system makes storage management easier.
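To show what a stored extraction script might look like, the sketch below runs a parameterized SQL aggregation against an in-memory SQLite database standing in for a cluster database. The table, columns, and script text are hypothetical; a real deployment would load the script from the configured file system and run it over the cluster's own connection.

```python
import sqlite3

# Hypothetical script, as it might be stored in the scripting language configuration.
PICKING_SCRIPT = """
SELECT warehouse_id, SUM(picked_qty) AS total_picked
FROM picking_log
WHERE biz_date = :biz_date
GROUP BY warehouse_id
ORDER BY warehouse_id
"""

def run_script(conn, script, params):
    """Execute a stored script with named parameters and return all rows."""
    return conn.execute(script, params).fetchall()

conn = sqlite3.connect(":memory:")  # stand-in for a cluster database
conn.execute("CREATE TABLE picking_log (warehouse_id TEXT, biz_date TEXT, picked_qty INTEGER)")
conn.executemany(
    "INSERT INTO picking_log VALUES (?, ?, ?)",
    [("WH1", "2021-06-03", 10), ("WH1", "2021-06-03", 5),
     ("WH2", "2021-06-03", 7), ("WH2", "2021-06-02", 99)],
)
rows = run_script(conn, PICKING_SCRIPT, {"biz_date": "2021-06-03"})
print(rows)  # [('WH1', 15), ('WH2', 7)]
```

Passing the date as a named parameter, rather than splicing it into the script text, keeps the stored script reusable across extraction periods.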
The data source configuration information configures the specific database connection of a cluster. For example, to connect to cluster A, only the database connection of cluster A needs to be configured, and this field may be stored as ciphertext.
The state configuration information describes the task states of a data extraction task, which may include an initial state, an executing state, a success state, a failure state, and the like; these states allow the data extraction task to be managed effectively.
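The task states can be sketched as a small state machine. The transition table is an assumption for illustration (in particular, allowing a failed task to be re-executed); the text lists only the states themselves.

```python
from enum import Enum

class TaskState(Enum):
    INITIAL = "initial"
    EXECUTING = "executing"
    SUCCESS = "success"
    FAILURE = "failure"

# Allowed transitions; re-running a failed task is an assumption for illustration.
ALLOWED = {
    TaskState.INITIAL: {TaskState.EXECUTING},
    TaskState.EXECUTING: {TaskState.SUCCESS, TaskState.FAILURE},
    TaskState.FAILURE: {TaskState.EXECUTING},
    TaskState.SUCCESS: set(),
}

def can_transition(current, target):
    return target in ALLOWED[current]

def transition(current, target):
    """Return the new state, or raise if the move is not allowed."""
    if not can_transition(current, target):
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target

state = TaskState.INITIAL
state = transition(state, TaskState.EXECUTING)
state = transition(state, TaskState.SUCCESS)
print(state.name, can_transition(state, TaskState.EXECUTING))  # SUCCESS False
```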
Fig. 6 shows a flowchart of another data processing method in the present exemplary embodiment, which may specifically include the following steps:
Step S610: acquiring a data extraction policy preconfigured for each service unit in a distributed service system, where the distributed service system includes a plurality of clusters and each service type in each cluster corresponds to one service unit;
Step S620: when the data extraction policy includes only basic policy information, extracting all service data within the period to be extracted from the cluster and service type corresponding to the policy;
Step S630: when the data extraction policy includes both basic policy information and supplementary policy information, extracting service data from the cluster and service type corresponding to the policy in accordance with the supplementary policy information;
Step S640: storing the extracted service data in an intermediate data system;
Step S650: pushing the service data in the intermediate data system to the back-end data system by executing a second timed task configured with a unified data push policy.
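Steps S640 and S650 decouple extraction from delivery through an intermediate data system. A minimal sketch, assuming in-memory structures for both the intermediate and back-end systems and a caller-supplied `extract` function that already covers the S620/S630 branching; all names are hypothetical, and a real scheduler would invoke the two functions on their own timers.

```python
from collections import defaultdict

def first_timed_task(policies, extract, intermediate):
    """S610-S640: extract data for every service unit and stage it in the intermediate system."""
    for policy in policies:
        unit = (policy["cluster"], policy["service_type"])
        intermediate[unit].extend(extract(policy))

def second_timed_task(intermediate, backend):
    """S650: one unified push policy drains all staged units to the back-end system."""
    for unit in sorted(intermediate):
        backend.extend(intermediate[unit])
    intermediate.clear()

def extract(policy):
    # Hypothetical extraction returning two records per service unit.
    return [f"{policy['cluster']}:{policy['service_type']}:{i}" for i in range(2)]

policies = [
    {"cluster": "cluster-B", "service_type": "picking"},
    {"cluster": "cluster-A", "service_type": "picking"},
]
intermediate = defaultdict(list)
backend = []
first_timed_task(policies, extract, intermediate)
second_timed_task(intermediate, backend)
print(backend)
```

Because the push step has a single policy for all units, adding a new cluster or service type only requires a new extraction policy, not a new delivery path.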
As shown in Fig. 7, the data extraction policy 700 in step S610 may include basic policy information 710 and supplementary policy information 720. The basic policy information 710 may include cluster information 711 and a service type 712; the supplementary policy information 720 may include time configuration information 721, trigger type configuration information 722, field configuration information 723, log management configuration information 724, scripting language configuration information 725, data source configuration information 726, and state configuration information 727.
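The structure in Fig. 7 maps naturally onto nested records. The dataclasses below are a hypothetical rendering (field names and types are assumptions), with the reference numerals in comments; `basic_only()` decides which of the two extraction branches applies.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class BasicPolicy:                          # 710
    cluster: str                            # 711: cluster identifier, code, or name
    service_type: str                       # 712

@dataclass
class SupplementaryPolicy:                  # 720: every part is optional
    time_config: Optional[dict] = None      # 721: start/end/timeout
    trigger_config: Optional[dict] = None   # 722: data and/or task triggers
    field_config: Optional[dict] = None     # 723: primary key, ordering field
    log_config: Optional[dict] = None       # 724: system and error logs
    script: Optional[str] = None            # 725: dynamic SQL script
    data_source: Optional[dict] = None      # 726: cluster database connection
    state: Optional[str] = None             # 727: task state

    def is_empty(self):
        """True when no supplementary item has been configured."""
        return all(value is None for value in vars(self).values())

@dataclass
class DataExtractionPolicy:                 # 700
    basic: BasicPolicy
    supplementary: SupplementaryPolicy = field(default_factory=SupplementaryPolicy)

    def basic_only(self):
        return self.supplementary.is_empty()

p1 = DataExtractionPolicy(BasicPolicy("cluster-A", "picking"))
p2 = DataExtractionPolicy(BasicPolicy("cluster-A", "picking"),
                          SupplementaryPolicy(script="SELECT ..."))
print(p1.basic_only(), p2.basic_only())  # True False
```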
Exemplary embodiments of the present disclosure also provide a data processing apparatus. Referring to Fig. 8, the apparatus 800 may include: a policy acquisition module 810, configured to acquire a data extraction policy preconfigured for each service unit in a distributed service system, where the distributed service system includes a plurality of clusters and each service type in each cluster corresponds to one service unit; a data extraction module 820, configured to extract service data, according to the data extraction policy of each service unit, from the cluster and service type corresponding to that policy; and a data summarizing module 830, configured to summarize the extracted service data to a back-end data system.
In an exemplary embodiment, the data extraction module includes a first timed task execution unit, configured to extract service data from each cluster and each service type by executing a first timed task configured with the data extraction policies.
In an exemplary embodiment, the first timed task execution unit includes a subtask execution subunit, configured to execute each subtask and thereby extract service data from the cluster and service type corresponding to the data extraction policy configured in that subtask.
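Claim 3 splits the first timed task into subtasks, each carrying its own data extraction policy. The sketch below runs the subtasks in a thread pool; concurrent execution, the subtask layout, and `run_first_timed_task` are assumptions, since the text does not say whether subtasks run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

def run_first_timed_task(subtasks, extract, max_workers=4):
    """Execute every subtask and map each service unit to the data it extracted."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(extract, subtasks))  # map preserves input order
    units = [(s["cluster"], s["service_type"]) for s in subtasks]
    return dict(zip(units, results))

def extract(subtask):
    # Hypothetical stand-in for querying one cluster's database.
    return [f"{subtask['cluster']}-{subtask['service_type']}-row"]

subtasks = [
    {"cluster": "cluster-A", "service_type": "picking"},
    {"cluster": "cluster-B", "service_type": "packing"},
]
result = run_first_timed_task(subtasks, extract)
print(result)
```

Isolating each policy in its own subtask means a slow or failing cluster affects only its own subtask's result.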
In an exemplary embodiment, the extracted service data is stored in an intermediate data system, and the data summarizing module includes a data summarizing unit, configured to push the service data in the intermediate data system to the back-end data system.
In an exemplary embodiment, the data summarizing unit includes a second timed task execution unit, configured to push the service data in the intermediate data system to the back-end data system by executing a second timed task configured with a unified data push policy.
In an exemplary embodiment, the data extraction module includes: a first extraction unit, configured to extract all service data within the period to be extracted from the cluster and service type corresponding to the data extraction policy when the policy includes only basic policy information; and a second extraction unit, configured to extract service data from the cluster and service type corresponding to the data extraction policy in accordance with the supplementary policy information when the policy includes both basic policy information and supplementary policy information.
In an exemplary embodiment, the supplemental policy information includes any one or more of the following: time configuration information, trigger type configuration information, field configuration information, log management configuration information, scripting language configuration information, data source configuration information, and state configuration information.
The specific details of each module or unit of the above apparatus have already been described in the method embodiments; for any details not disclosed here, refer to the method embodiments. They are not repeated here.
The exemplary embodiments of the present disclosure also provide an electronic device capable of implementing the above method.
Those skilled in the art will appreciate that the various aspects of the present disclosure may be implemented as a system, method, or program product. Accordingly, various aspects of the disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may collectively be referred to herein as a "circuit," "module," or "system."
An electronic device 900 according to such an exemplary embodiment of the present disclosure is described below with reference to fig. 9. The electronic device 900 shown in fig. 9 is merely an example and should not be construed to limit the functionality and scope of use of embodiments of the present disclosure in any way.
As shown in Fig. 9, the electronic device 900 is embodied in the form of a general purpose computing device. Components of the electronic device 900 may include, but are not limited to: at least one processing unit 910, at least one storage unit 920, a bus 930 connecting different system components (including the storage unit 920 and the processing unit 910), and a display unit 940.
The storage unit stores program code executable by the processing unit 910, so that the processing unit 910 performs the steps according to the various exemplary embodiments of the present disclosure described in the "Exemplary methods" section of this specification. For example, the processing unit 910 may perform the steps shown in Fig. 4 or Fig. 6.
The storage unit 920 may include readable media in the form of volatile storage units, such as Random Access Memory (RAM) 921 and/or cache memory 922, and may further include Read Only Memory (ROM) 923.
The storage unit 920 may also include a program/utility 924 having a set (at least one) of program modules 925, such program modules 925 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The bus 930 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, and a processing unit or local bus using any of a variety of bus architectures.
The electronic device 900 may also communicate with one or more external devices 1000 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 900, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 900 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 950. Also, electronic device 900 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 960. As shown, the network adapter 960 communicates with other modules of the electronic device 900 over the bus 930. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 900, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solutions according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a terminal device, or a network device, etc.) to perform the method according to the exemplary embodiments of the present disclosure.
Exemplary embodiments of the present disclosure also provide a computer readable storage medium having stored thereon a program product capable of implementing the method described above in the present specification. In some possible implementations, various aspects of the disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the disclosure as described in the "exemplary methods" section of this specification, when the program product is run on the terminal device.
Exemplary embodiments of the present disclosure also provide a program product for implementing the above method, which may employ a portable compact disc read-only memory (CD-ROM) and comprise program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
Furthermore, the above-described figures are only schematic illustrations of processes included in the method according to the exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with exemplary embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (9)

1. A method of data processing, comprising:
acquiring a data extraction strategy pre-configured for each service unit in a distributed service system, wherein the distributed service system comprises a plurality of clusters, and each service type in each cluster corresponds to one service unit;
extracting, according to the data extraction strategy of each service unit, service data from the cluster and service type corresponding to that data extraction strategy;
summarizing the extracted service data to a back-end data system;
wherein extracting, according to the data extraction strategy of each service unit, service data from the cluster and service type corresponding to that data extraction strategy comprises:
when the data extraction strategy only comprises basic strategy information, extracting all service data in a period to be extracted from clusters and service types corresponding to the data extraction strategy;
and when the data extraction strategy comprises basic strategy information and supplementary strategy information, extracting service data according to the supplementary strategy information from the cluster and service type corresponding to the data extraction strategy.
2. The method according to claim 1, wherein extracting, according to the data extraction strategy of each service unit, service data from the cluster and service type corresponding to that data extraction strategy comprises:
extracting service data from each cluster and each service type respectively by executing a first timing task configured with the data extraction strategies.
3. The method according to claim 2, wherein the first timing task comprises a plurality of subtasks, each subtask being configured with a data extraction strategy; and extracting service data from each cluster and each service type respectively by executing the first timing task configured with the data extraction strategies comprises:
executing each subtask to extract service data from the cluster and service type corresponding to the data extraction strategy configured in that subtask.
4. The method according to claim 1, wherein the extracted service data is stored in an intermediate data system, and summarizing the extracted service data to the back-end data system comprises:
pushing the service data in the intermediate data system to the back-end data system.
5. The method according to claim 4, wherein pushing the service data in the intermediate data system to the back-end data system comprises:
pushing the service data in the intermediate data system to the back-end data system by executing a second timing task configured with a unified data push strategy.
6. The method of claim 1, wherein the supplemental policy information includes any one or more of:
time configuration information, trigger type configuration information, field configuration information, log management configuration information, scripting language configuration information, data source configuration information, and state configuration information.
7. A data processing apparatus, comprising:
a policy obtaining module, configured to obtain a data extraction strategy preconfigured for each service unit in a distributed service system, wherein the distributed service system comprises a plurality of clusters, and each service type in each cluster corresponds to one service unit;
a data extraction module, configured to extract, according to the data extraction strategy of each service unit, service data from the cluster and service type corresponding to that data extraction strategy;
a data summarizing module, configured to summarize the extracted service data to a back-end data system;
wherein the data extraction module is configured to:
when the data extraction strategy only comprises basic strategy information, extracting all service data in a period to be extracted from clusters and service types corresponding to the data extraction strategy;
and when the data extraction strategy comprises basic strategy information and supplementary strategy information, extracting service data according to the supplementary strategy information from the cluster and service type corresponding to the data extraction strategy.
8. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any of claims 1-6 via execution of the executable instructions.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any of claims 1-6.
CN202110623555.1A 2021-06-04 2021-06-04 Data processing method, data processing device, electronic equipment and storage medium Active CN113360558B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110623555.1A CN113360558B (en) 2021-06-04 2021-06-04 Data processing method, data processing device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110623555.1A CN113360558B (en) 2021-06-04 2021-06-04 Data processing method, data processing device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113360558A CN113360558A (en) 2021-09-07
CN113360558B true CN113360558B (en) 2023-09-29

Family

ID=77532207

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110623555.1A Active CN113360558B (en) 2021-06-04 2021-06-04 Data processing method, data processing device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113360558B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688159B (en) * 2021-09-08 2024-04-05 京东科技控股股份有限公司 Data extraction method and device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104317928A (en) * 2014-10-31 2015-01-28 北京思特奇信息技术股份有限公司 Service ETL (extraction-transformation-loading) method and service ETL system both based on distributed database
CN104933119A (en) * 2015-06-05 2015-09-23 福建富士通信息软件有限公司 Big data management method
CN109299177A (en) * 2018-09-30 2019-02-01 江苏满运软件科技有限公司 Data pick-up method, apparatus, storage medium and electronic equipment
CN109669989A (en) * 2018-12-29 2019-04-23 江苏满运软件科技有限公司 Data verification method, system, equipment and medium
CN109753596A (en) * 2018-12-29 2019-05-14 中国科学院计算技术研究所 Information source management and configuration method and system for the acquisition of large scale network data
CN109918437A (en) * 2019-03-08 2019-06-21 北京中油瑞飞信息技术有限责任公司 Distributed data processing method, apparatus and data assets management system
CN110889105A (en) * 2019-12-03 2020-03-17 中国工商银行股份有限公司 Data processing method, device, system and medium
CN112749219A (en) * 2021-01-04 2021-05-04 拉卡拉支付股份有限公司 Data extraction method, data extraction device, electronic equipment, storage medium and program product

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170124464A1 (en) * 2015-10-28 2017-05-04 Fractal Industries, Inc. Rapid predictive analysis of very large data sets using the distributed computational graph

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
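Research and Implementation of ETL Cluster Optimization Technology (ETL集群优化技术研究与实现); 李兰友 (Li Lanyou); 胡诚皓 (Hu Chenghao); 张春华 (Zhang Chunhua); 电脑知识与技术 (Computer Knowledge and Technology), No. 13; full text *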

Also Published As

Publication number Publication date
CN113360558A (en) 2021-09-07

Similar Documents

Publication Publication Date Title
US11392416B2 (en) Automated reconfiguration of real time data stream processing
US10338958B1 (en) Stream adapter for batch-oriented processing frameworks
CN111752799A (en) Service link tracking method, device, equipment and storage medium
CN111064626B (en) Configuration updating method, device, server and readable storage medium
CN110532493B (en) Data processing method and device, storage medium and electronic device
CN110750592A (en) Data synchronization method, device and terminal equipment
CN111459760A (en) Micro-service monitoring method and device and computer storage medium
US11178197B2 (en) Idempotent processing of data streams
CN110457132B (en) Method and device for creating functional object and terminal equipment
US10951540B1 (en) Capture and execution of provider network tasks
CN108228432A (en) A kind of distributed link tracking, analysis method and server, global scheduler
CN113360558B (en) Data processing method, data processing device, electronic equipment and storage medium
US11567814B2 (en) Message stream processor microbatching
CN115134373A (en) Data synchronization method and device, storage medium and electronic equipment
US8224933B2 (en) Method and apparatus for case-based service composition
CN112598529B (en) Data processing method and device, computer readable storage medium and electronic equipment
CN113238849A (en) Timed task processing method and device, storage medium and electronic equipment
CN117349291A (en) Database primary key short ID generation method, electronic equipment and medium
CN113645260A (en) Service retry method, device, storage medium and electronic equipment
CN111506646A (en) Data synchronization method, device, system, storage medium and processor
US10819622B2 (en) Batch checkpointing for inter-stream messaging system
US10419368B1 (en) Dynamic scaling of computing message architecture
US20110023018A1 (en) Software platform and method of managing application individuals in the software platform
CN113377371B (en) Multi-scene configuration method, system, equipment and medium
CN112988889B (en) Method, device, equipment and storage medium for realizing block chain service

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant