CN111651510A - Data processing method and device, electronic equipment and computer readable storage medium - Google Patents

Data processing method and device, electronic equipment and computer readable storage medium

Info

Publication number
CN111651510A
Authority
CN
China
Prior art keywords
data
real
merging
time
database
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010408417.7A
Other languages
Chinese (zh)
Inventor
周财斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rajax Network Technology Co Ltd
Original Assignee
Rajax Network Technology Co Ltd
Application filed by Rajax Network Technology Co Ltd
Priority to CN202010408417.7A
Publication of CN111651510A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Abstract

An embodiment of the invention discloses a data processing method, a device, electronic equipment and a computer-readable storage medium. Data tables with a preset format are obtained by performing real-time streaming data processing on the acquired original data streams; at least one merging statistical table is obtained by performing real-time streaming merging processing on the data tables; and the at least one merging statistical table is stored in full dimension into a preset database, where a database table in the preset database comprises real-time data and non-real-time data of each dimension generated by at least one data source end. By using a real-time computing engine and a preset database, the embodiment stores the real-time data and non-real-time data of every dimension of every related data source end in one database table, which can improve development efficiency, make the system easier to maintain, allow the data source ends to share data resources, and allow data demands to be responded to quickly.

Description

Data processing method and device, electronic equipment and computer readable storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data processing method, an apparatus, an electronic device, and a computer-readable storage medium.
Background
At present, data warehouses are divided into offline data warehouses and real-time data warehouses. Offline data warehouses play a major role in traditional fields: their data is relatively stable and easy to trace. Their drawback is equally obvious: most offline data warehouses are T-1, that is, yesterday's data can only be seen today. With the rise of big data, and in the internet field in particular with the rise of 5G and IoT, data volumes keep growing while users' tolerance for delay keeps shrinking; feedback is needed within minutes, or even seconds, after the data is generated. In traditional data products, indexes can be viewed at a granularity of ten minutes at best, and only after complicated processes and configuration, whereas a real-time data warehouse makes data available immediately after it is generated, which greatly helps users make decisions.
Disclosure of Invention
In view of this, embodiments of the present invention provide a data processing method, an apparatus, an electronic device, and a computer-readable storage medium, so as to improve development efficiency, make a system easier to maintain, and simultaneously implement data resource sharing at each data source end, and enable quick response to a data requirement.
In a first aspect, an embodiment of the present invention provides a data processing method, where the method includes:
acquiring an original data stream of at least one data source end;
processing the original data stream to obtain a data table with a preset format;
merging each data table to obtain at least one merging statistical table;
and storing the at least one merging statistical table in full dimension into a preset database, wherein the database table in the preset database comprises real-time data and non-real-time data of each dimension generated by at least one data source end.
Optionally, the predetermined database is an analytic database AnalyticDB.
Optionally, the obtaining the original data stream of the at least one data source includes:
and acquiring the original data stream of at least one data source end through the distributed message middleware.
Optionally, the data processing the original data stream, and acquiring the data table with the predetermined format includes:
and performing real-time streaming data processing on the original data in the message queue by adopting a real-time computing engine Blink or Flink to acquire the corresponding data table.
Optionally, the method further includes:
and sending each data table to a message queue of the distributed message middleware.
Optionally, merging the data tables to obtain at least one merging statistical table includes:
and performing real-time streaming merging processing on each data table by adopting a real-time computing engine Blink or Flink to obtain at least one merging statistical table.
Optionally, performing real-time streaming merging processing on each data table by using a real-time computing engine Blink or Flink, and acquiring at least one merging statistical table includes:
merging the data with the same dimensionality in each data table by adopting a real-time computing engine Blink or Flink, and/or aggregating the data tables with the incidence relation to obtain at least one merging statistical table.
Optionally, the data processing the original data stream, and acquiring the data table with the predetermined format includes:
performing streaming normalization operation, data cleaning operation and/or zero filling operation on the original data to obtain normalized data;
and acquiring a data table with a preset format corresponding to each data source end according to the normalized data.
Optionally, the method further includes:
responding to a viewing instruction, and acquiring data corresponding to the viewing instruction from the preset database;
performing statistical analysis on the data corresponding to the viewing instruction to obtain a statistical analysis result;
and sending the statistical analysis result to display the statistical analysis result on a corresponding terminal interface.
In a second aspect, an embodiment of the present invention provides a data processing apparatus, where the apparatus includes:
the data acquisition unit is configured to acquire original data streams of at least one data source end;
the data processing unit is configured to perform data processing on the original data stream to obtain a data table with a predetermined format;
the data table merging unit is configured to merge the data tables to obtain at least one merging statistical table;
the storage unit is configured to store the at least one merging statistical table in a full-dimensional manner into a predetermined database, and a database table in the predetermined database comprises real-time data and non-real-time data of each dimension generated by at least one data source end.
Optionally, the predetermined database is an analytic database AnalyticDB.
Optionally, the data acquiring unit includes:
and the data acquisition subunit is configured to acquire the original data stream of the at least one data source end through the distributed message middleware.
Optionally, the data processing unit is further configured to perform real-time streaming data processing on the original data in the message queue by using a real-time computing engine Blink or Flink, and acquire the corresponding data table.
Optionally, the apparatus further comprises:
a data table sending unit configured to send each of the data tables to a message queue of the distributed message middleware.
Optionally, the data table merging unit is further configured to perform real-time streaming merging processing on each data table by using a real-time computing engine Blink or Flink, and obtain at least one merging statistical table.
Optionally, the data table merging unit includes:
the data table merging subunit is configured to adopt a real-time computing engine Blink or Flink to merge the data with the same dimensionality in each data table and/or aggregate the data tables with the incidence relation, so as to obtain at least one merging statistical table.
Optionally, the data processing unit includes:
the data processing subunit is configured to perform streaming normalization operation, data cleaning operation and/or zero padding operation on the original data to obtain normalized data;
and the data table generating subunit is configured to generate a data table with a predetermined format corresponding to each data source end according to the normalized data.
Optionally, the apparatus further comprises:
a data query unit configured to, in response to a viewing instruction, acquire data corresponding to the viewing instruction from the predetermined database;
the data statistical analysis unit is configured to perform statistical analysis on the data corresponding to the viewing instruction to obtain a statistical analysis result;
and the statistical result sending unit is configured to send the statistical analysis result so as to display the statistical analysis result on a corresponding terminal interface.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory and a processor, where the memory is used to store one or more computer program instructions, where the one or more computer program instructions are executed by the processor to implement the method described above.
In a fourth aspect, embodiments of the present invention provide a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement a method as described above.
The embodiment of the invention obtains data tables with the preset format by performing real-time streaming data processing on the obtained original data stream, obtains at least one merging statistical table by performing real-time streaming merging processing on the data tables, and stores the at least one merging statistical table in full dimension into the preset database. The database table in the preset database comprises real-time data and non-real-time data of each dimension generated by at least one data source end.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following description of the embodiments of the present invention with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of a data processing method of an embodiment of the present invention;
FIGS. 2-4 are schematic diagrams of interfaces for displaying data statistics according to embodiments of the present invention;
FIG. 5 is a schematic diagram of a data processing process of an embodiment of the present invention;
FIG. 6 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an electronic device of an embodiment of the invention.
Detailed Description
The present invention will be described below based on examples, but the present invention is not limited to only these examples. In the following detailed description of the present invention, certain specific details are set forth. It will be apparent to one skilled in the art that the present invention may be practiced without these specific details. Well-known methods, procedures, components and circuits have not been described in detail so as not to obscure the present invention.
Further, those of ordinary skill in the art will appreciate that the drawings provided herein are for illustrative purposes and are not necessarily drawn to scale.
Unless the context clearly requires otherwise, throughout the description, the words "comprise", "comprising", and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, what is meant is "including, but not limited to".
In the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified.
Some of the terms associated with the embodiments of the present invention are explained below:
Streaming computation/streaming processing: a continuous data processing method in which data is computed as soon as it is generated, characterized by low latency;
Storm/Spark Streaming/Flink/Blink: streaming computing engines;
AnalyticDB: an analytic database supporting multidimensional real-time data analysis;
Data warehouse: DW for short, a subject-oriented, integrated, relatively stable data set that reflects historical changes and is used to support management decisions;
Real-time data warehouse: a data warehouse that is updated in real time.
In some related technologies, data is not refreshed in real time. Instead, a scheduled task produces a batch of the current day's new data at preset intervals, an offline data warehouse is used alongside it to look up yesterday's and earlier historical data, and the corresponding index information is derived from the current new data together with the data in the offline data warehouse. This approach, however, has two data acquisition paths, so two sets of service code are required for the same service; the technical chain is too long, which lengthens the development cycle, and the data obtained through the two paths may not be fully consistent.
In other related technologies, the obtained raw data is processed with a stream computing engine such as Storm, Spark Streaming or Flink, and a data storage scheme is chosen for the results, for example a MySQL database. In these technologies, code has to be written specifically for each service scenario, which makes development difficult, hard to debug and slow to adapt to changing requirements; the approach depends on the experience of the engineers, does not form a complete data chain, and has a long development cycle.
Therefore, the embodiment of the invention provides a data processing method, which adopts a real-time computing engine and an analytic database AnalyticDB to enable real-time data and non-real-time data of each dimension of each related data source end to be stored in one database table, thereby improving development efficiency, enabling a system to be maintained more easily, realizing data resource sharing of each data source end and responding to data requirements quickly.
Fig. 1 is a flowchart of a data processing method of an embodiment of the present invention. As shown in fig. 1, the data processing method according to the embodiment of the present invention includes the following steps:
Step S110, obtaining an original data stream of at least one data source end. Optionally, each data source end is a data source end related to the same application scenario. For example, in the field of logistics, the data source end may include a distributor terminal, a merchant terminal, a server end of each site, and the like.
In an alternative implementation, the raw data stream of at least one data source end is obtained through distributed message middleware. Specifically, after the data source end generates the raw data, the raw data is sent to a message queue of the distributed message middleware. Any distributed message system, such as Kafka, may be used; this embodiment does not limit the choice.
Step S120, performing data processing on the original data stream to obtain a data table with a predetermined format.
In an optional implementation manner, performing data processing on the original data stream includes performing stream normalization operation, data cleaning operation, and/or zero padding operation on the original data stream to obtain normalized data, and generating a data table with a predetermined format corresponding to each data source end according to the normalized data at each data source end.
In an alternative implementation, real-time streaming data processing is performed on the raw data to obtain a data table with a predetermined format. Optionally, a real-time computing engine Blink or Flink is used to perform real-time streaming data processing on the original data in the message queue, so as to obtain the corresponding data table. Optionally, a real-time computing engine Blink or Flink is used to perform normalization operation, data cleaning operation, and/or zero padding operation on the original data according to the message topics in the message queue to obtain normalized data, and a data table related to the corresponding topic domain is formed according to the normalized data of each data source end. Optionally, the data table is a detail wide table, that is, a wide table that contains the detailed data.
The real-time computation engine Flink is a distributed processing engine for streaming data and batch data, and can process bounded batch data sets and unbounded real-time data sets. The real-time computing engine Blink is a streaming computing distributed processing engine constructed based on Flink, and has high real-time computing efficiency and data compatibility.
It is easy to understand that in a specific application scenario, data source ends of different users, and even data source ends of the same type, may have different schema settings, so data source ends of the same type may generate data in different formats. Some data source ends may also generate erroneous data, for example a row in which a certain field is missing. Therefore, the raw data generated by each data source end needs to undergo data normalization, data cleaning and filtering, and/or zero padding to obtain normalized data, which facilitates subsequent data analysis. Taking an application scenario in the logistics field as an example, logistics distributors may be team distributors or crowdsourced distributors. Team distributors need to clock in and out of work, while crowdsourced distributors do not. Although the terminal devices of both kinds belong to the distributor terminal devices, the data they generate differs because of these attribute differences. Therefore, when the original data generated by the terminal device of a crowdsourced distributor is processed, a clock-in attribute needs to be added and set to "not clocked in". In this way, the embodiment normalizes the data as soon as the original data is acquired, so the data is essentially real-time and there is almost no delay for the service. In addition, the real-time computing engine Blink or Flink runs as a cluster, so a large amount of real-time data can be processed and data delay is further reduced.
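The normalization, cleaning and zero-padding operations described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Blink or Flink job of the embodiment: the schema, the field names (`distributor_id`, `clocked_in`, `order_count`) and the sample events are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator, Optional

# Hypothetical unified schema for distributor records (field names are assumptions).
# A default of None marks a required field; other defaults are used for zero padding.
DISTRIBUTOR_SCHEMA = {
    "distributor_id": None,        # required: rows without it are dropped
    "distributor_type": "unknown",
    "clocked_in": 0,               # 1 = clocked in, 0 = not clocked in
    "order_count": 0,
}

def normalize(raw: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return a row in the unified schema, or None if the raw record is unusable."""
    # Data cleaning: discard records that lack the required key.
    if raw.get("distributor_id") is None:
        return None
    # Normalization and zero padding: keep schema fields only, fill defaults.
    row = {}
    for field, default in DISTRIBUTOR_SCHEMA.items():
        value = raw.get(field)
        row[field] = default if value is None else value
    return row

def normalize_stream(stream: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Normalize records lazily, one at a time, as a stream processor would."""
    for raw in stream:
        row = normalize(raw)
        if row is not None:
            yield row

raw_events = [
    {"distributor_id": "d1", "distributor_type": "team", "clocked_in": 1, "order_count": 3},
    {"distributor_id": "d2", "distributor_type": "crowdsourced"},  # no clock-in attribute
    {"distributor_type": "team", "clocked_in": 1},                 # missing required field
]
rows = list(normalize_stream(raw_events))
```

In this sketch the crowdsourced record gains `clocked_in = 0`, matching the "not clocked in" default described above, and the record without an identifier is filtered out.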
Taking an application scenario in the logistics field as an example, a data table corresponding to a data source end corresponding to a distributor formed through data processing is shown in table (1):
Table (1)
[Table (1) is provided as images in the original publication: Figure BDA0002492193510000071 and Figure BDA0002492193510000081.]
A clock-in attribute of '1' indicates that the distributor has clocked in, and a clock-in attribute of '0' indicates that the distributor has not clocked in.
As shown in table (1), in this embodiment, a real-time computing engine Blink or Flink is used to perform data processing such as real-time data cleaning on original data in a message queue to obtain normalized data, and a data table corresponding to a data source end corresponding to a distributor is formed according to a message topic.
The present embodiment is illustrated by taking the logistics field as an example. It should be understood that the present embodiment can be applied to any data storage application field, such as online ride-hailing, online shopping, and the like.
In an optional implementation manner, the data processing method of this embodiment further includes:
and sending each data table to a message queue of the distributed message middleware to realize the output of each data table. The embodiment adopts a message queue mode to realize real-time and high-performance transmission of large data volume, so that a consumption end (namely a terminal needing to view or use data) can obtain real-time data.
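The hand-off between stages through message queues can be illustrated with a toy, in-memory stand-in for the distributed message middleware. The sketch below is an assumption-laden illustration built on Python's standard `queue` module; it is not Kafka or any real middleware API, and the class name `InMemoryBroker` and the topic names are hypothetical.

```python
import queue
from collections import defaultdict

class InMemoryBroker:
    """Toy stand-in for distributed message middleware: one FIFO queue per topic."""

    def __init__(self):
        self._topics = defaultdict(queue.Queue)

    def publish(self, topic, message):
        self._topics[topic].put(message)

    def consume(self, topic):
        """Yield the messages currently queued on a topic, oldest first."""
        q = self._topics[topic]
        while True:
            try:
                yield q.get_nowait()
            except queue.Empty:
                return

broker = InMemoryBroker()

# Data source ends publish raw data to raw-data topics (topic names are assumptions).
broker.publish("raw.distributor", {"distributor_id": "d1", "clocked_in": 1})
broker.publish("raw.waybill", {"waybill_id": "w1", "distributor_id": "d1"})

# The processing stage reads each raw topic and republishes every processed row
# to the matching data-table topic, so downstream consumers see it immediately.
for topic in ("raw.distributor", "raw.waybill"):
    for message in broker.consume(topic):
        row = dict(message, source_topic=topic)
        broker.publish(topic.replace("raw.", "table."), row)

distributor_rows = list(broker.consume("table.distributor"))
waybill_rows = list(broker.consume("table.waybill"))
```

A real deployment would replace the in-memory queues with topics of the chosen middleware; the point here is only that each stage reads from one queue and writes to another.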
Step S130, merging the data tables to obtain at least one merging statistical table.
In an optional implementation manner, the data tables are subjected to real-time streaming merging processing, and at least one merging statistical table is obtained. Optionally, a real-time calculation engine Blink or Flink is used to perform real-time streaming merging processing on each data table, so as to obtain at least one merging statistical table. Optionally, a real-time computing engine Blink or Flink is used to merge data with the same dimensionality in each data table, and/or aggregate data tables with an association relationship, so as to obtain at least one merging statistical table.
In this embodiment, a real-time computing engine Blink or Flink may be used to merge data tables with the same dimensionality and then merge the corresponding indexes, or to aggregate data tables with an association relationship, so that a plurality of service-oriented merging statistical tables can be generated. Taking the logistics field as an example, a distributor business domain merging statistical table oriented to the distributor business, or a waybill distribution domain merging statistical table oriented to the waybill and distribution business, can be generated. The real-time computing engine Blink is based on SQL (Structured Query Language) and supports powerful table joins, which reduces the complexity of merging the data tables into the merging statistical tables.
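As a rough illustration of the two merging modes described above (merging rows that share a dimension, and aggregating tables that have an association relationship), the following Python sketch works on small in-memory tables. The table contents, column names and helper functions are hypothetical; in the embodiment this work is done by Blink or Flink, typically through SQL joins and aggregations.

```python
from collections import defaultdict

# Hypothetical data tables produced by the processing stage.
distributor_table = [
    {"team_id": "A", "distributor_id": "d1", "online": 1},
    {"team_id": "A", "distributor_id": "d2", "online": 1},
    {"team_id": "B", "distributor_id": "d3", "online": 0},
]
waybill_table = [
    {"waybill_id": "w1", "distributor_id": "d1"},
    {"waybill_id": "w2", "distributor_id": "d1"},
    {"waybill_id": "w3", "distributor_id": "d3"},
]

def merge_by_dimension(rows, dimension, metric):
    """Merge rows that share a dimension value by summing one metric."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row[metric]
    return dict(totals)

def join_and_count(left, right, key, dimension):
    """Join two associated tables on `key`, then count right-hand rows per dimension."""
    dimension_of = {row[key]: row[dimension] for row in left}
    counts = defaultdict(int)
    for row in right:
        if row[key] in dimension_of:
            counts[dimension_of[row[key]]] += 1
    return dict(counts)

online_per_team = merge_by_dimension(distributor_table, "team_id", "online")
waybills_per_team = join_and_count(distributor_table, waybill_table, "distributor_id", "team_id")

# One service-oriented merging statistical table, keyed by team.
merged = [
    {
        "team_id": team,
        "online_distributors": online_per_team[team],
        "waybills": waybills_per_team.get(team, 0),
    }
    for team in sorted(online_per_team)
]
```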
Step S140, storing the at least one merging statistical table in full dimension into a predetermined database, where a database table in the predetermined database includes real-time data and non-real-time data of each dimension generated by at least one data source end. The predetermined database supports multidimensional data analysis. In an alternative implementation, the predetermined database may be an analytic database AnalyticDB.
The analytic database AnalyticDB is a database that supports real-time, high-concurrency online analysis of massive data, and can perform real-time multi-dimensional analysis, pivoting and data exploration on large amounts of data at the millisecond level. In AnalyticDB, resources are allocated and managed per database, and each database shares one service process, so that resources are isolated between users. An AnalyticDB database is composed of a plurality of ECUs (Elastic Computing Units), and each ECU is provided with fixed disk and memory resources. In AnalyticDB, a table group is a collection of data tables that can be associated with one another; table groups include dimension table groups and common table groups. A dimension table group holds dimension tables (for example, a logistics table of a logistics platform), and a common table group holds the common tables that need to be associated. A dimension table is a table that carries the concept of a dimension; each ECU node can hold a full copy of the dimension table data, which can be associated with any common table. A common table is a partitioned table designed to take advantage of the query capabilities of the distributed system.
In this embodiment, the at least one merging statistical table is stored in full dimension in the analytic database AnalyticDB. Taking the logistics field as an example, commonly used dimensions may include the capacity line, the platform type, the standard product type, whether the order is a pre-order, the team ID, the business district ID, and the like. Through full-dimension storage, this embodiment stores the data of every dimension in one database table, which contains both the dimension data stored in real time and the dimension data stored in non-real time, so that the data corresponding to a specific service can be acquired from this table for data analysis. Some related technologies instead construct a dedicated database table for each specific service (for example, a team table to analyze teams, a standard product table to analyze standard products, and a team and standard product table to analyze both together). Compared with that approach, this embodiment can acquire the data for any related service from the full-dimension database table, which effectively meets the data requirements of each business analysis and improves the efficiency of data acquisition and analysis.
In the embodiment, real-time data (such as data of the same day) and non-real-time data of each dimension generated in one application scene are both stored in one database table, so that in the embodiment, the consumption end only needs one data acquisition path, the same service adopts one set of service codes, the development period is short, and the problems of incomplete data consistency and data loss are effectively avoided. Meanwhile, targeted coding is not needed for each service scene, and modification is simple to meet the change of the demand.
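The idea of one full-dimension table serving every analysis, for both the current day and earlier days, can be sketched with Python's built-in `sqlite3` module. SQLite is used here purely as a local stand-in; this is not AnalyticDB's API, and the table name, column names, date labels (`D`, `D-1`) and values are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE merged_stats (
        stat_date TEXT, team_id TEXT, platform_type TEXT,
        product_type TEXT, waybills INTEGER
    )
    """
)
# "D" stands for the current day (real-time data), "D-1" for the previous day.
rows = [
    ("D-1", "A", "P", "std", 7),
    ("D-1", "B", "P", "std", 4),
    ("D", "A", "P", "std", 10),
    ("D", "A", "P", "other", 6),
    ("D", "B", "P", "std", 5),
]
conn.executemany("INSERT INTO merged_stats VALUES (?, ?, ?, ?, ?)", rows)

DIMENSIONS = {"team_id", "platform_type", "product_type"}

def waybills_by(stat_date, *dimensions):
    """Sum waybills for one day, grouped by any combination of stored dimensions."""
    unknown = set(dimensions) - DIMENSIONS
    if not dimensions or unknown:
        raise ValueError(f"expected one or more of {sorted(DIMENSIONS)}")
    # Column names come from the whitelist above, so interpolating them is safe.
    cols = ", ".join(dimensions)
    sql = (
        f"SELECT {cols}, SUM(waybills) FROM merged_stats "
        f"WHERE stat_date = ? GROUP BY {cols} ORDER BY {cols}"
    )
    return conn.execute(sql, (stat_date,)).fetchall()

by_team_today = waybills_by("D", "team_id")
by_team_and_product_today = waybills_by("D", "team_id", "product_type")
by_team_yesterday = waybills_by("D-1", "team_id")
```

The team analysis, the team and product analysis, and the historical lookup all read the same table through the same function, which is the single data acquisition path the paragraph above describes.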
In an optional implementation manner, the data processing method of this embodiment further includes:
responding to the viewing instruction, acquiring data corresponding to the viewing instruction from a preset database, performing statistical analysis on the data corresponding to the viewing instruction, acquiring a statistical analysis result, and sending the statistical analysis result to display the statistical analysis result on a corresponding terminal interface.
Taking the logistics field as an example, assume that the viewing instruction is to view the snapshot data of the current day, where the snapshot data of the current day is the value accumulated from 00:00 of the current day up to the current time, and assume that the statistics are as shown in table (2):
Table (2)
Time point    Team ID    Platform type    Cumulative waybill volume (orders)    Other index X
01:00         Team A     P                8                                     X1
01:00         Team B     P                5                                     X2
09:00         Team A     P                10                                    X3
10:00         Team A     P                16                                    X4
Suppose the total waybill volume of team A and team B at 09:00 on the current day is to be obtained. As shown in table (2), at 09:00 the cumulative waybill volume of team A is 10, and that of team B is 5 (team B's most recent record, at 01:00), so the total waybill volume of team A and team B is 15. It is easy to understand that, to obtain the total waybill volume of all teams at a given time point, the record generated by each team at that time point, or at the last time point earlier than it, needs to be obtained through grouping and sorting. The analytic database AnalyticDB handles this kind of analysis well, so it can meet the data requirements and analysis requirements of each business analysis and improve the efficiency of data acquisition and analysis.
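The grouping-and-sorting step in this example can be written out as a small Python sketch using the rows of table (2). The function name `total_at` is hypothetical, and the code only illustrates the logic; in the embodiment the equivalent query would run inside the analytic database.

```python
# Rows of table (2): (time point, team ID, platform type, cumulative waybill volume).
records = [
    ("01:00", "A", "P", 8),
    ("01:00", "B", "P", 5),
    ("09:00", "A", "P", 10),
    ("10:00", "A", "P", 16),
]

def total_at(records, time_point):
    """Sum, over all teams, each team's latest cumulative value at or before time_point."""
    latest = {}
    # Zero-padded HH:MM strings sort chronologically, so later rows overwrite earlier ones.
    for time, team, _platform, cumulative in sorted(records):
        if time <= time_point:
            latest[team] = cumulative
    return sum(latest.values()), latest

total_0900, per_team_0900 = total_at(records, "09:00")
total_1000, per_team_1000 = total_at(records, "10:00")
```

For 09:00 this picks team A's 09:00 record and team B's 01:00 record, giving the total of 15 stated above.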
Figs. 2-4 are schematic diagrams of interfaces for displaying data statistics according to the embodiment of the present invention. The present embodiment is described by taking the analytic database AnalyticDB as the predetermined database. As shown in fig. 2, the terminal interface 2 displays the selected viewing content. After the "OK" control of the terminal interface 2 is triggered, a corresponding viewing instruction is generated, the data corresponding to the viewing instruction is obtained from the analytic database AnalyticDB, and statistical analysis is performed on that data to obtain a statistical analysis result. In this embodiment, taking the key indicators of a transport capacity line team of the logistics platform as an example, the analytic database AnalyticDB obtains the order data of the current day, of the previous day, and of seven days earlier from the corresponding full-dimensional database table, performs statistical analysis on these data, and sends the statistical analysis result to the terminal interface that generated the viewing instruction. As shown in fig. 3, the terminal interface 3 displays key indicators of the capacity line team, such as the order push volume, the pending (hung) order volume, the waybills in delivery, the abnormally cancelled orders, the pick-up rate, the same-day completion rate, the timeout rate (which may include timeout rates corresponding to different timeout durations), and the rise or fall of each indicator relative to the corresponding time one day or seven days earlier. In an alternative display manner, the indicator trends of the current day, of the previous day, and of seven days earlier may also be displayed. As shown in fig. 4, taking the trend of the key indicator order push volume as an example, the terminal interface 4 displays the order push volume trend from the early morning of the current day to the current time (for example, 20:00), the trend from the early morning of the previous day to the corresponding time (for example, 20:00) on that day, and the trend from the early morning of the day seven days earlier to the corresponding time (for example, 20:00) on that day. It should be understood that the interface diagrams of the present embodiment are merely exemplary and do not represent real interface displays.
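The three comparison windows of the trend display (current day, one day earlier, seven days earlier, each cut off at the same clock time) can be sketched as below. The hourly series, the dates, and the name `trend_windows` are hypothetical; the sketch assumes that entry `h` of each series is the cumulative indicator value observed at `h`:00.

```python
from datetime import date, timedelta


def trend_windows(hourly, today, current_hour):
    """Slice each comparison day's hourly series from 00:00 to current_hour."""
    days = {
        "today": today,
        "one_day_ago": today - timedelta(days=1),
        "seven_days_ago": today - timedelta(days=7),
    }
    return {
        label: hourly.get(day, [])[: current_hour + 1]
        for label, day in days.items()
    }


# Hypothetical cumulative order push volume, one value per hour of the day.
hourly = {
    date(2020, 5, 14): [h * 1 for h in range(24)],
    date(2020, 5, 13): [h * 2 for h in range(24)],
    date(2020, 5, 7): [h * 3 for h in range(24)],
}

windows = trend_windows(hourly, date(2020, 5, 14), 20)
```

Each window holds the values from 00:00 through 20:00, so the three series can be drawn on one chart and compared point by point.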
The embodiment of the invention performs real-time streaming data processing on the acquired original data stream to obtain data tables with a predetermined format, performs real-time streaming merging processing on the data tables to obtain at least one merging statistical table, and stores the at least one merging statistical table in full dimension into the analytic database AnalyticDB, wherein the database table in AnalyticDB comprises real-time data and non-real-time data of each dimension generated by at least one data source end.
Fig. 5 is a schematic diagram of a data processing procedure according to an embodiment of the present invention. This embodiment is explained by taking the logistics field as an example. As shown in fig. 5, data source ends such as a distributor data source end, a waybill data source end, a shunting data source end, and a pressure balance data source end send the original data they generate in real time to an original data message queue; this embodiment implements streaming processing of data through the message queues of the distributed message middleware. The real-time computing engine Blink acquires the original data from the message queue in real time, performs real-time data cleaning, data normalization and/or zero padding on the original data to obtain normalized data, and, based on the message topic of the original data, forms from the normalized data a data table with a predetermined format for the corresponding topic domain. Specifically, the real-time computing engine Blink performs streaming data processing on original data from the distributor data source end to form a distributor data table, on original data from the waybill data source end to form a waybill data table, on original data from the shunting data source end to form a shunting data table, and on original data from the pressure balance data source end to form a pressure balance data table. Each data table is then sent to a data table message queue of the distributed message middleware, which outputs the data tables. A downstream real-time computing engine Blink merges the data of the same dimension in the data tables and/or aggregates the data tables that have an association relationship, to obtain at least one merging statistical table.
For example, the downstream real-time computing engine Blink performs aggregation statistics on the waybill data table and the shunting data table to form a shunting waybill merging statistical table, merges and statistically processes the dimension data corresponding to each distributor data source end to obtain a distributor merging statistical table, merges and statistically processes the dimension data corresponding to each pressure balance data source end to obtain a pressure balance merging statistical table, and so on. The merging statistical tables are stored in full dimension into the analytic database AnalyticDB, wherein the database table in AnalyticDB comprises real-time data and non-real-time data of each dimension generated by at least one data source end.
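The aggregation of two associated data tables into one merging statistical table can be sketched as follows. This is a batch illustration in plain Python, not Blink or Flink code; the column names, the join key `waybill_id`, and the sample rows are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical rows of a waybill data table and a shunting data table.
waybills = [
    {"waybill_id": 1, "team": "A", "status": "delivered"},
    {"waybill_id": 2, "team": "A", "status": "in_delivery"},
    {"waybill_id": 3, "team": "B", "status": "delivered"},
    {"waybill_id": 4, "team": "A", "status": "delivered"},
]
shunts = [
    {"waybill_id": 1, "route": "north"},
    {"waybill_id": 2, "route": "north"},
    {"waybill_id": 3, "route": "south"},
    {"waybill_id": 4, "route": "north"},
]


def merge_statistics(waybill_rows, shunt_rows):
    """Join the two tables on waybill_id and count waybills per dimension."""
    route_by_waybill = {s["waybill_id"]: s["route"] for s in shunt_rows}
    counts = defaultdict(int)
    for w in waybill_rows:
        route = route_by_waybill.get(w["waybill_id"], "unknown")
        counts[(w["team"], route, w["status"])] += 1
    # One output row per (team, route, status) combination, in sorted order.
    return [
        {"team": t, "route": r, "status": s, "waybills": n}
        for (t, r, s), n in sorted(counts.items())
    ]


merged = merge_statistics(waybills, shunts)
```

In a streaming engine the same join and grouping would run continuously over the incoming tables; the batch form here only shows the shape of the resulting merging statistical table.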
In this embodiment, message queues are used to transmit large amounts of data in real time and with high performance, so that a consumer (that is, a terminal that needs to view or use the data) can obtain real-time data. The real-time computing engine Blink performs streaming data processing on the raw data to generate the corresponding data tables, and then performs streaming processing on the data tables to obtain the corresponding merging statistical tables, which reduces the complexity of merging the data tables. Through full-dimension storage, the data of every dimension is stored into one database table, which includes both the dimension data stored in real time and the dimension data stored in non-real time, so the data corresponding to a specific service can be obtained from this database table for analysis. Compared with related technologies in which a specific database table is constructed for each specific service, this embodiment can acquire the data for the relevant service from the full-dimensional database table, thereby satisfying the data requirements of each service analysis and improving the efficiency of data acquisition and analysis. In addition, because the real-time data (such as same-day data) and the non-real-time data of each dimension generated in one application scenario are stored in one database table, the consumer end needs only one data acquisition path and the same service uses one set of service code. The development cycle is therefore short, and problems of data inconsistency and data loss are effectively avoided. Moreover, no targeted coding is needed for each service scenario, and modifications to meet changing requirements are simple.
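The single data acquisition path can be illustrated with one table that holds both same-day and historical rows. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for AnalyticDB; the table name `team_stats`, its columns, and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE team_stats (stat_date TEXT, team TEXT, waybills INTEGER)"
)
conn.executemany(
    "INSERT INTO team_stats VALUES (?, ?, ?)",
    [
        ("2020-05-07", "A", 90),   # non-real-time (historical) row
        ("2020-05-13", "A", 100),  # non-real-time (historical) row
        ("2020-05-14", "A", 10),   # real-time (same-day) row
        ("2020-05-14", "B", 5),    # real-time (same-day) row
    ],
)


def waybills_for(connection, team, dates):
    """Fetch one team's rows for any mix of current and past dates."""
    placeholders = ",".join("?" for _ in dates)
    sql = (
        "SELECT stat_date, waybills FROM team_stats "
        f"WHERE team = ? AND stat_date IN ({placeholders}) "
        "ORDER BY stat_date"
    )
    return connection.execute(sql, [team, *dates]).fetchall()


result = waybills_for(conn, "A", ["2020-05-14", "2020-05-13", "2020-05-07"])
```

Because current and historical rows share one table, the same query serves the current day, the previous day, and seven days earlier without a second code path.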
When the terminal needs to check the relevant data statistics, the analytic database AnalyticDB acquires the corresponding data, performs multi-dimensional statistical analysis on the data corresponding to the checking instruction, acquires a statistical analysis result, and sends the statistical analysis result so as to display the multi-dimensional statistical analysis result on a user terminal interface (as shown in FIG. 3 and FIG. 4).
The embodiment of the invention performs real-time streaming data processing on the acquired original data stream to obtain data tables with a predetermined format, performs real-time streaming merging processing on the data tables to obtain at least one merging statistical table, and stores the at least one merging statistical table in full dimension into the analytic database AnalyticDB, wherein the database table in AnalyticDB comprises real-time data and non-real-time data of each dimension generated by at least one data source end.
Fig. 6 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention. As shown in fig. 6, the data processing apparatus of the present embodiment includes a data acquisition unit 61, a data processing unit 62, a data table merging unit 63, and a storage unit 64.
The data acquisition unit 61 is configured to acquire a raw data stream of at least one data source end. In an alternative implementation, the data acquisition unit 61 includes a data acquisition sub-unit 611. The data acquisition sub-unit 611 is configured to acquire the raw data stream of the at least one data source end via distributed message middleware.
The data processing unit 62 is configured to perform data processing on the raw data stream, and obtain a data table having a predetermined format. In an optional implementation manner, the data processing unit 62 is further configured to perform real-time streaming data processing on the original data in the message queue by using a real-time computing engine Blink or Flink, and obtain the corresponding data table. In an alternative implementation, the data processing unit 62 includes a data processing sub-unit 621 and a data table generating sub-unit 622. The data processing subunit 621 is configured to perform a streaming normalization operation, a data cleansing operation, and/or a zero padding operation on the raw data to obtain normalized data. The data table generating sub-unit 622 is configured to generate a data table with a predetermined format corresponding to each data source according to the normalized data.
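The normalization, cleaning and zero padding operations of the data processing sub-unit can be sketched as below. The field names and rules are assumptions: normalization is taken to mean unifying key and value formats, cleaning to mean dropping records without an identifying key, and zero padding to mean replacing missing metrics with 0.

```python
def normalize(raw_records, fields):
    """Clean raw dicts, normalize keys and values, zero-fill missing metrics."""
    table = []
    for rec in raw_records:
        # Normalization: lower-case and strip keys so sources agree on names.
        rec = {str(k).strip().lower(): v for k, v in rec.items()}
        # Cleaning: drop records that lack the identifying key.
        if not rec.get("team"):
            continue
        row = {"team": str(rec["team"]).strip().upper()}
        for field in fields:
            value = rec.get(field)
            # Zero padding: missing or empty metrics become 0.
            row[field] = int(value) if value not in (None, "") else 0
        table.append(row)
    return table


# Hypothetical raw records from two differently formatted sources.
raw = [
    {"Team": " a ", "Waybills": "10"},
    {"team": "b", "waybills": None, "cancelled": 1},
    {"waybills": 3},  # no team: removed by cleaning
]
table = normalize(raw, ["waybills", "cancelled"])
```

Every surviving row has the same columns, which is what lets the later merging step treat the tables uniformly.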
The data table merging unit 63 is configured to merge the data tables to obtain at least one merging statistical table. In an alternative implementation manner, the data table merging unit 63 is further configured to perform real-time streaming merging processing on each data table by using a real-time computing engine Blink or Flink, so as to obtain at least one merging statistical table. In an alternative implementation, the data table merging unit 63 includes a data table merging sub-unit 631. The data table merging sub-unit 631 is configured to use the real-time computing engine Blink or Flink to merge data of the same dimension in each of the data tables, and/or aggregate data tables having an association relationship, so as to obtain at least one merging statistical table.
The storage unit 64 is configured to store the at least one merged statistical table in full dimension into a predetermined database, where database tables in the predetermined database include real-time data and non-real-time data of each dimension generated by at least one of the data sources. Optionally, the predetermined database is an analytic database AnalyticDB.
In an alternative implementation manner, the data processing apparatus 6 of the present embodiment further includes a data table sending unit 65. The data table sending unit 65 is configured to send each of the data tables to the message queue of the distributed message middleware.
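The two-stage flow through message queues (raw data in, data tables out, then merging downstream) can be sketched with Python's in-process `queue.Queue`. This only stands in for distributed message middleware; the topic names, message shapes, and function names are hypothetical.

```python
import queue

raw_queue = queue.Queue()    # stands in for the raw-data message queue
table_queue = queue.Queue()  # stands in for the data-table message queue


def process_stage(source, sink):
    """Drain raw messages, tag each with its topic's table, and forward it."""
    while True:
        try:
            topic, payload = source.get_nowait()
        except queue.Empty:
            break
        sink.put({"table": f"{topic}_table", "row": payload})


def merge_stage(source):
    """Drain table messages and group the rows by table name."""
    grouped = {}
    while True:
        try:
            msg = source.get_nowait()
        except queue.Empty:
            break
        grouped.setdefault(msg["table"], []).append(msg["row"])
    return grouped


raw_queue.put(("waybill", {"waybill_id": 1}))
raw_queue.put(("shunting", {"waybill_id": 1, "route": "north"}))
raw_queue.put(("waybill", {"waybill_id": 2}))
process_stage(raw_queue, table_queue)
tables = merge_stage(table_queue)
```

Placing a queue between the two stages decouples them: the table-producing stage and the merging stage can run and scale independently, which is the role the data table message queue plays in the embodiment.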
In an alternative implementation manner, the data processing apparatus 6 of the present embodiment further includes a data query unit 66, a data statistical analysis unit 67, and a statistical result sending unit 68. The data query unit 66 is configured to acquire data corresponding to a viewing instruction from the predetermined database in response to the viewing instruction. The data statistical analysis unit 67 is configured to perform statistical analysis on the data corresponding to the viewing instruction, and obtain a statistical analysis result. The statistical result sending unit 68 is configured to send the statistical analysis result so as to display the statistical analysis result on the corresponding terminal interface.
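The cooperation of the query, statistical analysis and sending units can be sketched as three small functions. The in-memory `DATABASE` list, the instruction format, and the `outbox` list that stands in for the terminal interface are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical rows standing in for the predetermined database.
DATABASE = [
    {"team": "A", "date": "2020-05-14", "timeout_rate": 0.04},
    {"team": "A", "date": "2020-05-13", "timeout_rate": 0.06},
    {"team": "B", "date": "2020-05-14", "timeout_rate": 0.10},
]


def query(instruction):
    """Data query unit: select the rows named by the viewing instruction."""
    return [r for r in DATABASE if r["team"] == instruction["team"]]


def analyze(rows, metric):
    """Statistical analysis unit: summarize one metric over the rows."""
    values = [r[metric] for r in rows]
    return {
        "count": len(values),
        "mean": round(mean(values), 4) if values else None,
    }


def send(result, outbox):
    """Result sending unit: hand the result to the display side."""
    outbox.append(result)


outbox = []
instruction = {"team": "A", "metric": "timeout_rate"}
send(analyze(query(instruction), instruction["metric"]), outbox)
```

Keeping the three steps separate mirrors the unit structure of the apparatus, so each step could be replaced (for example, by a real database query) without touching the others.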
The embodiment of the invention performs real-time streaming data processing on the acquired original data stream to obtain data tables with a predetermined format, performs real-time streaming merging processing on the data tables to obtain at least one merging statistical table, and stores the at least one merging statistical table in full dimension into the predetermined database, wherein the database table in the predetermined database comprises real-time data and non-real-time data of each dimension generated by at least one data source end.
Fig. 7 is a schematic diagram of an electronic device of an embodiment of the invention. In the present embodiment, the electronic device 7 may be a server, a terminal, or the like. As shown in fig. 7, the electronic device 7 includes: at least one processor 71; a memory 72 communicatively coupled to the at least one processor 71; and a communication component 73 communicatively coupled to the scanning device, the communication component 73 receiving and transmitting data under control of the processor 71. The memory 72 stores instructions executable by the at least one processor 71, and the instructions are executed by the at least one processor 71 to implement the data processing method.
Specifically, the electronic device includes: one or more processors 71 and a memory 72, one processor 71 being exemplified in fig. 7. The processor 71 and the memory 72 may be connected by a bus or other means, and fig. 7 illustrates the connection by a bus as an example. Memory 72, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The processor 71 executes various functional applications of the device and data processing, i.e., implements the above-described data processing method, by executing nonvolatile software programs, instructions, and modules stored in the memory 72.
The memory 72 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store a list of options, etc. Further, the memory 72 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory 72 may optionally include memory located remotely from the processor 71, which may be connected to an external device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 72 and, when executed by the one or more processors 71, perform the data processing method in any of the method embodiments described above.
The product can execute the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to executing that method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the present application.
The embodiment of the invention performs real-time streaming data processing on the acquired original data stream to obtain data tables with a predetermined format, performs real-time streaming merging processing on the data tables to obtain at least one merging statistical table, and stores the at least one merging statistical table in full dimension into the predetermined database, wherein the database table in the predetermined database comprises real-time data and non-real-time data of each dimension generated by at least one data source end.
Another embodiment of the invention is directed to a non-transitory storage medium storing a computer-readable program for causing a computer to perform some or all of the above-described method embodiments.
That is, as those skilled in the art can understand, all or part of the steps of the methods in the above embodiments may be implemented by a program instructing related hardware. The program is stored in a storage medium and includes several instructions that enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The embodiment of the invention discloses A1, a data processing method, wherein the method comprises the following steps:
acquiring an original data stream of at least one data source end;
performing data processing on the original data stream to obtain a data table with a predetermined format;
merging the data tables to obtain at least one merging statistical table;
and storing the at least one merging statistical table in full dimension into a predetermined database, wherein a database table in the predetermined database comprises real-time data and non-real-time data of each dimension generated by the at least one data source end.
A2, the method according to A1, wherein the predetermined database is an analytic database AnalyticDB.
A3, the method according to A1, wherein acquiring the original data stream of at least one data source end comprises:
acquiring the original data stream of the at least one data source end through distributed message middleware.
A4, the method according to A3, wherein performing data processing on the original data stream to obtain the data table with the predetermined format comprises:
performing real-time streaming data processing on the original data in the message queue by using a real-time computing engine Blink or Flink to obtain the corresponding data table.
A5, the method according to A4, wherein the method further comprises:
sending each data table to a message queue of the distributed message middleware.
A6, the method according to A1, wherein merging the data tables to obtain at least one merging statistical table comprises:
performing real-time streaming merging processing on each data table by using a real-time computing engine Blink or Flink to obtain at least one merging statistical table.
A7, the method according to A6, wherein performing real-time streaming merging processing on each data table by using a real-time computing engine Blink or Flink to obtain at least one merging statistical table comprises:
merging the data of the same dimension in each data table by using the real-time computing engine Blink or Flink, and/or aggregating the data tables having an association relationship, to obtain at least one merging statistical table.
A8, the method according to A1, wherein performing data processing on the original data stream to obtain the data table with the predetermined format comprises:
performing a streaming normalization operation, a data cleaning operation and/or a zero padding operation on the original data to obtain normalized data;
and generating, according to the normalized data, a data table with a predetermined format corresponding to each data source end.
A9, the method according to A1, wherein the method further comprises:
in response to a viewing instruction, acquiring data corresponding to the viewing instruction from the predetermined database;
performing statistical analysis on the data corresponding to the viewing instruction to obtain a statistical analysis result;
and sending the statistical analysis result so as to display the statistical analysis result on a corresponding terminal interface.
The embodiment of the invention also discloses B1, a data processing device, wherein the device comprises:
a data acquisition unit configured to acquire an original data stream of at least one data source end;
a data processing unit configured to perform data processing on the original data stream to obtain a data table with a predetermined format;
a data table merging unit configured to merge the data tables to obtain at least one merging statistical table;
and a storage unit configured to store the at least one merging statistical table in full dimension into a predetermined database, wherein a database table in the predetermined database comprises real-time data and non-real-time data of each dimension generated by the at least one data source end.
B2, the device according to B1, wherein the predetermined database is an analytic database AnalyticDB.
B3, the device according to B1, wherein the data acquisition unit includes:
a data acquisition subunit configured to acquire the original data stream of the at least one data source end through distributed message middleware.
B4, the device according to B3, wherein the data processing unit is further configured to perform real-time streaming data processing on the original data in the message queue by using a real-time computing engine Blink or Flink to obtain the corresponding data table.
B5, the device according to B4, wherein the device further comprises:
a data table sending unit configured to send each of the data tables to a message queue of the distributed message middleware.
B6, the device according to B1, wherein the data table merging unit is further configured to perform real-time streaming merging processing on each data table by using a real-time computing engine Blink or Flink to obtain at least one merging statistical table.
B7, the device according to B6, wherein the data table merging unit includes:
a data table merging subunit configured to use the real-time computing engine Blink or Flink to merge the data of the same dimension in each data table and/or aggregate the data tables having an association relationship, to obtain at least one merging statistical table.
B8, the device according to B1, wherein the data processing unit includes:
a data processing subunit configured to perform a streaming normalization operation, a data cleaning operation and/or a zero padding operation on the original data to obtain normalized data;
and a data table generating subunit configured to generate, according to the normalized data, a data table with a predetermined format corresponding to each data source end.
B9, the device according to B1, wherein the device further comprises:
a data query unit configured to, in response to a viewing instruction, acquire data corresponding to the viewing instruction from the predetermined database;
a data statistical analysis unit configured to perform statistical analysis on the data corresponding to the viewing instruction to obtain a statistical analysis result;
and a statistical result sending unit configured to send the statistical analysis result so as to display the statistical analysis result on a corresponding terminal interface.
The embodiment of the invention also discloses C1, an electronic device, comprising a memory and a processor, wherein the memory is used for storing one or more computer program instructions, and the processor executes the one or more computer program instructions to realize the method according to any one of A1-A9.
The embodiment of the invention also discloses D1, a computer readable storage medium, wherein the computer program instructions are stored on the computer readable storage medium, and when the computer program instructions are executed by a processor, the computer program instructions realize the method according to any one of A1-A9.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A data processing method, the method comprising:
acquiring an original data stream of at least one data source end;
performing data processing on the original data stream to obtain a data table with a predetermined format;
merging the data tables to obtain at least one merging statistical table;
and storing the at least one merging statistical table in full dimension into a predetermined database, wherein a database table in the predetermined database comprises real-time data and non-real-time data of each dimension generated by the at least one data source end.
2. The method of claim 1, wherein the predetermined database is an analytic database AnalyticDB.
3. The method of claim 1, wherein acquiring the original data stream of at least one data source end comprises:
acquiring the original data stream of the at least one data source end through distributed message middleware.
4. The method of claim 3, wherein performing data processing on the original data stream to obtain a data table with a predetermined format comprises:
performing real-time streaming data processing on the original data in the message queue by using a real-time computing engine Blink or Flink to obtain the corresponding data table.
5. The method of claim 4, further comprising:
sending each data table to a message queue of the distributed message middleware.
6. The method of claim 1, wherein merging the data tables to obtain at least one merging statistical table comprises:
performing real-time streaming merging processing on each data table by using a real-time computing engine Blink or Flink to obtain at least one merging statistical table.
7. The method of claim 6, wherein performing real-time streaming merging processing on each data table by using a real-time computing engine Blink or Flink to obtain at least one merging statistical table comprises:
merging the data of the same dimension in each data table by using the real-time computing engine Blink or Flink, and/or aggregating the data tables having an association relationship, to obtain at least one merging statistical table.
8. A data processing apparatus, characterized in that the apparatus comprises:
a data acquisition unit configured to acquire an original data stream of at least one data source end;
a data processing unit configured to perform data processing on the original data stream to obtain a data table with a predetermined format;
a data table merging unit configured to merge the data tables to obtain at least one merging statistical table;
and a storage unit configured to store the at least one merging statistical table in full dimension into a predetermined database, wherein a database table in the predetermined database comprises real-time data and non-real-time data of each dimension generated by the at least one data source end.
9. An electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer program instructions, and the one or more computer program instructions are executed by the processor to implement the method of any one of claims 1-7.
10. A computer-readable storage medium on which computer program instructions are stored, wherein the computer program instructions, when executed by a processor, implement the method of any one of claims 1-7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010408417.7A CN111651510A (en) 2020-05-14 2020-05-14 Data processing method and device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010408417.7A CN111651510A (en) 2020-05-14 2020-05-14 Data processing method and device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN111651510A (en) 2020-09-11

Family

ID=72342847

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010408417.7A Pending CN111651510A (en) 2020-05-14 2020-05-14 Data processing method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111651510A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109684352A (en) * 2018-12-29 2019-04-26 江苏满运软件科技有限公司 Data analysis system, method, storage medium and electronic equipment
CN109918441A (en) * 2019-04-03 2019-06-21 颜沿(上海)智能科技有限公司 A kind of end message processing methods of exhibiting and system
CN110147398A (en) * 2019-04-25 2019-08-20 北京字节跳动网络技术有限公司 A kind of data processing method, device, medium and electronic equipment
CN110175184A (en) * 2019-04-30 2019-08-27 阿里巴巴集团控股有限公司 A kind of lower drill method, system and the electronic equipment of data dimension
CN110795478A (en) * 2019-09-29 2020-02-14 北京淇瑀信息科技有限公司 Data warehouse updating method and device applied to financial business and electronic equipment
CN111125266A (en) * 2019-12-24 2020-05-08 中国建设银行股份有限公司 Data processing method, device, equipment and storage medium

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112418941A (en) * 2020-11-26 2021-02-26 欧冶云商股份有限公司 Resource popularity calculation method, system and storage medium based on real-time flow
CN113806416A (en) * 2021-03-12 2021-12-17 京东科技控股股份有限公司 Method and device for realizing real-time data service and electronic equipment
CN113806416B (en) * 2021-03-12 2023-11-03 京东科技控股股份有限公司 Method and device for realizing real-time data service and electronic equipment
CN113220756A (en) * 2021-03-25 2021-08-06 上海东普信息科技有限公司 Logistics data real-time processing method, device, equipment and storage medium
CN113268514A (en) * 2021-05-26 2021-08-17 深圳壹账通智能科技有限公司 Multidimensional data statistical method and device, electronic equipment and storage medium
CN113342853A (en) * 2021-06-18 2021-09-03 上海哔哩哔哩科技有限公司 Streaming data processing method and system
CN113342853B (en) * 2021-06-18 2023-03-21 上海哔哩哔哩科技有限公司 Streaming data processing method and system
WO2023109302A1 (en) * 2021-12-15 2023-06-22 中兴通讯股份有限公司 Data processing method and device, and storage medium
CN114363435A (en) * 2021-12-31 2022-04-15 广东柯内特环境科技有限公司 Environmental data monitoring and processing method
CN114363435B (en) * 2021-12-31 2023-12-12 广东柯内特环境科技有限公司 Environment data monitoring and processing method
CN117171811A (en) * 2023-09-12 2023-12-05 浪潮数字(山东)建设运营有限公司 Database synchronization and tamper-resistant tracing method and device and electronic equipment
CN117171811B (en) * 2023-09-12 2024-04-05 浪潮数字(山东)建设运营有限公司 Database synchronization and tamper-resistant tracing method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN111651510A (en) Data processing method and device, electronic equipment and computer readable storage medium
CN110784419B (en) Method and system for visualizing professional railway electric service data
US8401990B2 (en) System and method for aggregating raw data into a star schema
CN110647512B (en) Data storage and analysis method, device, equipment and readable medium
CN107515878B (en) Data index management method and device
EP1916824A2 (en) Real time web usage reporter using ram
CN111459986B (en) Data computing system and method
CN107273521B (en) Feed content quality evaluation method and device
CN108182139B (en) Early warning method, device and system
CN112506743A (en) Log monitoring method and device and server
CN110675194A (en) Funnel analysis method, device, equipment and readable medium
CN113420043A (en) Data real-time monitoring method, device, equipment and storage medium
CN109492056A (en) Method and system for business intelligence data query
CN107147527A (en) System and method for Linux cluster alarms
CN107346270B (en) Method and system for real-time computation based radix estimation
CN115392799A (en) Attribution analysis method and device, computer equipment and storage medium
CN112052134A (en) Service data monitoring method and device
CN113422808B (en) Internet of things platform HTTP information pushing method, system, device and medium
CN106557483B (en) Data processing method, data query method, data processing equipment and data query equipment
CN108460149B (en) Text data processing method, device and equipment and computer readable storage medium
CN110347653A (en) Data processing method and device, electronic equipment and readable storage medium
CN112711614A (en) Service data management method and device
CN116955817A (en) Content recommendation method, device, electronic equipment and storage medium
CN106533819B (en) Error monitoring method, device and system for online service
CN109034894A (en) Advertisement page pageview statistical method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination