CN112860710A - Data processing method, device and system and data query method and system - Google Patents


Info

Publication number
CN112860710A
Authority
CN
China
Prior art keywords
data
service
message
query
message queue
Legal status
Pending
Application number
CN202110290417.6A
Other languages
Chinese (zh)
Inventor
徐雪芳
谢梁洋
高国勇
杨伟丽
Current Assignee
Hangzhou Yunling Technology Co ltd
Original Assignee
Hangzhou Yunling Technology Co ltd
Application filed by Hangzhou Yunling Technology Co ltd
Priority to CN202110290417.6A
Publication of CN112860710A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/2455 Query execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a data processing method and system, and also discloses a data query method and system. The data processing method comprises the following steps: configuring a query dimension; subscribing to a first message queue to obtain the service data corresponding to the query dimension as data to be processed; and processing the data to be processed to obtain corresponding integrated data, storing the integrated data in a target database, and using a timestamp corresponding to the integrated data as a first message offset, wherein the first message offset identifies the position, in the first message queue, of the message corresponding to the integrated data. By designing the message offset in this way, the invention avoids data loss during data processing; data can be processed in batches by query dimension, and the resulting integrated data provides data support for subsequent queries.

Description

Data processing method, device and system and data query method and system
Technical Field
The present invention relates to the field of data processing, and in particular, to a data processing technique and a data query technique.
Background
Nowadays, a service system (referred to as the source service system) often handles both data writing and data query, and it usually stores data in corresponding databases (referred to as the source database) by service category.
A query request initiated by a user often involves multiple services, so the source service system has to extract data from the corresponding sub-databases of the source database according to the query request, and then perform statistics, integration, calculation and similar processing on the extracted data before generating a query result for display. Consequently, when there are many service types and each service holds a large volume of data, queries become slow.
In existing message-queue-based data processing schemes, the subscriber marks a message as consumed as soon as it pulls the subscribed data, and this mark determines where the next pull from the message queue starts. In practice, however, the subscriber may fail before it has finished processing the pulled data; when it resumes, it pulls new messages from the recorded starting position, so the data that was being processed at the time of the failure is lost.
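The failure mode described above can be reproduced with a small simulation. This is a minimal Python sketch, assuming the message queue is modelled as an in-memory list of (timestamp, payload) pairs; the class name MarkOnPullSubscriber and the crash trigger are hypothetical and only illustrate the prior-art behaviour, not any particular message queue product.

```python
class MarkOnPullSubscriber:
    """Prior-art style subscriber: a message counts as consumed as soon as it is pulled."""

    def __init__(self, queue):
        self.queue = queue      # list of (timestamp, payload) pairs
        self.next_index = 0     # start position for the next pull
        self.stored = []        # stands in for the downstream store

    def pull(self):
        message = self.queue[self.next_index]
        self.next_index += 1    # marked consumed before processing has finished
        return message

    def run(self, crash_on=None):
        while self.next_index < len(self.queue):
            _, payload = self.pull()
            if payload == crash_on:
                raise RuntimeError("subscriber failed mid-processing")
            self.stored.append(payload)


queue = [(1, "a"), (2, "b"), (3, "c")]
subscriber = MarkOnPullSubscriber(queue)
try:
    subscriber.run(crash_on="b")   # fails while processing "b"
except RuntimeError:
    pass
subscriber.run()                   # resumes from the recorded position
print(subscriber.stored)           # ['a', 'c']: "b" was never processed
```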
Disclosure of Invention
To address data loss in existing message-queue-based data processing, the invention provides a data processing technique that avoids data loss through the design of the message offset. It can process data in batches by query dimension, and the resulting integrated data provides data support for subsequent queries.
To solve the above technical problem, the invention adopts the following technical solution:
the invention provides a data processing method, which comprises the following steps:
configuring a query dimension;
subscribing to a first message queue to obtain the service data corresponding to the query dimension as data to be processed;
the messages in the first message queue may contain service data, or may contain the original data from which the service data is generated;
processing the data to be processed to obtain corresponding integrated data, storing the integrated data in a target database, and using a timestamp corresponding to the integrated data as a first message offset, wherein the first message offset identifies the position, in the first message queue, of the message corresponding to the integrated data.
As an implementable embodiment:
after the integrated data corresponding to the data to be processed has been stored in the target database, the first message offset is updated with the timestamp corresponding to the data to be processed.
As an implementable embodiment, the method for acquiring the service data includes:
a distribution module subscribes to a second message queue to obtain original data, i.e., the messages in the second message queue contain the original data;
the distribution module generates, based on the original data, service data containing a service identifier;
the distribution module obtains the service topic corresponding to the service identifier, publishes the service data to the first message queue under that service topic, and uses a timestamp corresponding to the service data as a second message offset, wherein the second message offset identifies the position, in the second message queue, of the message corresponding to the service data.
As an implementable embodiment:
after the distribution module has published the service data corresponding to the original data to the first message queue, the second message offset is updated with the timestamp corresponding to the original data.
The invention also provides a data processing system, comprising a target database and a plurality of processing modules, wherein the processing modules correspond one-to-one to the query dimensions and each processing module is in signal connection with the target database;
each processing module is used for subscribing to the first message queue to obtain the service data corresponding to its query dimension as data to be processed, processing the data to be processed to obtain corresponding integrated data, storing the integrated data in the target database, and using a timestamp corresponding to the integrated data as a first message offset, wherein the first message offset identifies the position, in the first message queue, of the message corresponding to the integrated data;
the target database is used for storing the integrated data based on the query dimension.
As an implementation mode, the system further comprises a distribution module, wherein the distribution module is respectively connected with each processing module through signals;
the distribution module is used for acquiring the service data and publishing the service data to a first message queue based on a preset service theme.
As one implementable embodiment, the processing module includes:
the first storage unit is used for storing first configuration data and a first message offset, wherein the first configuration data comprises a query dimension and the service topics mapped to the query dimension;
the first subscription unit is in signal connection with the first storage unit and is used for subscribing to the corresponding service data from the first message queue based on the first message offset and the service topics, to obtain data to be processed;
and the processing unit is respectively in signal connection with the first storage unit, the first subscription unit and the target database, is used for processing the data to be processed, acquiring corresponding integrated data, storing the integrated data into the target database, and is also used for updating the first message offset by using a timestamp corresponding to the integrated data.
As an implementable manner, the distribution module comprises a second storage unit, a second subscription unit, a cleaning unit and a publishing unit, wherein the second subscription unit, the cleaning unit and the publishing unit are sequentially in signal connection, and the second storage unit is respectively in signal connection with the second subscription unit and the publishing unit;
the second storage unit is used for storing second configuration data and a second message offset, the second configuration data comprises a service identifier and a service theme mapped with the service identifier, and the second message offset is used for identifying the position of processed original data in a second message queue;
the second subscription unit is used for subscribing and obtaining original data from a second message queue based on the second message offset;
the cleaning unit is used for cleaning the original data to obtain service data containing a service identifier;
and the publishing unit is used for publishing the service data to the first message queue under the service topic corresponding to the service identifier, and for updating the second message offset with a timestamp corresponding to the service data.
The invention also provides a data query method, which comprises the following steps:
acquiring a query request, querying the target database of any one of the above data processing systems based on the query request, and obtaining and returning a query result.
The invention also provides a data query system, which comprises a query subsystem and a storage subsystem, wherein the storage subsystem is any one of the data processing systems;
the query subsystem is in signal connection with the storage subsystem and is used for acquiring a query request, querying the storage subsystem based on the query request, and generating and returning a corresponding query result.
Owing to the above technical solution, the invention has the following notable technical effects:
According to the invention, service data is processed and stored by query dimension, so that in subsequent queries the query result can be obtained directly through a single-table query on the target database by query dimension, which effectively improves retrieval speed.
After the integrated data is written into the target database, the timestamp of the message corresponding to the integrated data is used as the first message offset, and subsequent pulls from the first message queue fetch the messages after the first message offset, which effectively avoids data loss.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart diagram of a data processing method of the present invention;
FIG. 2 is a block diagram showing the connection of modules of the data processing system according to embodiment 3;
FIG. 3 is a block diagram showing the connection of modules of the data processing system according to embodiment 4;
FIG. 4 is a block diagram of a module connection of distribution module 400 of FIG. 3;
FIG. 5 is a block diagram of the module connections of the processing module 100 in FIG. 3.
Detailed Description
The present invention is described in further detail below with reference to embodiments, which illustrate the present invention and are not to be construed as limiting it.
Embodiment 1: a data processing method. After a source service system writes the data it generates into a source database 300 by service category, the method collects and integrates the data in the source database 300 based on query dimensions and stores the resulting integrated data in a target database 200, so as to provide data support for subsequent queries.
As shown in FIG. 1, the steps of collecting and integrating the data in the source database 300 based on the query dimensions and storing the resulting integrated data in the target database 200 are performed by the processing module 100:
S100, configuring query dimensions, i.e., the query dimensions to which the integrated data corresponds;
a person skilled in the art can configure the query dimensions according to actual needs; for example, in the gas station field, the query dimensions include an order dimension, a coupon dimension and a gas station dimension. Each query dimension can be mapped to one or more service topics, and the mapping between query dimensions and service topics is established according to the actual situation.
S200, subscribing to a first message queue to obtain the service data corresponding to the query dimension as data to be processed;
in this embodiment, the messages in the first message queue are the update data (original data) of the source database 300; that is, the BinLog of the source database 300 is monitored through the first message queue to obtain the update data of the source database 300;
the obtained original data is screened and cleaned according to the service topics to obtain the service data corresponding to the query dimension, and this service data is used as the data to be processed.
S300, processing the data to be processed to obtain corresponding integrated data, storing the integrated data into a target database 200, and taking a timestamp corresponding to the integrated data as a first message offset;
statistics, integration and/or calculation are performed on the data to be processed based on preset data processing rules to obtain the corresponding integrated data.
In this embodiment, the timestamp corresponding to the integrated data is a timestamp of a message corresponding to the integrated data, and the first message offset is used to identify a position of the message corresponding to the integrated data in the first message queue.
In the prior art, a message is marked as consumed once it is pulled from the message queue, and the next pull fetches the unconsumed messages in time order. If the process fails while it is processing the pulled messages, the data being processed at that moment is lost when the process restarts, because the corresponding messages are already shown as consumed in the message queue and are therefore never pulled and processed again.
In this embodiment, after the integrated data is written into the target database 200, the timestamp of the message corresponding to the integrated data is used as the first message offset, and when the message is pulled from the first message queue, the message after the first message offset is pulled, so that the problem of data loss is effectively avoided.
Further, after the integrated data corresponding to the data to be processed is stored in the target database 200, the first message offset is updated with the timestamp corresponding to the data to be processed.
That is, only the timestamp of the most recently processed data is recorded; there is no need to save every timestamp.
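The ordering that prevents this loss (store the integrated data first, then set the first message offset to the message's timestamp, and pull only messages after that offset) can be sketched in Python. The sketch assumes an in-memory queue of (timestamp, service_data) pairs with strictly increasing timestamps, and uses plain dicts in place of the target database 200 and the first storage unit; the class name, field names and the integrate rule are hypothetical.

```python
class DimensionProcessor:
    """Hypothetical processing module that advances its offset only after the write."""

    def __init__(self, queue, target_db, offset_store, dimension):
        self.queue = queue                # (timestamp, service_data), ascending timestamps
        self.target_db = target_db        # dict standing in for target database 200
        self.offset_store = offset_store  # dict standing in for the first storage unit
        self.dimension = dimension        # key under which this module's offset is kept

    def pull_after_offset(self):
        last_ts = self.offset_store.get(self.dimension, 0)
        # Only messages after the recorded first message offset are pulled.
        return [(ts, data) for ts, data in self.queue if ts > last_ts]

    def integrate(self, data):
        # Placeholder for the preset statistics/integration/calculation rules.
        return {"order_id": data["order_id"], "amount": data["amount"]}

    def run(self, crash_on=None):
        for ts, data in self.pull_after_offset():
            if ts == crash_on:
                raise RuntimeError("processing module failed before writing")
            row = self.integrate(data)
            self.target_db[row["order_id"]] = row    # 1. store the integrated data
            self.offset_store[self.dimension] = ts   # 2. then record the timestamp


queue = [
    (101, {"order_id": "o1", "amount": 10}),
    (102, {"order_id": "o2", "amount": 20}),
    (103, {"order_id": "o3", "amount": 30}),
]
target_db, offsets = {}, {}
processor = DimensionProcessor(queue, target_db, offsets, "order")
try:
    processor.run(crash_on=102)    # fails while handling the message stamped 102
except RuntimeError:
    pass
print(offsets)                     # {'order': 101}
processor.run()                    # restart pulls 102 and 103 again
print(sorted(target_db), offsets)  # ['o1', 'o2', 'o3'] {'order': 103}
```

If a failure falls between steps 1 and 2, the same message is pulled again after restart, so the write should tolerate replay (here it is a keyed overwrite). The strict "after the offset" comparison also assumes no two messages share a timestamp; where they can, a tiebreaker would be needed.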
Taking the order dimension as an example, in this embodiment the several types of service data corresponding to the order dimension are obtained through the service topics mapped to the order dimension and are then collected and integrated. The resulting integrated data is sent to the target database 200 and stored in the wide table of the order dimension; that is, multi-table data in the source database 300 is converted into single-table data. Subsequent queries by the order dimension therefore only need to read the query result from the target database 200, instead of fetching and processing multiple pieces of data from the source database 300, which effectively increases query speed.
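A minimal Python sketch of this multi-table to single-table conversion follows. It assumes hypothetical service data records for order, gas station, user and payment data, each tagged with a service identifier; the field names and the join keys (station_id, user_id, order_id) are illustrative assumptions, not taken from the source.

```python
# Which key each service identifier is indexed by (assumed join keys).
JOIN_KEYS = {"order": "order_id", "station": "station_id", "user": "user_id", "payment": "order_id"}


def build_order_wide_rows(records):
    """Fold several kinds of service data into one wide row per order."""
    lookups = {service_id: {} for service_id in JOIN_KEYS}
    for rec in records:
        service_id = rec["service_id"]
        if service_id not in JOIN_KEYS:
            continue  # not needed by the order dimension
        fields = {k: v for k, v in rec.items() if k != "service_id"}
        lookups[service_id].setdefault(rec[JOIN_KEYS[service_id]], {}).update(fields)

    wide_rows = {}
    for order_id, order in lookups["order"].items():
        row = dict(order)
        row.update(lookups["station"].get(order.get("station_id"), {}))
        row.update(lookups["user"].get(order.get("user_id"), {}))
        row.update(lookups["payment"].get(order_id, {}))
        wide_rows[order_id] = row
    return wide_rows


records = [
    {"service_id": "order", "order_id": "o1", "station_id": "s1", "user_id": "u1", "litres": 40},
    {"service_id": "station", "station_id": "s1", "station_name": "North Station"},
    {"service_id": "user", "user_id": "u1", "user_name": "Alice"},
    {"service_id": "payment", "order_id": "o1", "paid": 300},
    {"service_id": "shift", "shift_id": "sh1"},
]
print(build_order_wide_rows(records)["o1"])
```

The records are indexed in a first pass and joined in a second, so the order in which the different service data arrive does not matter.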
Further, the service data contains a service identifier, and in step S100 a mapping between the query dimension and service identifiers is also configured;
a person skilled in the art can map a query dimension to one or more service identifiers according to actual needs. After the original data subscribed from the first message queue is cleaned into service data containing service identifiers, that service data is screened according to the mapping between the query dimension and the service identifiers to obtain the corresponding data to be processed.
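The screening step can be sketched in Python as a set-membership filter. The dimension names and service identifiers below are hypothetical, loosely modelled on the gas station example, and the records are assumed to be already cleaned.

```python
# Hypothetical first configuration data: query dimension -> mapped service identifiers.
DIMENSION_TO_SERVICE_IDS = {
    "order": {"order", "station", "user", "payment"},
    "member": {"user", "recharge", "points", "grade"},
}


def screen(dimension, service_records):
    """Keep only the service data whose identifier is mapped to the dimension."""
    wanted = DIMENSION_TO_SERVICE_IDS[dimension]
    return [rec for rec in service_records if rec["service_id"] in wanted]


cleaned = [
    {"service_id": "order", "order_id": "o1"},
    {"service_id": "user", "user_id": "u1"},
    {"service_id": "shift", "shift_id": "sh1"},
    {"service_id": "points", "user_id": "u1", "delta": 5},
]
for dim in ("order", "member"):
    print(dim, [rec["service_id"] for rec in screen(dim, cleaned)])
```

The "user" record is selected under both dimensions and the "shift" record under neither, which is how one piece of service data can feed several query dimensions.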
Embodiment 2: a method for acquiring service data, in which the messages in the first message queue of Embodiment 1 are changed from original data to service data. It comprises the following steps:
S210, the distribution module 400 subscribes to the second message queue to obtain the original data, that is, the distribution module 400 monitors the BinLog of the source database 300 through the second message queue;
S220, the distribution module 400 generates service data containing a service identifier based on the original data;
namely, preprocessing the original data to obtain service data containing a service identifier;
the preprocessing can be defined according to actual needs; in this embodiment, the name of the table in which the original data resides is obtained from the original data and used as the service identifier, and the obtained original data is format-converted and cleaned to obtain the corresponding service data.
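The preprocessing of this embodiment (table name as service identifier, followed by format conversion and cleaning) might look like the Python sketch below. The shape of the incoming change event (table, ts and data keys) is an assumption rather than the format of any particular BinLog tool, and the cleaning rule shown (drop null columns, trim strings) is only an example of a preset rule; the output is UTF-8 encoded JSON.

```python
import json


def preprocess(raw_event):
    """Turn one change event from the source database into service data."""
    service_id = raw_event["table"]  # the table name serves as the service identifier
    # Example cleaning rule (assumed): drop null columns and trim string values.
    cleaned = {
        column: value.strip() if isinstance(value, str) else value
        for column, value in raw_event["data"].items()
        if value is not None
    }
    service_data = {"service_id": service_id, "ts": raw_event["ts"], "data": cleaned}
    # Format conversion: UTF-8 encoded JSON for transmission.
    return json.dumps(service_data, ensure_ascii=False).encode("utf-8")


event = {"table": "points", "ts": 1001, "data": {"user_id": " u1 ", "delta": 5, "remark": None}}
payload = preprocess(event)
print(json.loads(payload))
```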
S230, the distribution module 400 obtains a service topic corresponding to the service identifier, issues the service data to the first message queue based on the service topic, and uses a timestamp corresponding to the service data as a second message offset;
in this embodiment, the timestamp corresponding to the service data is a timestamp of a message corresponding to the service data, and the second message offset is used to identify a position of the message corresponding to the service data in the second message queue.
If each processing module 100 subscribed to the original data by itself and derived the service data it needs, every processing module 100 would have to clean all of the subscribed original data, even though much of it may be irrelevant to that module (for example, the order dimension does not need changes in a user's membership grade), which creates unnecessary overhead for the processing modules 100.
Moreover, one piece of service data can serve as data to be processed under several query dimensions. For example, service data on a user's points is integrated under the order dimension to show the change in the user's points for a given order, and under the member dimension to show the user's points history and current remaining points.
In the actual use process:
a mapping between service topics and service identifiers is configured in advance for the distribution module 400, and one service topic can be mapped to one or more service identifiers. The distribution module 400 preprocesses the original data and publishes the resulting service data to the first message queue by service topic;
a mapping between service topics and query dimensions is configured in advance for each processing module 100, and one query dimension can be mapped to one or more service topics. Each processing module 100 subscribes to the corresponding service data in the first message queue by service topic, so no processing module 100 has to repeat the data screening work, and a processing module 100 stays idle when no service data it needs is published.
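The two mappings can be sketched in Python as follows, with the first message queue reduced to a dict from service topic to published records. The topic names trade and member and all identifiers are hypothetical. The second half of the example illustrates the maintenance benefit discussed in Embodiment 4: adding the grade identifier changes only the distribution-side mapping.

```python
# Hypothetical second configuration data (distribution module 400): service identifier -> service topic.
id_to_topic = {"order": "trade", "payment": "trade", "user": "member", "points": "member"}
# Hypothetical first configuration data (processing modules 100): query dimension -> service topics.
dimension_to_topics = {"order": {"trade", "member"}, "member": {"member"}}


def publish(service_records, id_to_topic):
    """Distribution side: group service data under its service topic."""
    topics = {}
    for rec in service_records:
        topic = id_to_topic.get(rec["service_id"])
        if topic is not None:
            topics.setdefault(topic, []).append(rec)
    return topics


def subscribe(dimension, topics, dimension_to_topics):
    """Processing side: gather data from every topic mapped to the dimension."""
    return [rec for topic in sorted(dimension_to_topics[dimension]) for rec in topics.get(topic, [])]


records = [{"service_id": "order"}, {"service_id": "points"}, {"service_id": "grade"}]
before = subscribe("member", publish(records, id_to_topic), dimension_to_topics)

# A new service identifier only needs an entry in the distribution module's mapping;
# dimension_to_topics (the processing modules' configuration) is left unchanged.
id_to_topic["grade"] = "member"
after = subscribe("member", publish(records, id_to_topic), dimension_to_topics)
print(len(before), len(after))
```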
Further, after the distribution module 400 issues the service data corresponding to the original data to the first message queue, the second message offset is updated by using the timestamp corresponding to the original data.
That is, only the timestamp of the most recently processed original data is recorded; there is no need to save every timestamp.
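The distribution side follows the same publish-then-record pattern with the second message offset. The following is a minimal Python sketch, assuming in-memory queues, raw events shaped as table/data pairs and ascending timestamps. The class name is hypothetical, and skipping unmapped service identifiers while still advancing the offset is an assumption of the sketch, not something the source specifies.

```python
class DistributionModule:
    """Hypothetical distribution module: second message queue in, first message queue out."""

    def __init__(self, second_queue, id_to_topic):
        self.second_queue = second_queue  # (timestamp, raw_event) pairs, ascending timestamps
        self.id_to_topic = id_to_topic    # second configuration data
        self.second_offset = 0            # timestamp of the last distributed original data
        self.first_queue = {}             # service topic -> list of (timestamp, service_data)

    def run(self):
        for ts, raw in self.second_queue:
            if ts <= self.second_offset:
                continue                  # already distributed earlier
            service_id = raw["table"]     # table name as service identifier
            topic = self.id_to_topic.get(service_id)
            if topic is not None:         # assumption: unmapped data is simply skipped
                service_data = {"service_id": service_id, **raw["data"]}
                self.first_queue.setdefault(topic, []).append((ts, service_data))
            self.second_offset = ts       # updated only after publishing


second_queue = [
    (1, {"table": "order", "data": {"order_id": "o1"}}),
    (2, {"table": "shift", "data": {"shift_id": "sh1"}}),
    (3, {"table": "points", "data": {"user_id": "u1", "delta": 5}}),
]
module = DistributionModule(second_queue, {"order": "trade", "points": "member"})
module.run()
module.run()  # re-running publishes nothing new
print(module.second_offset, {t: len(m) for t, m in module.first_queue.items()})
```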
Embodiment 3: a data processing system. As shown in FIG. 2, it includes a target database 200 and a plurality of processing modules 100, where the processing modules 100 correspond one-to-one to the query dimensions and each processing module 100 is in signal connection with the target database 200;
the processing module 100 is configured to subscribe to the first message queue to obtain the service data corresponding to its query dimension as data to be processed, process the data to be processed to obtain the corresponding integrated data, store the integrated data in the target database 200, and use a timestamp corresponding to the integrated data as a first message offset, where the first message offset identifies the position, in the first message queue, of the message corresponding to the integrated data;
that is, the processing module 100 is configured to execute the data processing steps performed by the processing module 100 in embodiment 1.
The target database 200 is configured to store the integration data based on a query dimension.
That is, the target database 200 contains a wide table for each query dimension, and the processing module 100 writes the processed integrated data into the wide table of the corresponding query dimension; for example, integrated data of the member dimension is written into the wide table of the target database 200 that corresponds to the member dimension.
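Routing each integrated row to the wide table of its query dimension can be sketched with Python's built-in sqlite3 module standing in for the target database 200. Table and column names are hypothetical. INSERT OR REPLACE is used so that a message replayed after a restart overwrites the earlier row instead of duplicating it; this is a design choice of the sketch, not a requirement stated in the source.

```python
import sqlite3

# Hypothetical wide tables, one per query dimension (columns are illustrative).
WIDE_TABLES = {
    "order": "CREATE TABLE IF NOT EXISTS order_wide (order_id TEXT PRIMARY KEY, station_name TEXT, paid REAL)",
    "member": "CREATE TABLE IF NOT EXISTS member_wide (user_id TEXT PRIMARY KEY, grade TEXT, points INTEGER)",
}


def write_integrated(conn, dimension, row):
    """Write one integrated row into the wide table of its query dimension."""
    if dimension not in WIDE_TABLES:
        raise ValueError(f"no wide table configured for dimension {dimension!r}")
    columns = list(row)
    sql = "INSERT OR REPLACE INTO {}_wide ({}) VALUES ({})".format(
        dimension, ", ".join(columns), ", ".join("?" for _ in columns)
    )
    with conn:  # commit on success, roll back on error
        conn.execute(sql, [row[c] for c in columns])


conn = sqlite3.connect(":memory:")
for ddl in WIDE_TABLES.values():
    conn.execute(ddl)
row = {"user_id": "u1", "grade": "gold", "points": 120}
write_integrated(conn, "member", row)
write_integrated(conn, "member", row)  # a replayed message overwrites instead of duplicating
print(conn.execute("SELECT COUNT(*) FROM member_wide").fetchone()[0])
```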
Further, the processing module 100 includes:
a first storage unit 110, configured to store first configuration data and a first message offset, where the first configuration data includes a query dimension and a service identifier mapped to the query dimension;
a first subscription unit 120, which is in signal connection with the first storage unit 110 and is configured to subscribe to the corresponding original data from the first message queue based on the first message offset and the query dimension, generate service data with service identifiers from the original data, and extract the data to be processed from that service data according to the mapping in the first configuration data;
the processing unit 130 is respectively connected to the first storage unit 110, the first subscription unit 120, and the target database 200 in a signal manner, and is configured to process the data to be processed, obtain corresponding integrated data, store the corresponding integrated data in the target database 200, and update the first message offset by using a timestamp corresponding to the integrated data.
This embodiment is the apparatus embodiment corresponding to Embodiment 1. Since it is basically similar to Embodiment 1, it is described briefly; for relevant details, refer to the description of Embodiment 1.
Embodiment 4: a distribution module 400 is added to Embodiment 3. As shown in FIG. 3, the distribution module 400 publishes service data to each processing module 100 through the first message queue;
that is, the distribution module 400 is configured to obtain service data and publish it to the first message queue.
As shown in fig. 4, the distribution module 400 includes a second storage unit 410, a second subscription unit 420, a cleaning unit 430, and a publishing unit 440, where the second subscription unit 420, the cleaning unit 430, and the publishing unit 440 are sequentially connected by signals, and the second storage unit 410 is respectively connected by signals to the second subscription unit 420 and the publishing unit 440;
the second storage unit 410 is configured to store second configuration data and a second message offset, where the second configuration data includes a service identifier and a service topic mapped to the service identifier, and the second message offset is used to identify a position of processed original data in a second message queue;
the second subscribing unit 420 is configured to subscribe to obtain original data from a second message queue based on the second message offset;
the second subscribing unit 420 subscribes to the BinLog log of the source database 300 through MQ (message queue), and when data in the source database 300 is updated, the second subscribing unit 420 subscribes to obtain the updated data as original data.
In this embodiment, the original data is obtained by subscribing to the BinLog of the source database 300. This does not affect the source service system or the source database 300, keeps coupling with the source database 300 low, and allows source databases 300 using different storage engines, such as MySQL, Oracle, Elasticsearch and ClickHouse, to be connected, giving strong extensibility.
The cleaning unit 430 is configured to clean the original data to obtain service data including a service identifier;
the original data includes a table name, so in this embodiment the table name is converted into the service identifier; for example, the service identifier corresponding to the points table is points.
In actual operation, the cleaning unit 430 performs format conversion on the original data (to match the transmission format; in this embodiment, for example, the original data is converted into UTF-8 encoded JSON) and cleans it according to preset cleaning rules, obtaining service data containing the service identifier.
the publishing unit 440 is configured to publish the service data to the first message queue according to a service topic based on the service identifier, and update the second message offset by using a timestamp corresponding to the service data.
As shown in fig. 5, the processing module 100 includes:
the first storage unit 110 is configured to store first configuration data and a first message offset, where the first configuration data includes a query dimension and the service topics mapped to the query dimension;
a first subscription unit 120, which is in signal connection with the first storage unit 110 and is configured to subscribe to the corresponding service data from the first message queue based on the first message offset and the service topics, to obtain data to be processed;
the processing unit 130 is respectively connected to the first storage unit 110, the first subscription unit 120, and the target database 200 in a signal manner, and is configured to process the data to be processed, obtain corresponding integrated data, store the corresponding integrated data in the target database 200, and update the first message offset by using a timestamp corresponding to the integrated data.
In this embodiment, each processing module 100 simply takes the service data it subscribes to as the data to be processed and performs statistics, integration and calculation according to preset data processing rules to obtain the corresponding integrated data; that is, each processing module 100 subscribes to messages according to the service topics of its query dimension.
When there are few service identifiers, query dimensions can be mapped directly to service identifiers. In practice, however, there are many service identifiers, and they change as the business or user requirements change, so mapping query dimensions directly to service identifiers makes subsequent management and maintenance of the processing modules 100 inconvenient. For example, if a gas station introduces a membership system to increase user stickiness, grade and points are added as service identifiers and a member dimension is added as a query dimension. The new service identifiers then have to be configured for every processing module 100 related to grades and points, and identifiers also have to be configured for the processing module 100 of the new member dimension, which means a heavy workload and low configuration efficiency.
In this embodiment, service topics are mapped to service identifiers and the distribution module 400 publishes service data by service topic, while service topics are also mapped to query dimensions and each processing module 100 subscribes by service topic. Therefore, when a service identifier is added, only the service topic of the new identifier needs to be configured in the distribution module 400, and the configuration of the processing modules 100 does not have to be changed one by one.
When the data processing system disclosed in this embodiment is applied to a gas station, the working content of the data processing system includes:
1. Data configuration:
1.1 Configure the distribution module 400:
configure service identifiers for the distribution module 400 based on the types of data tables stored in the source database 300. For example, the source database 300 may store an order table (refueling orders), a user table, a gas station table, a shift table, a recharge table, a points table, a grade table (member grades) and a card table (user cards). A service identifier is then configured for each table name, in one-to-one correspondence;
configure the data cleansing rule, the second message offset and the second configuration data for the distribution module 400;
1.2 Configure the processing modules 100:
configure a query dimension for each processing module 100 based on the data query requirements; in this case the query dimensions are the order dimension, the member dimension, the coupon dimension and the gas station dimension;
configure the data processing rule, the first message offset and the first configuration data for each processing module 100.
1.3 Configure the target database 200:
construct wide tables in the target database 200 in one-to-one correspondence with the query dimensions: an order wide table, a member wide table, a coupon wide table and a gas station wide table.
The order wide table integrates each refueling order with its associated data (such as gas station information, fuel product information, payment information and user information), so that order details can be queried quickly.
The member wide table integrates, for each member, the member data with its associated data (refueling orders, recharge orders, points and grade information). It maintains the member data and supports fast queries of real-time member changes. It also supports fast queries of associated data such as each member's consumption records, grade and points.
The coupon wide table provides detailed coupon information, such as the status and source channel of each coupon.
The gas station wide table provides detailed information about each gas station, such as its geographic location and business information.
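The source does not specify a processing rule. As an illustration, the following sketch shows "statistics, data integration and calculation" that a member-dimension rule might perform, folding service data into one member wide-table row. The column names, `service_id` values and payload fields are assumptions made for this example only.

```python
from __future__ import annotations

# Hypothetical default row of the member wide table.
EMPTY_MEMBER_ROW = {"refuel_orders": 0, "recharge_cents": 0, "points": 0, "grade": None}


def integrate_member(row: dict | None, service: dict) -> dict:
    """Return the member row updated with one piece of service data (hypothetical rule)."""
    new_row = dict(row if row is not None else EMPTY_MEMBER_ROW)  # never mutate the input
    sid, data = service["service_id"], service["data"]
    if sid == "order":          # refueling order: count it
        new_row["refuel_orders"] += 1
    elif sid == "recharge":     # recharge order: accumulate the amount
        new_row["recharge_cents"] += data["amount_cents"]
    elif sid == "points":       # points change: apply the delta
        new_row["points"] += data["delta"]
    elif sid == "grade":        # grade change: keep the latest grade
        new_row["grade"] = data["grade"]
    return new_row


events = [
    {"service_id": "order", "data": {"order_id": "A1"}},
    {"service_id": "points", "data": {"delta": 30}},
    {"service_id": "recharge", "data": {"amount_cents": 50000}},
    {"service_id": "grade", "data": {"grade": "gold"}},
    {"service_id": "points", "data": {"delta": -10}},
]
row = None
for event in events:
    row = integrate_member(row, event)
```

Counters such as `refuel_orders` are not idempotent. A message consumed twice is counted twice, which matters when a message can be re-read after a failure.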
2. The distribution module 400 distributes data:
the second subscription unit 420 pulls original data from the second message queue, starting from the second message offset;
the cleaning unit 430 cleans the original data according to the data cleansing rule to obtain service data containing a service identifier;
the publishing unit 440 publishes the service data to the first message queue by service topic, according to the second configuration data. It then updates the second message offset with the timestamp corresponding to the service data; this marks the starting position for the next pull of original data from the second message queue.
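Step 2 can be sketched as a single polling pass, with an in-memory list standing in for the second message queue. Every detail below is an assumption for illustration:

- the `Message` record;
- the raw payload shape (`table`, `data`);
- the topic names;
- the cleansing rule, which discards tables with no configured service identifier, drops null fields, and uses the table name as the service identifier, as in step 1.1.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Message:
    ts: int        # message timestamp, e.g. epoch milliseconds
    topic: str
    payload: dict


# Hypothetical second configuration data: service identifier -> service topic.
SERVICE_TOPICS = {"order": "topic.order", "grade": "topic.member", "points": "topic.member"}


def clean(raw: dict) -> dict | None:
    """Hypothetical cleansing rule: return service data, or None to discard the record."""
    table = raw.get("table")
    if table not in SERVICE_TOPICS:
        return None
    data = {k: v for k, v in raw.get("data", {}).items() if v is not None}
    return {"service_id": table, "data": data}


@dataclass
class DistributionModule:
    second_offset: int = 0  # timestamp of the last original record handled

    def run_once(self, second_queue: list[Message], first_queue: list[Message]) -> None:
        # Select everything newer than the stored offset before processing, so that
        # records sharing a timestamp within one batch are all handled.
        pending = sorted((m for m in second_queue if m.ts > self.second_offset),
                         key=lambda m: m.ts)
        for msg in pending:
            service = clean(msg.payload)
            if service is not None:
                topic = SERVICE_TOPICS[service["service_id"]]
                first_queue.append(Message(msg.ts, topic, service))
            # Advance the offset only after the record is published (or discarded).
            self.second_offset = msg.ts
```

A timestamp used as an offset is not unique the way a Kafka partition offset is. A record that arrives later with a timestamp equal to the stored offset would be skipped by the strict `>` comparison. A real deployment would therefore need unique, monotonic timestamps or a tie-breaking rule.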
3. The processing modules 100 process the data:
the first subscription unit 120 pulls the service data of the corresponding service topic from the first message queue, according to the first configuration data and the first message offset, to obtain the data to be processed;
the processing unit 130 processes the data to be processed according to the corresponding data processing rule and stores the resulting integrated data in the corresponding wide table of the target database 200. It then updates the first message offset with the timestamp corresponding to the integrated data; this marks the starting position for the next pull of service data from the first message queue.
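Step 3 can be sketched in the same way. `ProcessingModule`, `order_rule`, the topic names and the plain dictionary standing in for a wide table are all hypothetical. The sketch keeps the order of operations described above: the integrated data is stored first, and only then is the first message offset advanced.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    ts: int
    topic: str
    payload: dict


@dataclass
class ProcessingModule:
    dimension: str
    topics: set[str]                          # from the first configuration data
    rule: Callable[[dict], tuple[str, dict]]  # hypothetical data processing rule
    first_offset: int = 0                     # timestamp of the last message integrated

    def run_once(self, first_queue: list[Message], wide_table: dict[str, dict]) -> int:
        pending = sorted(
            (m for m in first_queue if m.topic in self.topics and m.ts > self.first_offset),
            key=lambda m: m.ts,
        )
        for msg in pending:
            key, fields = self.rule(msg.payload)
            wide_table.setdefault(key, {}).update(fields)  # 1) store the integrated data
            self.first_offset = msg.ts                     # 2) then advance the offset
        return len(pending)


def order_rule(service: dict) -> tuple[str, dict]:
    """Hypothetical order-dimension rule: key by order ID, keep the other fields."""
    data = dict(service["data"])
    return data.pop("order_id"), data
```

Because the offset moves only after the write, a crash between the two steps causes the message to be re-read rather than lost (at-least-once processing). Field overwrites like these are safe to replay.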
If the query dimension of the processing module 100 is the order dimension, the order ID serves as the unique primary key, and the order wide table of the target database 200 is searched by order ID. If a matching row is found, its fields are updated with the generated integrated data. Otherwise, a new row with that order ID is added to the order wide table, and its fields are filled with the generated integrated data.
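The search-then-update-or-insert behaviour can be sketched with Python's built-in `sqlite3` module. SQLite here is only a stand-in for the target database 200. The column set of `order_wide` and the helper names are assumptions, because the source does not list the wide table's columns.

```python
import sqlite3

# Hypothetical column set of the order wide table (not specified in the source).
ORDER_WIDE_COLUMNS = {"station_name", "fuel_product", "amount_cents", "pay_method", "user_id"}


def open_order_wide_table() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_wide (order_id TEXT PRIMARY KEY, station_name TEXT, "
        "fuel_product TEXT, amount_cents INTEGER, pay_method TEXT, user_id TEXT)"
    )
    return conn


def upsert_order(conn: sqlite3.Connection, order_id: str, fields: dict) -> None:
    """Search by order ID; update the row if found, otherwise insert it first."""
    unknown = set(fields) - ORDER_WIDE_COLUMNS
    if unknown:  # column names are interpolated below, so only allow known ones
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    found = conn.execute(
        "SELECT 1 FROM order_wide WHERE order_id = ?", (order_id,)
    ).fetchone()
    if found is None:
        conn.execute("INSERT INTO order_wide (order_id) VALUES (?)", (order_id,))
    if fields:
        assignments = ", ".join(f"{col} = ?" for col in fields)
        conn.execute(
            f"UPDATE order_wide SET {assignments} WHERE order_id = ?",
            (*fields.values(), order_id),
        )
    conn.commit()
```

SQLite 3.24 and later also support `INSERT ... ON CONFLICT(order_id) DO UPDATE SET ...`, which collapses the search and the write into one statement. The two-step form is kept here because it mirrors the description.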
Embodiment 5: a data query method, comprising the steps of: obtaining a query request; querying, based on the query request, the target database 200 of the data processing system described in Embodiment 3 or Embodiment 4; and obtaining and feeding back a query result.
Here, the query request is a complex query request, that is, one that requires extracting data from at least two data tables of the source database 300 and/or performing operations on the extracted data.
The target database 200 stores the data in wide tables corresponding to the query dimensions. The data needed for a query request can therefore be extracted directly from the corresponding wide table as the query result, which makes queries efficient.
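To illustrate why reading from a wide table is cheaper, this sketch answers the same order-detail request in two ways. The first joins normalized source tables; the second performs a single primary-key lookup on a pre-integrated order wide table. For brevity, one in-memory SQLite database stands in for both the source database 300 and the target database 200. All table names, column names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders     (order_id TEXT PRIMARY KEY, station_id TEXT, user_id TEXT, amount_cents INTEGER);
CREATE TABLE stations   (station_id TEXT PRIMARY KEY, station_name TEXT);
CREATE TABLE users      (user_id TEXT PRIMARY KEY, user_name TEXT);
CREATE TABLE order_wide (order_id TEXT PRIMARY KEY, station_name TEXT, user_name TEXT, amount_cents INTEGER);
INSERT INTO orders     VALUES ('A1', 'S1', 'u1', 30000);
INSERT INTO stations   VALUES ('S1', 'Station 12');
INSERT INTO users      VALUES ('u1', 'Li');
INSERT INTO order_wide VALUES ('A1', 'Station 12', 'Li', 30000);
""")


def order_detail_from_source(order_id: str):
    # Complex query against the source schema: two joins at query time.
    return conn.execute(
        """SELECT o.order_id, s.station_name, u.user_name, o.amount_cents
           FROM orders o
           JOIN stations s ON s.station_id = o.station_id
           JOIN users u ON u.user_id = o.user_id
           WHERE o.order_id = ?""",
        (order_id,),
    ).fetchone()


def order_detail_from_wide_table(order_id: str):
    # Same answer read from the pre-integrated order wide table: one key lookup.
    return conn.execute(
        "SELECT order_id, station_name, user_name, amount_cents FROM order_wide WHERE order_id = ?",
        (order_id,),
    ).fetchone()
```

The joins still happen, but they are paid once by the processing modules when the wide table is written, not on every query.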
Embodiment 6: a data query system for executing the data query method of Embodiment 5, comprising a query subsystem and a storage subsystem;
the storage subsystem is the data processing system of Embodiment 3 or Embodiment 4;
the query subsystem is in signal connection with the storage subsystem and is used for acquiring a query request, querying in the storage subsystem based on the query request, and generating and feeding back a corresponding query result.
Since the device embodiment is basically similar to the method embodiment, it is described only briefly; for relevant details, refer to the corresponding parts of the method embodiment.
The embodiments in this specification are described progressively, each focusing on its differences from the others; for parts that are the same or similar, the embodiments may be cross-referenced.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that:
although preferred embodiments of the present invention have been described, those skilled in the art may make additional variations and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all alterations and modifications that fall within the scope of the invention.
In addition, it should be noted that the names of the modules and the like may differ among the specific embodiments described in this specification. All equivalent or simple changes made according to the structure, features and principles described under the inventive concept of this patent are included in its scope of protection. Those skilled in the art may make various modifications, additions and substitutions to the specific embodiments described without departing from the scope of the invention as defined in the appended claims.

Claims (10)

1. A data processing method, characterized by comprising the steps of:
configuring a query dimension;
subscribing to and acquiring, from a first message queue, service data corresponding to the query dimension to obtain data to be processed; and
processing the data to be processed to obtain corresponding integrated data, storing the integrated data in a target database, and using a timestamp corresponding to the integrated data as a first message offset, wherein the first message offset identifies the position, in the first message queue, of the message corresponding to the integrated data.
2. The data processing method according to claim 1, wherein:
after the integrated data corresponding to the data to be processed has been stored in the target database, the first message offset is updated with the timestamp corresponding to the data to be processed.
3. The data processing method according to claim 1 or 2, wherein the service data is acquired by:
a distribution module subscribing to a second message queue to obtain original data;
the distribution module generating, based on the original data, service data containing a service identifier; and
the distribution module acquiring a service topic corresponding to the service identifier, distributing the service data to the first message queue based on the service topic, and using a timestamp corresponding to the service data as a second message offset, wherein the second message offset identifies the position, in the second message queue, of the message corresponding to the service data.
4. The data processing method according to claim 3, wherein:
after the distribution module has distributed the service data corresponding to the original data to the first message queue, the second message offset is updated with a timestamp corresponding to the original data.
5. A data processing system, characterized by comprising a target database and a plurality of processing modules, wherein the processing modules correspond one to one to query dimensions and each processing module is in signal connection with the target database;
each processing module is configured to do the following:
- subscribe to and obtain, from a first message queue, the service data corresponding to its query dimension as data to be processed;
- process the data to be processed to obtain corresponding integrated data and store the integrated data in the target database;
- use a timestamp corresponding to the integrated data as a first message offset, wherein the first message offset identifies the position, in the first message queue, of the message corresponding to the integrated data;
the target database is configured to store the integrated data by query dimension.
6. The data processing system according to claim 5, further comprising a distribution module in signal connection with each processing module;
the distribution module is configured to acquire the service data and to publish it to the first message queue based on preset service topics.
7. The data processing system according to claim 6, wherein the processing module comprises:
a first storage unit configured to store first configuration data and a first message offset, wherein the first configuration data comprises the query dimension and a service topic mapped to the query dimension;
a first subscription unit in signal connection with the first storage unit and configured to subscribe to the corresponding service data from the first message queue, based on the first message offset and the service topic, to obtain data to be processed; and
a processing unit in signal connection with the first storage unit, the first subscription unit and the target database respectively. The processing unit is configured to process the data to be processed to obtain corresponding integrated data and to store the integrated data in the target database, and is further configured to update the first message offset with a timestamp corresponding to the integrated data.
8. The data processing system according to claim 6 or 7, wherein the distribution module comprises a second storage unit, a second subscription unit, a cleaning unit and a publishing unit. The second subscription unit, the cleaning unit and the publishing unit are signal-connected in sequence, and the second storage unit is in signal connection with the second subscription unit and the publishing unit respectively;
the second storage unit is configured to store second configuration data and a second message offset, wherein the second configuration data comprises a service identifier and a service topic mapped to the service identifier, and the second message offset identifies the position of the processed original data in a second message queue;
the second subscription unit is configured to subscribe to and obtain original data from the second message queue based on the second message offset;
the cleaning unit is configured to clean the original data to obtain service data containing a service identifier;
the publishing unit is configured to publish the service data to the first message queue by service topic, based on the service identifier, and to update the second message offset with a timestamp corresponding to the service data.
9. A data query method, comprising the steps of:
acquiring a query request; querying, based on the query request, the target database in the data processing system according to any one of claims 6 to 8; and obtaining and feeding back a query result.
10. A data query system, comprising a query subsystem and a storage subsystem, wherein the storage subsystem is the data processing system according to any one of claims 6 to 8;
the query subsystem is in signal connection with the storage subsystem and is used for acquiring a query request, querying in the storage subsystem based on the query request, and generating and feeding back a corresponding query result.
CN202110290417.6A 2021-03-18 2021-03-18 Data processing method, device and system and data query method and system Pending CN112860710A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110290417.6A CN112860710A (en) 2021-03-18 2021-03-18 Data processing method, device and system and data query method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110290417.6A CN112860710A (en) 2021-03-18 2021-03-18 Data processing method, device and system and data query method and system

Publications (1)

Publication Number Publication Date
CN112860710A 2021-05-28

Family

ID=75993328

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110290417.6A Pending CN112860710A (en) 2021-03-18 2021-03-18 Data processing method, device and system and data query method and system

Country Status (1)

Country Link
CN (1) CN112860710A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180034760A1 (en) * 2016-07-27 2018-02-01 SAP SE Time series messaging persistence and publication
CN107995242A (en) * 2016-10-27 2018-05-04 北京京东尚科信息技术有限公司 Service processing method and system
CN108595483A (en) * 2018-03-13 2018-09-28 腾讯科技(深圳)有限公司 Data processing method and related apparatus
CN109002484A (en) * 2018-06-25 2018-12-14 北京明朝万达科技股份有限公司 Method and system for consuming data in sequence
CN109189835A (en) * 2018-08-21 2019-01-11 北京京东尚科信息技术有限公司 Method and apparatus for generating data wide tables in real time
CN110740195A (en) * 2019-11-20 2020-01-31 山东鲁能软件技术有限公司 Distributed system data synchronization method and system based on a message engine
CN111008189A (en) * 2019-11-26 2020-04-14 浙江电子口岸有限公司 Dynamic data model construction method
CN111506660A (en) * 2020-04-21 2020-08-07 瑞纳智能设备股份有限公司 Heat supply network real-time data warehouse system
CN112181678A (en) * 2020-09-10 2021-01-05 珠海格力电器股份有限公司 Service data processing method, device and system, storage medium and electronic device

Non-Patent Citations (1)

Title
过往记忆: "Kafka Principles and Practice", HTTPS://BLOG.CSDN.NET/WYPBLOG/ARTICLE/DETAILS/107625320? *

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN113468187A (en) * 2021-09-02 2021-10-01 太平金融科技服务(上海)有限公司深圳分公司 Multi-party data integration method and device, computer equipment and storage medium
CN113468187B (en) * 2021-09-02 2021-11-23 太平金融科技服务(上海)有限公司深圳分公司 Multi-party data integration method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210528