CN112434062A - Quasi-real-time data processing method, device, server and storage medium - Google Patents

Publication number
CN112434062A
Authority
CN
China
Prior art keywords
data
time
real
hive
database
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011343327.0A
Other languages
Chinese (zh)
Inventor
赵乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Puhui Enterprise Management Co Ltd
Original Assignee
Ping An Puhui Enterprise Management Co Ltd
Application filed by Ping An Puhui Enterprise Management Co Ltd filed Critical Ping An Puhui Enterprise Management Co Ltd
Priority to CN202011343327.0A priority Critical patent/CN112434062A/en
Publication of CN112434062A publication Critical patent/CN112434062A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2308Concurrency control
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/283Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Abstract

The invention relates to data processing and provides a quasi-real-time data processing method, a quasi-real-time data processing device, a server and a storage medium. The method determines a message middleware and synchronizes a database operation log to it; consumes and parses the database operation log to obtain quasi-real-time data and its updating time; stores the quasi-real-time data in a KUDU database; acquires a pre-created HIVE database, whose HIVE internal table stores the data before the updating time, and creates a HIVE external table; determines the storage path of the quasi-real-time data in the KUDU database and stores that path in the HIVE external table; and, when the query termination time is the updating time, combines the HIVE external table and the HIVE internal table into a target data table and queries it to obtain response data. The invention improves the query performance for quasi-real-time data. The invention also relates to blockchain technology: the response data can be stored in a blockchain.

Description

Quasi-real-time data processing method, device, server and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and an apparatus for processing near real-time data, a server, and a storage medium.
Background
In current big data platforms, the HIVE data warehouse is usually used as the offline data warehouse. To protect platform performance, such platforms do not support quasi-real-time updating and deleting of data. As a result, the latest data cannot be synchronized to the HIVE data warehouse in quasi real time, and quasi-real-time data cannot be queried from it.
Disclosure of Invention
In view of the foregoing, there is a need for a quasi-real-time data processing method, apparatus, server, and storage medium that can query quasi-real-time data while preserving the performance of the big data platform corresponding to the HIVE database, thereby improving query performance.
On one hand, the invention provides a quasi-real-time data processing method, which is applied to a server and comprises the following steps:
when the generation of a database operation log is detected, determining a message middleware connected with the server according to the database operation log, and synchronizing the database operation log to the message middleware;
consuming and analyzing the database operation log in the message middleware to obtain quasi-real-time data and the updating time of the quasi-real-time data, and storing the quasi-real-time data into a KUDU database;
acquiring a pre-created HIVE database, and creating an HIVE external table based on the HIVE database, wherein an HIVE internal table in the HIVE database stores data before the updating time;
determining a storage path of the quasi real-time data in the KUDU database, and storing the storage path into the HIVE external table;
when a data query request is received, determining query termination time according to the data query request;
when the query termination time is detected to be the updating time, combining the HIVE external table and the HIVE internal table to obtain a target data table;
and querying the target data table by using the data query request to obtain response data of the data query request.
According to a preferred embodiment of the present invention, the determining the message middleware connected to the server according to the database operation log comprises:
determining the log type of the database operation log;
acquiring a log request corresponding to the log type, and acquiring a middleware connected with the server;
analyzing the log request by using the middleware to obtain an analysis result;
determining the middleware which indicates the successful analysis of the log request by the analysis result as target middleware, and determining the analysis efficiency of the target middleware;
and determining the target middleware with the highest parsing efficiency as the message middleware.
According to a preferred embodiment of the present invention, said consuming and analyzing said database operation log in said message middleware to obtain quasi real-time data and update time of said quasi real-time data, and storing said quasi real-time data in said KUDU database comprises:
acquiring a configuration label library from the message middleware, and acquiring a first preset label and a second preset label from the configuration label library, wherein the first preset label is used for indicating data, and the second preset label is used for indicating time;
extracting information corresponding to the first preset label from the database operation log as target data, and converting the data format of the target data into a preset format to obtain the quasi-real-time data;
extracting information corresponding to the second preset label from the database operation log as the updating time;
and filling the quasi-real-time data and the updating time into a configuration mapping table to obtain a real-time data table, and sending the real-time data table to the KUDU database.
According to a preferred embodiment of the present invention, the determining a storage path of the near real-time data in the KUDU database, and storing the storage path into the HIVE external table includes:
traversing the KUDU database by using the first preset label;
when the first preset label is traversed in the KUDU database, determining a path traversed to the first preset label as the storage path;
and generating a path link according to the storage path, and writing the path link into the HIVE external table.
According to a preferred embodiment of the present invention, the combining the HIVE external table and the HIVE internal table to obtain the target data table includes:
determining a first data volume in the storage path indicated by the HIVE external table, and determining a second data volume of the HIVE internal table;
when the first data volume is larger than the second data volume, acquiring historical data of the HIVE internal table, and writing the historical data into the real-time data table to obtain the target data table; or
when the first data volume is smaller than or equal to the second data volume, writing the quasi real-time data into the HIVE internal table to obtain the target data table.
According to a preferred embodiment of the present invention, the determining a query termination time according to the data query request includes:
analyzing the message information of the data query request to obtain the data information carried by the message information;
extracting query time from the data information;
and determining the time with the largest value in the query time as the query termination time.
According to a preferred embodiment of the present invention, the querying the target data table by using the data query request to obtain response data of the data query request includes:
determining the time with the minimum value in the query time as the query starting time;
acquiring a first time which is greater than the query starting time from the target data table, and acquiring a second time which is less than the query starting time from the target data table;
determining the intersection of the first time and the second time to obtain target time;
and acquiring data corresponding to the target time from the target data table as the response data.
In another aspect, the present invention further provides a quasi real-time data processing apparatus, operating in a server, where the quasi real-time data processing apparatus includes:
the determining unit is used for determining the message middleware connected with the server according to the database operation log when the generation of the database operation log is detected, and synchronizing the database operation log to the message middleware;
the consumption unit is used for consuming and analyzing the database operation log in the message middleware to obtain quasi-real-time data and the updating time of the quasi-real-time data, and storing the quasi-real-time data into a KUDU database;
the creating unit is used for acquiring a pre-created HIVE database and creating an HIVE external table based on the HIVE database, wherein the HIVE internal table in the HIVE database stores data before the updating time;
the storage unit is used for determining a storage path of the quasi real-time data in the KUDU database and storing the storage path into the HIVE external table;
the determining unit is further configured to determine query termination time according to the data query request when the data query request is received;
the combining unit is used for combining the HIVE external table and the HIVE internal table to obtain a target data table when the query termination time is detected to be the updating time;
and the query unit is used for querying the target data table by using the data query request to obtain response data of the data query request.
In another aspect, the present invention further provides a server, where the server includes:
a memory storing computer readable instructions; and
a processor executing computer readable instructions stored in the memory to implement the quasi real-time data processing method.
In another aspect, the present invention further provides a computer-readable storage medium, in which computer-readable instructions are stored, and the computer-readable instructions are executed by a processor in a server to implement the quasi real-time data processing method.
It can be seen from the above technical solutions that the present invention determines the message middleware connected to the server so that the database operation log can be parsed successfully. Because the message middleware parses the database operation log in real time, the quasi-real-time data can be obtained in real time and stored in a designated database, which improves storage efficiency. Storing the quasi-real-time data directly in the KUDU database narrows the range that must be searched for the storage path, so the storage path can be determined quickly. Because the HIVE external table stores the storage path of the quasi-real-time data, the target data table contains the quasi-real-time data, and querying the target data table returns the data updated in real time, which improves query performance. In addition, the KUDU database stores the quasi-real-time data updated in real time while the HIVE internal table in the HIVE database stores the data before the updating time. The latest quasi-real-time data therefore need not be written into the HIVE database in real time, which preserves the performance of the big data platform corresponding to the HIVE database, and the quasi-real-time data can still be queried before it has been written into the HIVE database.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of a method for processing near real-time data according to the present invention.
FIG. 2 is a flow chart of one embodiment of the present invention for determining near real time data and update time and storing near real time data.
FIG. 3 is a flow chart of another embodiment of the quasi-real-time data processing method of the present invention.
FIG. 4 is a functional block diagram of a preferred embodiment of a quasi-real-time data processing apparatus according to the present invention.
FIG. 5 is a schematic structural diagram of a server according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a flow chart of a preferred embodiment of the method for processing near real-time data according to the present invention. The order of the steps in the flow chart may be changed and some steps may be omitted according to different needs.
The quasi-real-time data processing method is applied to one or more servers. A server is a device capable of automatically performing numerical calculation and/or information processing according to computer readable instructions that are set or stored in advance, and its hardware includes but is not limited to microprocessors, Application Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), Digital Signal Processors (DSPs), embedded devices and the like.
The server may be any electronic product capable of performing human-computer interaction with a user, for example, a personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), a game machine, an Internet Protocol Television (IPTV), a smart wearable device, and the like.
The server may include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group consisting of a plurality of network servers, or a Cloud Computing (Cloud Computing) based Cloud consisting of a large number of hosts or network servers.
The network where the server is located includes, but is not limited to: the internet, a wide area Network, a metropolitan area Network, a local area Network, a Virtual Private Network (VPN), and the like.
S10, when the generation of the database operation log is detected, determining the message middleware connected with the server according to the database operation log, and synchronizing the database operation log to the message middleware.
In at least one embodiment of the present invention, the database operation log refers to a log recording database operations, where the database operations include data update, data deletion, data addition, and the like.
Further, the message middleware may be a Kafka server.
In at least one embodiment of the present invention, the determining, by the server, the message middleware connected to the server according to the database operation log includes:
determining the log type of the database operation log;
acquiring a log request corresponding to the log type, and acquiring a middleware connected with the server;
analyzing the log request by using the middleware to obtain an analysis result;
determining the middleware which indicates the successful analysis of the log request by the analysis result as target middleware, and determining the analysis efficiency of the target middleware;
and determining the target middleware with the highest parsing efficiency as the message middleware.
Through this embodiment, the middleware connected with the server can be acquired. Using the log request corresponding to the log type, the target middleware capable of successfully parsing requests of that log type can be determined, and the message middleware with the highest parsing efficiency can then be selected from the target middleware, which improves the parsing efficiency of the database operation log.
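The selection steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `ParseResult` type, the `records_per_second` efficiency metric, and the probe callables are all hypothetical stand-ins, since the text does not say how parsing success or parsing efficiency is measured.

```python
from typing import Callable, Dict, NamedTuple, Optional


class ParseResult(NamedTuple):
    """Hypothetical outcome of asking one middleware to parse a log request."""
    success: bool
    records_per_second: float  # assumed efficiency metric


# A probe sends the log request to one middleware and reports the outcome.
Probe = Callable[[str], ParseResult]


def select_message_middleware(candidates: Dict[str, Probe],
                              log_request: str) -> Optional[str]:
    """Return the name of the most efficient middleware that parsed the request."""
    # Parse the log request with every connected middleware.
    results = {name: probe(log_request) for name, probe in candidates.items()}
    # Target middleware: those whose result indicates a successful parse.
    targets = {name: r for name, r in results.items() if r.success}
    if not targets:
        return None
    # Message middleware: the target with the highest parsing efficiency.
    return max(targets, key=lambda name: targets[name].records_per_second)


candidates = {
    "kafka": lambda request: ParseResult(True, 900.0),
    "other": lambda request: ParseResult(True, 300.0),
    "broken": lambda request: ParseResult(False, 5000.0),
}
print(select_message_middleware(candidates, "binlog"))
```

A middleware that fails to parse the request is excluded before efficiencies are compared, so a fast but incompatible candidate is never chosen.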
In at least one embodiment of the present invention, when the message middleware is determined, the server sends the database operation log to the message middleware so as to parse the database operation log.
S11, consuming and analyzing the database operation log in the message middleware to obtain quasi-real-time data and the updating time of the quasi-real-time data, and storing the quasi-real-time data in a KUDU database.
In at least one embodiment of the present invention, the server may obtain the operation indicated in the database operation log by pulling the database operation log from the message middleware for consumption, and further analyze the database operation log according to the indicated operation.
In at least one embodiment of the present invention, the update time refers to an update time of the quasi real-time data in the database corresponding to the database operation log.
The KUDU database refers to a relational database, each table in the KUDU database is composed of a plurality of fields, and each table must specify a primary key composed of at least one field.
Referring to fig. 2, fig. 2 is a flow chart of an embodiment of the present invention for determining near real-time data and update time and storing the near real-time data. In at least one embodiment of the present invention, the server consumes and parses the database operation log in the message middleware to obtain quasi-real-time data and update time of the quasi-real-time data, and storing the quasi-real-time data in a KUDU database includes:
S110, a configuration label library is obtained from the message middleware, and a first preset label and a second preset label are obtained from the configuration label library, wherein the first preset label is used for indicating data, and the second preset label is used for indicating time.
A plurality of predefined labels are stored in the configuration label library.
S111, extracting information corresponding to the first preset label from the database operation log as target data, and converting the data format of the target data into a preset format to obtain the quasi-real-time data.
The preset format refers to a data format required by a user, and the preset format may be freely set by the user, for example, the preset format may be num (K, Y), where K is an integer smaller than Y, and for example, if K is 1, Y may be 8. The preset format may also be a set format, etc.
S112, extracting information corresponding to the second preset label from the database operation log as the updating time.
S113, filling the quasi real-time data and the updating time into a configuration mapping table to obtain a real-time data table, and sending the real-time data table to the KUDU database.
The configuration mapping table includes a plurality of key value pairs, and further, information in the configuration mapping table includes the first preset label and the second preset label.
The database operation log can be analyzed in real time through the message middleware, and then the quasi-real-time data can be obtained in real time, so that the quasi-real-time data is stored in a specified KUDU database, and the storage efficiency is improved.
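Steps S110 to S113 can be illustrated with a short sketch. It assumes the database operation log has already been decoded into a dictionary, that the two preset labels are the hypothetical keys `"data"` and `"ts"`, and that the preset format is "every value as a trimmed string"; none of these specifics come from the text above.

```python
from datetime import datetime
from typing import Any, Dict

# Hypothetical label names; the source only says that one label
# indicates data and the other indicates time.
DATA_LABEL = "data"
TIME_LABEL = "ts"


def parse_operation_log(log: Dict[str, Any]) -> Dict[str, Any]:
    """Build one row of the real-time data table from a decoded log entry."""
    # S111: extract the target data and convert it to the preset format
    # (assumed here to be trimmed strings).
    target_data = log[DATA_LABEL]
    quasi_real_time_data = {k: str(v).strip() for k, v in target_data.items()}
    # S112: extract the updating time (assumed to be an ISO 8601 string).
    update_time = datetime.fromisoformat(log[TIME_LABEL])
    # S113: fill the configuration mapping table, keyed by the two labels.
    return {DATA_LABEL: quasi_real_time_data, TIME_LABEL: update_time}


entry = {
    "type": "UPDATE",
    "data": {"id": 7, "name": " Li "},
    "ts": "2020-11-25T10:30:00",
}
row = parse_operation_log(entry)
print(row)
```

Sending the resulting row to the KUDU database is left out of the sketch because it depends on the client library in use.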
S12, acquiring a pre-created HIVE database, and creating an HIVE external table based on the HIVE database, wherein the HIVE internal table in the HIVE database stores the data before the updating time.
In at least one embodiment of the invention, the data stored in the HIVE database does not contain the data stored by the path indicated by the HIVE external table.
S13, determining a storage path of the quasi real-time data in the KUDU database, and storing the storage path into the HIVE external table.
In at least one embodiment of the present invention, the storage path refers to a direct path storing the near real-time data.
In at least one embodiment of the present invention, the determining, by the server, a storage path of the near real-time data in the KUDU database, and storing the storage path in the HIVE external table includes:
traversing the KUDU database by using the first preset label;
when the first preset label is traversed in the KUDU database, determining a path traversed to the first preset label as the storage path;
and generating a path link according to the storage path, and writing the path link into the HIVE external table.
Through the first preset label corresponding to the quasi-real-time data, a storage path for storing the quasi-real-time data can be accurately determined, and then a path link is generated for the storage path, so that convenience is provided for subsequent data query.
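The traversal can be pictured with a toy model. The sketch below represents the KUDU database as a nested dictionary and the path link as a slash-separated string; both are assumptions made purely for illustration, as the text does not describe the actual catalog layout or link format.

```python
from typing import Any, Dict, List, Optional


def find_storage_path(node: Dict[str, Any], label: str,
                      prefix: Optional[List[str]] = None) -> Optional[List[str]]:
    """Depth-first search for `label`; return the path leading to it, if any."""
    prefix = prefix or []
    for key, child in node.items():
        path = prefix + [key]
        if key == label:
            # The path traversed to the label is the storage path.
            return path
        if isinstance(child, dict):
            found = find_storage_path(child, label, path)
            if found is not None:
                return found
    return None


def make_path_link(path: List[str]) -> str:
    """Generate a path link (hypothetical format) to write into the external table."""
    return "/" + "/".join(path)


# Toy stand-in for the KUDU database contents.
catalog = {"warehouse": {"orders": {"meta": {}, "data": {"rows": 3}}}}
storage_path = find_storage_path(catalog, "data")
print(make_path_link(storage_path))
```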
S14, when a data query request is received, determining query termination time according to the data query request.
In at least one embodiment of the invention, the data query request may be triggered by any user. The query termination time refers to a maximum query time in the data query request.
In at least one embodiment of the present invention, the server determining the query termination time according to the data query request includes:
analyzing the message information of the data query request to obtain the data information carried by the message information;
extracting query time from the data information;
and determining the time with the largest value in the query time as the query termination time.
According to the embodiment, the whole data query request does not need to be analyzed, so that the analysis efficiency of the data query request can be improved, the query time can be quickly acquired, and in addition, the query starting time and the query ending time can be accurately determined through the method.
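As a minimal sketch of this step, assume the message information decodes to a dictionary whose hypothetical `data_info.query_times` field lists the query times as ISO 8601 strings. The largest value is the query termination time; the smallest is the query starting time used later when the target data table is queried.

```python
from datetime import datetime
from typing import Any, Dict, Tuple


def query_time_bounds(message: Dict[str, Any]) -> Tuple[datetime, datetime]:
    """Return (query starting time, query termination time) for a request."""
    # Only the data information is inspected, not the whole request.
    raw_times = message["data_info"]["query_times"]  # hypothetical field names
    times = [datetime.fromisoformat(t) for t in raw_times]
    return min(times), max(times)


request = {
    "request_number": "Q-001",
    "data_info": {
        "query_times": ["2020-11-25T12:00:00", "2020-11-20T00:00:00"],
    },
}
start_time, end_time = query_time_bounds(request)
print(start_time, end_time)
```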
S15, when the query termination time is detected to be the updating time, combining the HIVE external table and the HIVE internal table to obtain a target data table.
In at least one embodiment of the present invention, when the query termination time is the update time, the complete data requested by the data query request cannot be obtained from the HIVE database alone.
The target data table fuses the historical data stored in the HIVE database with the quasi-real-time data stored in the KUDU database.
In at least one embodiment of the present invention, the server combining the HIVE external table and the HIVE internal table to obtain the target data table includes:
determining a first data volume in the storage path indicated by the HIVE external table, and determining a second data volume of the HIVE internal table;
when the first data volume is larger than the second data volume, acquiring historical data of the HIVE internal table, and writing the historical data into the real-time data table to obtain the target data table; or
when the first data volume is smaller than or equal to the second data volume, writing the quasi real-time data into the HIVE internal table to obtain the target data table.
Comparing the first data volume with the second data volume determines how the target data table is generated, so that the smaller data set is written into the larger one and the target data table is generated more efficiently.
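The volume comparison can be sketched with in-memory lists standing in for the two tables. Measuring data volume as a row count is an assumption; the text does not say whether volume means rows, bytes, or something else.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def combine_tables(real_time_rows: List[Row], internal_rows: List[Row]) -> List[Row]:
    """Merge the smaller table into the larger one and return the target table."""
    first_volume = len(real_time_rows)   # data under the external table's path
    second_volume = len(internal_rows)   # data in the HIVE internal table
    if first_volume > second_volume:
        # Write the historical data into the real-time data table.
        real_time_rows.extend(internal_rows)
        return real_time_rows
    # Otherwise write the quasi-real-time data into the internal table.
    internal_rows.extend(real_time_rows)
    return internal_rows


real_time = [{"id": 3}, {"id": 4}, {"id": 5}]
history = [{"id": 1}]
target = combine_tables(real_time, history)
print(len(target))
```

The lists are extended in place to mirror writing into an existing table, so only the smaller side is copied.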
S16, querying the target data table by using the data query request to obtain response data of the data query request.
It is emphasized that, in order to further ensure the privacy and security of the response data, the response data may also be stored in a node of a block chain.
In at least one embodiment of the present invention, the response data refers to data obtained after responding to the data query request.
In at least one embodiment of the present invention, the server querying the target data table by using the data query request to obtain response data of the data query request includes:
determining the time with the minimum value in the query time as the query starting time;
acquiring a first time which is greater than the query starting time from the target data table, and acquiring a second time which is less than the query starting time from the target data table;
determining the intersection of the first time and the second time to obtain target time;
and acquiring data corresponding to the target time from the target data table as the response data.
By determining the target time, the response data can be quickly acquired from the target data table, the query efficiency of the response data is improved, and in addition, the quasi-real-time data updated in real time can be queried through the target data table.
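A minimal sketch of this range selection follows. It assumes rows are dictionaries with a hypothetical `update_time` field, and it assumes the second set is bounded by the query termination time: as literally worded, both sets are bounded by the query starting time, which would leave the intersection empty. Whether the bounds should be inclusive is not stated, so the strict comparisons mirror the wording above.

```python
from datetime import datetime
from typing import Any, Dict, List

Row = Dict[str, Any]


def query_target_table(rows: List[Row], start: datetime, end: datetime) -> List[Row]:
    """Return rows whose time lies strictly between start and end."""
    # First time: entries later than the query starting time.
    first = {i for i, r in enumerate(rows) if r["update_time"] > start}
    # Second time: entries earlier than the (assumed) query termination time.
    second = {i for i, r in enumerate(rows) if r["update_time"] < end}
    # Target time: the intersection of the two sets.
    target = sorted(first & second)
    return [rows[i] for i in target]


table = [
    {"id": 1, "update_time": datetime(2020, 11, 25, 9, 0)},
    {"id": 2, "update_time": datetime(2020, 11, 25, 10, 0)},
    {"id": 3, "update_time": datetime(2020, 11, 25, 11, 0)},
    {"id": 4, "update_time": datetime(2020, 11, 25, 12, 0)},
]
response = query_target_table(
    table, datetime(2020, 11, 25, 9, 0), datetime(2020, 11, 25, 12, 0)
)
print([row["id"] for row in response])
```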
Further, referring to fig. 3, fig. 3 is a flowchart of another embodiment of the quasi-real-time data processing method according to the present invention. The embodiment is obtained by improving the quasi-real-time data processing method shown in fig. 1 to fig. 2, as shown in fig. 3, after the response data of the data query request is obtained in step S16 shown in fig. 1, the method may further include the following steps:
S20, acquiring the request number of the data query request.
The server acquires the request number from the data information.
S21, generating prompt information according to the request number and the response data.
The prompt information is used for prompting the generation of the response data.
S22, encrypting the prompt information by adopting a symmetric encryption technology to obtain a ciphertext.
S23, sending the ciphertext to the terminal equipment of the designated contact person.
Wherein the designated contact may be a user who triggered the data query request.
The terminal device can be a mobile phone, a tablet, a computer and the like of the designated contact person.
Through the implementation mode, the prompt information can be generated in time, and then the appointed contact person can be reminded of receiving the response data in time.
It can be seen from the above technical solutions that the present invention determines the message middleware connected to the server so that the database operation log can be parsed successfully. Because the message middleware parses the database operation log in real time, the quasi-real-time data can be obtained in real time and stored in a designated database, which improves storage efficiency. Storing the quasi-real-time data directly in the KUDU database narrows the range that must be searched for the storage path, so the storage path can be determined quickly. Because the HIVE external table stores the storage path of the quasi-real-time data, the target data table contains the quasi-real-time data, and querying the target data table returns the data updated in real time, which improves query performance. In addition, the KUDU database stores the quasi-real-time data updated in real time while the HIVE internal table in the HIVE database stores the data before the updating time. The latest quasi-real-time data therefore need not be written into the HIVE database in real time, which preserves the performance of the big data platform corresponding to the HIVE database, and the quasi-real-time data can still be queried before it has been written into the HIVE database.
FIG. 4 is a functional block diagram of a quasi-real-time data processing apparatus according to a preferred embodiment of the present invention. The near real-time data processing apparatus 11 includes a determining unit 110, a consuming unit 111, a creating unit 112, a storing unit 113, a combining unit 114, a querying unit 115, an obtaining unit 116, a generating unit 117, an encrypting unit 118, and a sending unit 119. The module/unit referred to herein is a series of computer readable instruction segments that can be accessed by the processor 13 and perform a fixed function and that are stored in the memory 12. In the present embodiment, the functions of the modules/units will be described in detail in the following embodiments.
When it is detected that the database operation log is generated, the determining unit 110 determines a message middleware connected to the server according to the database operation log, and synchronizes the database operation log to the message middleware.
In at least one embodiment of the present invention, the database operation log refers to a log recording database operations, where the database operations include data update, data deletion, data addition, and the like.
Further, the message middleware may be a Kafka server.
In at least one embodiment of the present invention, the determining unit 110 determines the message middleware connected to the server according to the database operation log, including:
determining the log type of the database operation log;
acquiring a log request corresponding to the log type, and acquiring a middleware connected with the server;
analyzing the log request by using the middleware to obtain an analysis result;
determining the middleware which indicates the successful analysis of the log request by the analysis result as target middleware, and determining the analysis efficiency of the target middleware;
and determining the target middleware with the highest parsing efficiency as the message middleware.
Through this embodiment, the middleware connected with the server can be acquired. Using the log request corresponding to the log type, the target middleware capable of successfully parsing requests of that log type can be determined, and the message middleware with the highest parsing efficiency can then be selected from the target middleware, which improves the parsing efficiency of the database operation log.
In at least one embodiment of the present invention, when the message middleware is determined, the determination unit 110 sends the database operation log to the message middleware so as to parse the database operation log.
The consumption unit 111 consumes and analyzes the database operation log in the message middleware to obtain quasi-real-time data and update time of the quasi-real-time data, and stores the quasi-real-time data in a KUDU database.
In at least one embodiment of the present invention, the consumption unit 111 may obtain the operation indicated in the database operation log by pulling the database operation log from the message middleware for consumption, and further analyze the database operation log according to the indicated operation.
In at least one embodiment of the present invention, the update time refers to an update time of the quasi real-time data in the database corresponding to the database operation log.
The KUDU database refers to a relational database, each table in the KUDU database is composed of a plurality of fields, and each table must specify a primary key composed of at least one field.
In at least one embodiment of the present invention, the consuming unit 111 consumes and analyzes the database operation log in the message middleware to obtain near real-time data and update time of the near real-time data, and storing the near real-time data in a KUDU database includes:
Acquiring a configuration label library from the message middleware, and acquiring a first preset label and a second preset label from the configuration label library, wherein the first preset label indicates data and the second preset label indicates time.
Wherein, a plurality of predefined tags are stored in the configuration tag library.
And extracting information corresponding to the first preset label from the database operation log as target data, and converting the data format of the target data into a preset format to obtain the quasi-real-time data.
The preset format is the data format required by the user and may be set freely by the user. For example, the preset format may be num (K, Y), where K is an integer smaller than Y; if K is 1, Y may be 8. The preset format may also be a set format, and so on.
And extracting information corresponding to the second preset label from the database operation log as the updating time.
And filling the quasi-real-time data and the updating time into a configuration mapping table to obtain a real-time data table, and sending the real-time data table to the KUDU database.
The configuration mapping table includes a plurality of key value pairs, and further, information in the configuration mapping table includes the first preset tag and the second preset tag.
The database operation log can be analyzed in real time through the message middleware, and then the quasi-real-time data can be obtained in real time, so that the quasi-real-time data is stored in a specified KUDU database, and the storage efficiency is improved.
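The extraction steps in this embodiment can be sketched as below. The sketch assumes, purely for illustration, that a database operation log arrives as a JSON string, that the first preset label is the key `"data"` and the second preset label is the key `"update_time"`, and that the preset format is a two-decimal string. None of these specifics come from the original text, and the write to the KUDU database is left out.

```python
import json
from decimal import Decimal
from typing import Any, Callable, Dict


def to_two_decimals(value: Any) -> str:
    """Hypothetical preset format: a fixed two-decimal string."""
    return str(Decimal(str(value)).quantize(Decimal("0.01")))


def parse_operation_log(
    raw_log: str,
    data_label: str = "data",          # assumed first preset label
    time_label: str = "update_time",   # assumed second preset label
    to_preset_format: Callable[[Any], str] = to_two_decimals,
) -> Dict[str, Any]:
    """Build one row of the real-time data table from one operation log."""
    record = json.loads(raw_log)
    # Information under the first preset label is the target data.
    target_data = record[data_label]
    quasi_real_time_data = {
        field: to_preset_format(value) for field, value in target_data.items()
    }
    # Information under the second preset label is the update time.
    update_time = record[time_label]
    # The configuration mapping table is modelled as key-value pairs whose
    # keys are the two preset labels.
    return {data_label: quasi_real_time_data, time_label: update_time}
```

The returned dictionary stands in for one row of the real-time data table that would then be sent to the KUDU database.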
The creating unit 112 obtains a pre-created HIVE database, and creates a HIVE external table based on the HIVE database, wherein the HIVE internal table in the HIVE database stores data before the update time.
In at least one embodiment of the invention, the data stored in the HIVE database does not contain the data stored by the path indicated by the HIVE external table.
The storage unit 113 determines a storage path of the near real-time data in the KUDU database, and stores the storage path into the HIVE external table.
In at least one embodiment of the present invention, the storage path refers to a direct path storing the near real-time data.
In at least one embodiment of the present invention, the determining, by the storage unit 113, a storage path of the near real-time data in the KUDU database, and storing the storage path in the HIVE external table includes:
traversing the KUDU database by using the first preset tag;
when the first preset label is traversed in the KUDU database, determining a path traversed to the first preset label as the storage path;
and generating a path link according to the storage path, and writing the path link into the HIVE external table.
Through the first preset label corresponding to the quasi-real-time data, a storage path for storing the quasi-real-time data can be accurately determined, and then a path link is generated for the storage path, so that convenience is provided for subsequent data query.
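As a rough sketch of the traversal described above, the KUDU database can be modelled as a nested dictionary in which the first preset label appears as a key. That model, the `kudu://` link prefix, and the helper names `find_storage_path` and `make_path_link` are all assumptions made for illustration; the original text does not specify how the traversal or the path link is implemented.

```python
from typing import Any, Dict, Optional, Tuple


def find_storage_path(
    tree: Dict[str, Any],
    label: str,
    prefix: Tuple[str, ...] = (),
) -> Optional[Tuple[str, ...]]:
    """Depth-first search for `label`; return the keys walked to reach it."""
    for key, value in tree.items():
        path = prefix + (key,)
        if key == label:
            return path  # the path traversed to the label is the storage path
        if isinstance(value, dict):
            found = find_storage_path(value, label, path)
            if found is not None:
                return found
    return None


def make_path_link(path: Tuple[str, ...], scheme: str = "kudu://") -> str:
    """Turn a storage path into a link string (hypothetical link format)."""
    return scheme + "/".join(path)
```

The resulting link string is what would be written into the HIVE external table in this simplified model.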
When a data query request is received, the determining unit 110 determines a query termination time according to the data query request.
In at least one embodiment of the invention, the data query request may be triggered by any user. The query termination time refers to a maximum query time in the data query request.
In at least one embodiment of the present invention, the determining unit 110 determines the query termination time according to the data query request, including:
analyzing the message information of the data query request to obtain the data information carried by the message information;
extracting query time from the data information;
and determining the time with the largest value in the query time as the query termination time.
With this embodiment, the whole data query request does not need to be parsed, which improves the parsing efficiency of the data query request and allows the query time to be acquired quickly. In addition, the query start time and the query termination time can be determined accurately in this way.
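A minimal sketch of these steps follows. It assumes that the message information is a JSON string whose `"data"` object carries a `"query_times"` list of ISO-8601 timestamps; both field names are hypothetical and are not given in the original text.

```python
import json
from datetime import datetime
from typing import List


def extract_query_times(message: str) -> List[datetime]:
    """Parse only the message body and pull out the query times."""
    data_info = json.loads(message)["data"]       # assumed field name
    return [datetime.fromisoformat(t) for t in data_info["query_times"]]


def query_termination_time(message: str) -> datetime:
    """The largest query time is taken as the query termination time."""
    return max(extract_query_times(message))
```

Taking the minimum of the same list would give the query start time used later when the target data table is queried.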
When detecting that the query termination time is the update time, the combining unit 114 combines the HIVE external table and the HIVE internal table to obtain a target data table.
In at least one embodiment of the present invention, a query termination time equal to the update time means that the complete data requested by the data query request cannot be queried from the HIVE database alone.
And the historical data stored in the HIVE database and the quasi-real-time data stored in the KUDU database are fused in the target data table.
In at least one embodiment of the present invention, the combining unit 114 combines the HIVE external table and the HIVE internal table to obtain the target data table, including:
determining a first data volume in the storage path indicated by the HIVE external table, and determining a second data volume of the HIVE internal table;
when the first data volume is larger than the second data volume, acquiring historical data of the HIVE internal table, and writing the historical data into the real-time data table to obtain the target data table; or
when the first data volume is smaller than or equal to the second data volume, writing the quasi real-time data into the HIVE internal table to obtain the target data table.
And comparing the determined first data quantity with the second data quantity to determine the generation mode of the target data table, so that the generation efficiency of the target data table can be improved in a proper mode.
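The volume comparison can be illustrated with in-memory lists standing in for the two tables. This is a sketch under the assumption that "data volume" means row count and that "writing into" a table means appending rows; the function name `build_target_table` is introduced only for this example.

```python
from typing import Any, List


def build_target_table(
    realtime_rows: List[Any],   # rows reachable through the HIVE external table
    history_rows: List[Any],    # rows of the HIVE internal table
) -> List[Any]:
    """Merge by copying the smaller side into the larger side."""
    first_volume = len(realtime_rows)
    second_volume = len(history_rows)
    if first_volume > second_volume:
        # Write the historical data into the real-time data table.
        target = list(realtime_rows)
        target.extend(history_rows)
    else:
        # Write the quasi-real-time data into the HIVE internal table.
        target = list(history_rows)
        target.extend(realtime_rows)
    return target
```

Either branch yields the same set of rows; only the direction of the write differs, so the smaller data set is the one that gets moved.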
The query unit 115 queries the target data table by using the data query request to obtain response data of the data query request.
It is emphasized that, in order to further ensure the privacy and security of the response data, the response data may also be stored in a node of a block chain.
In at least one embodiment of the present invention, the response data refers to data obtained after responding to the data query request.
In at least one embodiment of the present invention, the querying unit 115 queries the target data table by using the data query request, and obtaining response data of the data query request includes:
determining the time with the minimum value in the query time as the query starting time;
acquiring a first time which is greater than the query starting time from the target data table, and acquiring a second time which is less than the query starting time from the target data table;
determining the intersection of the first time and the second time to obtain target time;
and acquiring data corresponding to the target time from the target data table as the response data.
By determining the target time, the response data can be quickly acquired from the target data table, the query efficiency of the response data is improved, and in addition, the quasi-real-time data updated in real time can be queried through the target data table.
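The range selection can be sketched as below. One assumption is made explicit here: the text bounds the second time by the query starting time, but an intersection with times greater than the same starting time would be empty, so this sketch bounds the second time by the query termination time (the largest query time) instead. Rows are modelled as hypothetical `(time, payload)` pairs.

```python
from typing import Any, List, Sequence, Tuple


def select_response_data(
    rows: Sequence[Tuple[Any, Any]],   # (time, payload) pairs in the target table
    query_times: Sequence[Any],
) -> List[Any]:
    """Return payloads whose time lies strictly between start and end."""
    start = min(query_times)   # query starting time
    end = max(query_times)     # query termination time (assumed upper bound)
    first = {i for i, (t, _) in enumerate(rows) if t > start}
    second = {i for i, (t, _) in enumerate(rows) if t < end}
    target = first & second    # intersection gives the target time
    return [rows[i][1] for i in sorted(target)]
```

The comparisons are strict because the text says "greater than" and "less than"; an inclusive range would need `>=` and `<=`.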
In at least one embodiment of the present invention, after obtaining the response data of the data query request, the obtaining unit 116 obtains the request number of the data query request.
And the server acquires the request number from the data information.
The generating unit 117 generates prompt information from the request number and the response data.
The prompt information is used for prompting the generation of the response data.
The encrypting unit 118 encrypts the prompt information using a symmetric encryption technique to obtain a ciphertext.
The sending unit 119 sends the ciphertext to the terminal device of the designated contact.
Wherein the designated contact may be a user who triggered the data query request.
The terminal device can be a mobile phone, a tablet, a computer and the like of the designated contact person.
Through the implementation mode, the prompt information can be generated in time, and then the appointed contact person can be reminded of receiving the response data in time.
It can be seen from the above technical solutions that, in the present invention, the message middleware connected to the server is determined so that the message middleware can successfully parse the database operation log. Because the message middleware parses the database operation log in real time, the quasi-real-time data can be obtained in real time and stored in a designated database, which improves storage efficiency. Storing the quasi-real-time data directly in the KUDU database narrows the range in which the storage path must be determined, so the storage path can be determined quickly. Because the HIVE external table stores the storage path of the quasi-real-time data, the target data table contains the quasi-real-time data, and querying the target data table returns the quasi-real-time data updated in real time, which improves query performance. In the invention, the KUDU database stores the quasi-real-time data updated in real time and the HIVE internal table in the HIVE database stores the data before the update time. The latest quasi-real-time data therefore does not need to be updated into the HIVE database in real time, which preserves the performance of the big data platform corresponding to the HIVE database, while the quasi-real-time data can still be queried before it has been updated into the HIVE database.
Fig. 5 is a schematic structural diagram of a server according to a preferred embodiment of the present invention for implementing a near-real-time data processing method.
In one embodiment of the present invention, the server 1 includes, but is not limited to, a memory 12, a processor 13, and computer readable instructions, such as a near real-time data processing program, stored in the memory 12 and executable on the processor 13.
It will be appreciated by those skilled in the art that the schematic diagram is merely an example of the server 1 and does not constitute a limitation of the server 1 and may comprise more or less components than those shown, or some components in combination, or different components, e.g. the server 1 may further comprise input output devices, network access devices, buses, etc.
The Processor 13 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. The processor 13 is an operation core and a control center of the server 1, connects various parts of the entire server 1 by various interfaces and lines, and executes an operating system of the server 1 and various installed application programs, program codes, and the like.
Illustratively, the computer readable instructions may be partitioned into one or more modules/units that are stored in the memory 12 and executed by the processor 13 to implement the present invention. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, which are used for describing the execution process of the computer-readable instructions in the server 1. For example, the computer readable instructions may be divided into a determining unit 110, a consuming unit 111, a creating unit 112, a storing unit 113, a combining unit 114, a querying unit 115, an obtaining unit 116, a generating unit 117, an encrypting unit 118, and a sending unit 119.
The memory 12 may be used for storing the computer readable instructions and/or modules, and the processor 13 implements various functions of the server 1 by running or executing the computer readable instructions and/or modules stored in the memory 12 and calling data stored in the memory 12. The memory 12 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the server, and the like. The memory 12 may include non-volatile and volatile memories, such as: a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other storage device.
The memory 12 may be an external memory and/or an internal memory of the server 1. Further, the memory 12 may be a memory having a physical form, such as a memory stick, a TF Card (Trans-flash Card), or the like.
The modules/units integrated by the server 1 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, all or part of the flow of the method according to the above embodiments may be implemented by computer readable instructions that instruct the relevant hardware; the computer readable instructions may be stored in a computer readable storage medium and, when executed by a processor, implement the steps of the method embodiments.
Wherein the computer readable instructions comprise computer readable instruction code which may be in source code form, object code form, an executable file or some intermediate form, and the like. The computer-readable medium may include: any entity or device capable of carrying said computer readable instruction code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM).
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
With reference to fig. 1, the memory 12 in the server 1 stores computer-readable instructions to implement a near real-time data processing method, and the processor 13 can execute the computer-readable instructions to implement:
when the generation of a database operation log is detected, determining a message middleware connected with the server according to the database operation log, and synchronizing the database operation log to the message middleware;
consuming and analyzing the database operation log in the message middleware to obtain quasi-real-time data and the updating time of the quasi-real-time data, and storing the quasi-real-time data into a KUDU database;
acquiring a pre-created HIVE database, and creating an HIVE external table based on the HIVE database, wherein an HIVE internal table in the HIVE database stores data before the updating time;
determining a storage path of the quasi real-time data in the KUDU database, and storing the storage path into the HIVE external table;
when a data query request is received, determining query termination time according to the data query request;
when the query termination time is detected to be the updating time, combining the HIVE external table and the HIVE internal table to obtain a target data table;
and querying the target data table by using the data query request to obtain response data of the data query request.
Specifically, for the way the processor 13 implements the computer readable instructions, refer to the description of the relevant steps in the embodiment corresponding to FIG. 1, which is not repeated here.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The computer readable storage medium has computer readable instructions stored thereon, wherein the computer readable instructions when executed by the processor 13 are configured to implement the steps of:
when the generation of a database operation log is detected, determining a message middleware connected with the server according to the database operation log, and synchronizing the database operation log to the message middleware;
consuming and analyzing the database operation log in the message middleware to obtain quasi-real-time data and the updating time of the quasi-real-time data, and storing the quasi-real-time data into a KUDU database;
acquiring a pre-created HIVE database, and creating an HIVE external table based on the HIVE database, wherein an HIVE internal table in the HIVE database stores data before the updating time;
determining a storage path of the quasi real-time data in the KUDU database, and storing the storage path into the HIVE external table;
when a data query request is received, determining query termination time according to the data query request;
when the query termination time is detected to be the updating time, combining the HIVE external table and the HIVE internal table to obtain a target data table;
and querying the target data table by using the data query request to obtain response data of the data query request.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. The plurality of units or devices may also be implemented by one unit or device through software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A quasi-real-time data processing method is applied to a server, and is characterized by comprising the following steps:
when the generation of a database operation log is detected, determining a message middleware connected with the server according to the database operation log, and synchronizing the database operation log to the message middleware;
consuming and analyzing the database operation log in the message middleware to obtain quasi-real-time data and the updating time of the quasi-real-time data, and storing the quasi-real-time data into a KUDU database;
acquiring a pre-created HIVE database, and creating an HIVE external table based on the HIVE database, wherein an HIVE internal table in the HIVE database stores data before the updating time;
determining a storage path of the quasi real-time data in the KUDU database, and storing the storage path into the HIVE external table;
when a data query request is received, determining query termination time according to the data query request;
when the query termination time is detected to be the updating time, combining the HIVE external table and the HIVE internal table to obtain a target data table;
and querying the target data table by using the data query request to obtain response data of the data query request.
2. The method of near real-time data processing according to claim 1, wherein the determining the message middleware connected to the server according to the database operation log comprises:
determining the log type of the database operation log;
acquiring a log request corresponding to the log type, and acquiring a middleware connected with the server;
analyzing the log request by using the middleware to obtain an analysis result;
determining the middleware which indicates the successful analysis of the log request by the analysis result as target middleware, and determining the analysis efficiency of the target middleware;
and determining the target middleware with the highest parsing efficiency as the message middleware.
3. The method of claim 1, wherein the consuming and parsing the database operation log in the message middleware to obtain near real-time data and an update time of the near real-time data, and storing the near real-time data in a KUDU database comprises:
acquiring a configuration label library from the message middleware, and acquiring a first preset label and a second preset label from the configuration label library, wherein the first preset label is used for indicating data, and the second preset label is used for indicating time;
extracting information corresponding to the first preset label from the database operation log as target data, and converting the data format of the target data into a preset format to obtain the quasi-real-time data;
extracting information corresponding to the second preset label from the database operation log as the updating time;
and filling the quasi-real-time data and the updating time into a configuration mapping table to obtain a real-time data table, and sending the real-time data table to the KUDU database.
4. The near real-time data processing method of claim 3, wherein the determining a storage path of the near real-time data in the KUDU database and storing the storage path into the HIVE external table comprises:
traversing the KUDU database by using the first preset tag;
when the first preset label is traversed in the KUDU database, determining a path traversed to the first preset label as the storage path;
and generating a path link according to the storage path, and writing the path link into the HIVE external table.
5. The method of claim 3, wherein the combining the external HIVE table with the internal HIVE table to obtain the target data table comprises:
determining a first data volume in the storage path indicated by the HIVE external table, and determining a second data volume of the HIVE internal table;
when the first data volume is larger than the second data volume, acquiring historical data of the HIVE internal table, and writing the historical data into the real-time data table to obtain the target data table; or
when the first data volume is smaller than or equal to the second data volume, writing the quasi real-time data into the HIVE internal table to obtain the target data table.
6. The near real-time data processing method of claim 1, wherein the determining a query termination time based on the data query request comprises:
analyzing the message information of the data query request to obtain the data information carried by the message information;
extracting query time from the data information;
and determining the time with the largest value in the query time as the query termination time.
7. The quasi real-time data processing method of claim 6, wherein the querying the target data table with the data query request to obtain response data of the data query request comprises:
determining the time with the minimum value in the query time as the query starting time;
acquiring a first time which is greater than the query starting time from the target data table, and acquiring a second time which is less than the query starting time from the target data table;
determining the intersection of the first time and the second time to obtain target time;
and acquiring data corresponding to the target time from the target data table as the response data.
8. A near real-time data processing apparatus operating in a server, the near real-time data processing apparatus comprising:
the determining unit is used for determining the message middleware connected with the server according to the database operation log when the generation of the database operation log is detected, and synchronizing the database operation log to the message middleware;
the consumption unit is used for consuming and analyzing the database operation log in the message middleware to obtain quasi-real-time data and the updating time of the quasi-real-time data, and storing the quasi-real-time data into a KUDU database;
the creating unit is used for acquiring a pre-created HIVE database and creating an HIVE external table based on the HIVE database, wherein the HIVE internal table in the HIVE database stores data before the updating time;
the storage unit is used for determining a storage path of the quasi real-time data in the KUDU database and storing the storage path into the HIVE external table;
the determining unit is further configured to determine query termination time according to the data query request when the data query request is received;
the combining unit is used for combining the HIVE external table and the HIVE internal table to obtain a target data table when the query termination time is detected to be the updating time;
and the query unit is used for querying the target data table by using the data query request to obtain response data of the data query request.
9. A server, characterized in that the server comprises:
a memory storing computer readable instructions; and
a processor executing computer readable instructions stored in the memory to implement the method of processing near real-time data as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium characterized by: the computer-readable storage medium stores therein computer-readable instructions which are executed by a processor in a server to implement the quasi real-time data processing method of any one of claims 1 to 7.
CN202011343327.0A 2020-11-26 2020-11-26 Quasi-real-time data processing method, device, server and storage medium Pending CN112434062A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011343327.0A CN112434062A (en) 2020-11-26 2020-11-26 Quasi-real-time data processing method, device, server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011343327.0A CN112434062A (en) 2020-11-26 2020-11-26 Quasi-real-time data processing method, device, server and storage medium

Publications (1)

Publication Number Publication Date
CN112434062A true CN112434062A (en) 2021-03-02

Family

ID=74698242

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011343327.0A Pending CN112434062A (en) 2020-11-26 2020-11-26 Quasi-real-time data processing method, device, server and storage medium

Country Status (1)

Country Link
CN (1) CN112434062A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113553327A (en) * 2021-07-06 2021-10-26 杭州网易云音乐科技有限公司 Data processing method and device, medium and computing equipment
CN113806363A (en) * 2021-08-24 2021-12-17 北京偶数科技有限公司 Data processing method, device and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2767913A1 (en) * 2013-02-13 2014-08-20 Facebook, Inc. Hive table links
CN107943979A (en) * 2017-11-29 2018-04-20 山东鲁能软件技术有限公司 The quasi real time synchronous method and device of data between a kind of database
CN109284334A (en) * 2018-09-05 2019-01-29 拉扎斯网络科技(上海)有限公司 Real-time data base synchronous method, device, electronic equipment and storage medium
CN109447485A (en) * 2018-10-31 2019-03-08 北京百分点信息科技有限公司 A kind of rule-based Real-time Decision System and method
CN109977135A (en) * 2019-03-28 2019-07-05 北京奇艺世纪科技有限公司 A kind of data query method, apparatus and server
CN111414416A (en) * 2020-02-28 2020-07-14 平安科技(深圳)有限公司 Data processing method, device, equipment and storage medium
CN111694840A (en) * 2020-04-29 2020-09-22 平安科技(深圳)有限公司 Data synchronization method, device, server and storage medium

Similar Documents

Publication Publication Date Title
CN111694840B (en) Data synchronization method, device, server and storage medium
US11281793B2 (en) User permission data query method and apparatus, electronic device and medium
US9454558B2 (en) Managing an index of a table of a database
US9495402B2 (en) Managing a table of a database
US20120054146A1 (en) Systems and methods for tracking and reporting provenance of data used in a massively distributed analytics cloud
CN110990473B (en) Tag data processing system and method
JP2017526253A (en) Method and system for facilitating terminal identifiers
CN111797351A (en) Page data management method and device, electronic equipment and medium
JP2021518021A (en) Data processing methods, equipment and computer readable storage media
CN105373541A (en) Processing method and system for data operation request of database
CN112163412B (en) Data verification method and device, electronic equipment and storage medium
CN111638908A (en) Interface document generation method and device, electronic equipment and medium
CN112667240A (en) Program code conversion method and related device
CN107423037B (en) Application program interface positioning method and device
CN112434062A (en) Quasi-real-time data processing method, device, server and storage medium
CN111814045A (en) Data query method and device, electronic equipment and storage medium
CN108255967B (en) Method and device for calling storage process, storage medium and terminal
CN107391528B (en) Front-end component dependent information searching method and equipment
WO2022057525A1 (en) Method and device for data retrieval, electronic device, and storage medium
CN114116108A (en) Dynamic rendering method, device, equipment and storage medium
CN112948418A (en) Dynamic query method, device, equipment and storage medium
CN112784566A (en) Document generation method, device, equipment and storage medium
CN112711398A (en) Method, device and equipment for generating buried point file and storage medium
CN111986771A (en) Medical prescription query method and device, electronic equipment and storage medium
CN116089535A (en) Data synchronization method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination