CN117472913A - Data processing method, device, equipment and storage medium - Google Patents

Data processing method, device, equipment and storage medium

Info

Publication number
CN117472913A
Authority
CN
China
Prior art keywords
dimension table
external
query
data source
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311674462.7A
Other languages
Chinese (zh)
Inventor
丹晓东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China
Priority to CN202311674462.7A
Publication of CN117472913A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24578 Query processing with adaptation to user needs using ranking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data processing method, a device, equipment and a storage medium. The method comprises the following steps: responding to a query request of a fact table corresponding to a target dimension table, querying in a local cache queue, and determining whether the target dimension table exists; if not, determining whether the external query condition is met according to the target request quantity and the timer time of the target dimension table, and if so, generating a dimension table external query command; based on a preset distributed processing engine, according to an access strategy of an external data source, sending a dimension table external query command to the external data source so as to query a target dimension table. According to the technical scheme, the local storage and the external data source can be effectively utilized, the performance problem of the external data source existing in the association of the dimension table is improved, and the data processing efficiency is improved.

Description

Data processing method, device, equipment and storage medium
Technical Field
The present invention relates to the field of big data, and in particular, to a data processing method, apparatus, device, and storage medium.
Background
With the continuous development of big data, business scenarios are becoming increasingly complex, and business functions place ever higher real-time requirements on processing and computation. Compared with traditional batch processing, stream processing is unbounded and real-time: it does not need to operate on the whole data set, but instead operates on each data item as it passes through the system, and so has great potential for improving the timeliness of computation.
A common scenario in stream processing is to associate the stream with an external data source in order to widen the data, that is, to extend its dimensions. The streaming data source is referred to as the fact table, and the associated external data source is referred to as the dimension table. However, when the data volume of the fact table is huge and the timeliness requirement is high, the external data source may be limited by factors such as its own storage design and high network overhead, so that data processing efficiency is poor.
Therefore, how to make effective use of local storage and the external data source, alleviate the performance problems of the external data source in dimension table association, and improve data processing efficiency is a problem that urgently needs to be solved.
Disclosure of Invention
The invention provides a data processing method, a device, equipment and a storage medium, which are used for effectively utilizing local storage and external data sources, improving the performance problem of the external data sources existing in the association of a dimension table and improving the data processing efficiency.
According to an aspect of the present invention, there is provided a data processing method including:
responding to a query request of a fact table corresponding to a target dimension table, querying in a local cache queue, and determining whether the target dimension table exists;
if not, determining whether the external query condition is met according to the target request quantity and the timer time of the target dimension table, and if so, generating a dimension table external query command;
based on a preset distributed processing engine, according to an access strategy of an external data source, sending a dimension table external query command to the external data source so as to query a target dimension table.
According to another aspect of the present invention, there is provided a data processing apparatus comprising:
the first determining module is used for responding to the query request of the fact table corresponding to the target dimension table, querying in a local cache queue and determining whether the target dimension table exists or not;
the second determining module is used for determining whether the external query condition is met according to the target request quantity of the target dimension table and the timer time if not, and generating a dimension table external query command if yes;
and the sending module is used for sending the external query command of the dimension table to the external data source according to the access strategy of the external data source based on the preset distributed processing engine so as to query the target dimension table.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data processing method according to any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to execute a data processing method according to any one of the embodiments of the present invention.
According to the technical scheme, in response to a query request of a fact table corresponding to a target dimension table, query is performed in a local cache queue, and whether the target dimension table exists is determined; if not, determining whether the external query condition is met according to the target request quantity and the timer time of the target dimension table, and if so, generating a dimension table external query command; based on a preset distributed processing engine, according to an access strategy of an external data source, sending a dimension table external query command to the external data source so as to query a target dimension table. By the method, the local storage and the external data source can be effectively utilized, the performance problem of the external data source existing in the association of the dimension table is improved, and the data processing efficiency is improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a data processing method according to a first embodiment of the present invention;
FIG. 2A is a flow chart of a Redis-based Flink dimension table lookup provided by embodiment two of the present invention;
FIG. 2B is a flowchart of a local cache module according to a second embodiment of the present invention;
FIG. 2C is a flow chart of batch access to external data sources according to a second embodiment of the present invention;
FIG. 2D is a flowchart of an asynchronous query module according to a second embodiment of the present invention;
FIG. 3 is a block diagram of a data processing apparatus according to a third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," "target," "candidate," "alternative," and the like in the description and claims of the invention and in the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that, as business scenarios become increasingly complex, the scale of streaming data is huge, and the associated dimension table may hold tens of millions or even hundreds of millions of rows. A local cache can solve the throughput problem to a certain extent. However, in some stream computing scenarios the dimension table data is created continuously and rapidly in real time, so the local cache is of little use: cache lookups miss, the external data source is accessed frequently and placed under heavy pressure, and cache breakdown eventually results. Caching the full dimension table locally would consume a huge amount of memory and is likewise not a good solution. The technical scheme of the present invention makes full use of both local storage and the external data source for dimension table queries, thereby improving the throughput of dimension table association; the specific implementation is described in detail in the following embodiments.
Embodiment 1
FIG. 1 is a flow chart of a data processing method according to a first embodiment of the present invention; the method is applicable to the situation that the data query is carried out on the local data source and the external data source in response to the query request to determine the target dimension table, and the method can be executed by a data processing device which can be realized in a software and/or hardware mode and can be integrated in electronic equipment with a data processing function. As shown in fig. 1, the data processing method includes:
S101, responding to a query request of a fact table corresponding to a target dimension table, and querying in a local cache queue to determine whether the target dimension table exists.
A common scenario in streaming data processing is to process data in association with an external data source. Apache Flink is a stream computing framework and distributed processing engine, in which this association between streaming data and an external data source is called a Lookup Join. The streaming data source is called the fact table; in Flink it is, for example, a continuously changing real-time data source, typically a message queue, a change log or the like. While streaming data is being processed, other external data sources may need to be associated in order to widen the data and extend its dimensions; such an associated external data source is called a dimension table, and it provides the associated queries for real-time computation. The target dimension table refers to the dimension table associated with the fact table in the local cache. The local cache queue may be a queue based on LRU (Least Recently Used).
It should be noted that, in the present invention, the Flink dimension table association connector mainly connects to the data source through the corresponding Flink interfaces, such as DynamicTableSourceFactory, LookupTableSource and AsyncTableFunction.
Optionally, in response to a query request for the fact table corresponding to the target dimension table, an association key of the fact table and the target dimension table may be determined, and query is performed in a local cache queue according to the association key, to determine whether the local cache hits the association key, that is, to determine whether the target dimension table exists.
Optionally, if the target dimension table exists in the local cache queue, the query request may be directly responded according to the target dimension table obtained by querying in the local cache queue.
Optionally, if the target dimension table exists in the local cache queue, whether it has expired may be verified first; if it has not expired, the query request can be responded to directly according to the target dimension table found in the local cache queue. Specifically, after querying in the local cache queue and determining whether the target dimension table exists, the method further includes: if the target dimension table exists in the local cache queue, determining whether the target dimension table in the local cache queue is invalid according to a preset dimension table expiration time; if not, responding to the query request according to the target dimension table, and updating the position of the target dimension table in the local cache queue to the head position. The dimension table expiration time refers to the TTL (Time To Live) expiration time, i.e., a lookup.
Optionally, if the target dimension table in the local cache queue has expired, the target dimension table in the local cache queue is not used to respond to the query request; instead, the subsequent steps S102-S103 are executed, that is, the response is made by querying the external data source.
It should be noted that the present invention can obtain data from local memory: the cache is checked first, and on a cache hit the external data source does not need to be queried. This reduces how often the system accesses the external data source and thereby improves data processing performance.
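As an illustration of the local cache lookup in step S101, the following minimal Python sketch models the local cache queue as an LRU queue with a per-entry expiration time. The class name LruTtlCache, its parameters max_rows and ttl_seconds, and the injectable clock are assumptions made for this example and do not appear in the original disclosure; the insertion path with tail eviction is included only so that the sketch is self-contained.

```python
import time
from collections import OrderedDict


class LruTtlCache:
    """LRU queue with per-entry expiration; the head is the most recently used entry."""

    def __init__(self, max_rows, ttl_seconds, clock=time.monotonic):
        self.max_rows = max_rows
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so that expiry can be exercised without sleeping
        self._rows = OrderedDict()  # association key -> (row, write time); first item = head

    def get(self, key):
        """Return the cached row, or None on a miss or an expired entry."""
        entry = self._rows.get(key)
        if entry is None:
            return None
        row, written_at = entry
        if self.clock() - written_at >= self.ttl_seconds:
            del self._rows[key]  # expired: drop it so the caller queries externally
            return None
        self._rows.move_to_end(key, last=False)  # hit: move the entry to the head
        return row

    def put(self, key, row):
        """Insert a row fetched from the external data source at the head of the queue."""
        if key in self._rows:
            del self._rows[key]
        elif len(self._rows) >= self.max_rows:
            self._rows.popitem(last=True)  # queue is full: evict the entry at the tail
        self._rows[key] = (row, self.clock())
        self._rows.move_to_end(key, last=False)
```

A caller would try get(key) first and fall through to the external query of steps S102-S103 only when it returns None.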
S102, if not, determining whether an external query condition is met according to the target request quantity and the timer time of the target dimension table, and if so, generating a dimension table external query command.
The target request number refers to the number of pending requests for the target dimension table. The timer time refers to a preset period at which external queries are performed uniformly, and may be, for example, 10 seconds. The external query condition may be that the current time reaches the timer time, or that the target request number of the target dimension table reaches a preset number threshold and the current time reaches the timer time. There may be one dimension table external query command, or at least two.
In order to prevent data from going without output for a long time when the data volume is small, the external query condition is preferably: whenever the timer of a period expires, a connection is established to query the external data source, regardless of whether the number of association requests has reached the threshold.
Optionally, determining whether the external query condition is satisfied according to the target request number and the timer time of the target dimension table includes: if the target request number of the target dimension table is larger than the preset number threshold and the current time reaches the preset timer time, the external query condition is determined to be met. The preset number threshold may be, for example, 30 pieces of data.
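The external query condition described above can be sketched as a small predicate. The function name, its parameters and the require_count switch are hypothetical; the default values of 30 requests and 10 seconds follow the examples given in this embodiment, and treating the threshold as reached when the count is greater than or equal to it is an assumption.

```python
def should_query_external(pending_requests, elapsed_seconds,
                          count_threshold=30, timer_seconds=10.0,
                          require_count=True):
    """Decide whether a dimension table external query command should be generated.

    With require_count=True both the request count and the timer must be reached;
    with require_count=False the timer alone triggers the query.
    """
    timer_reached = elapsed_seconds >= timer_seconds
    if not require_count:
        return timer_reached
    return timer_reached and pending_requests >= count_threshold
```

Keeping the timer-only variant available reflects the preference stated above for small data volumes, where waiting for the count threshold could delay output indefinitely.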
S103, based on a preset distributed processing engine, according to an access strategy of an external data source, sending a dimension table external query command to the external data source so as to query a target dimension table.
Optionally, the preset distributed processing engine is a Flink processing engine; the access policy to the external data source is either a synchronous access policy or an asynchronous access policy.
Optionally, if the access policy to the external data source is a synchronous access policy and there are at least two dimension table external query commands, the commands may first be packaged and then sent to the external data source together. Specifically, sending the dimension table external query command to the external data source includes: ordering and packaging the dimension table external query commands by using the Pipeline function of Redis to generate a packaged command; and sending the packaged command to the external data source in one go through the network connection with the external data source. Redis (Remote Dictionary Server) refers to a remote dictionary server. Pipeline refers to Redis pipelining, in which a client sends multiple commands without waiting for each individual reply and then reads the replies together, which reduces network round trips.
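The packaging of several lookup commands into one network transmission can be illustrated as follows. This is a sketch only: InMemoryClient and InMemoryPipeline are hypothetical stand-ins so that the example runs without a Redis server, and the use of GET for the dimension rows is an assumption, since the disclosure does not state how rows are stored. The redis-py client offers pipeline(), get() and execute() methods of the same shape, so batch_lookup could be given a real client instead.

```python
class InMemoryPipeline:
    """Hypothetical stand-in: buffers GET commands and replies in command order."""

    def __init__(self, client):
        self._client = client
        self._keys = []

    def get(self, key):
        self._keys.append(key)  # queued locally; nothing is sent yet
        return self

    def execute(self):
        self._client.round_trips += 1  # the whole batch costs one round trip
        replies = [self._client.store.get(k) for k in self._keys]
        self._keys = []
        return replies


class InMemoryClient:
    """Hypothetical stand-in for a Redis client; only pipeline() is modelled."""

    def __init__(self, store):
        self.store = store
        self.round_trips = 0

    def pipeline(self):
        return InMemoryPipeline(self)


def batch_lookup(client, association_keys):
    """Queue one GET per association key, send them together, and pair up the replies."""
    pipe = client.pipeline()
    for key in association_keys:
        pipe.get(key)
    replies = pipe.execute()  # replies come back in the order the commands were queued
    return dict(zip(association_keys, replies))
```

Because the replies preserve command order, each reply can be matched back to its association key by position.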
Optionally, if the access policy to the external data source is an asynchronous access policy, an asynchronous thread pool may be used to send multiple requests to the external data source in succession. Specifically, sending the dimension table external query command to the external data source according to the access policy to the external data source includes: determining, from the dimension table external query commands, the asynchronous query commands whose access policy is the asynchronous access policy, so as to generate asynchronous query requests, and placing the asynchronous query requests into a preset asynchronous queue; and calling a preset asynchronous thread pool to initiate the asynchronous requests, so as to send the dimension table external query commands to the external data source.
Illustratively, multiple requests may be processed concurrently by using Flink's asynchronous interface Async I/O, which reduces thread blocking.
Optionally, after the dimension table external query command is sent to the external data source to query the target dimension table, the result of the request may be obtained through a preset callback method and returned to the Flink framework, so as to instruct the local Flink framework to take the corresponding request out of the queue, process it and send it downstream.
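The thread pool and callback interaction can be sketched in plain Python with the standard concurrent.futures module. This is not Flink's Async I/O API; it only mirrors the pattern described above. The function query_external is a hypothetical blocking lookup, and lookup_async and its parameters are names chosen for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def query_external(association_key):
    """Hypothetical blocking lookup against the external data source."""
    time.sleep(0.01)  # stands in for network latency
    return {"key": association_key, "row": "row-for-" + association_key}


def lookup_async(association_keys, on_result, max_workers=4):
    """Submit every lookup to a thread pool and report each result through a callback."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for key in association_keys:
            future = pool.submit(query_external, key)
            # The callback runs once the request completes; the submitting
            # thread does not block on any individual request.
            future.add_done_callback(lambda f: on_result(f.result()))
    # Leaving the with-block waits until all submitted requests have finished.
```

Results may arrive in any order here; handing them downstream in request order requires the queue described in the second embodiment.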
Optionally, after sending the dimension table external query command to the external data source to query the target dimension table, the method further includes: determining whether the local cache queue has reached a preset size, and if so, deleting the original dimension table located at the tail of the local cache queue to obtain an updated cache queue; and inserting the target dimension table obtained by the query at the head of the updated cache queue, so that the target dimension table from the external data source is stored locally.
Optionally, if the local cache queue does not reach the preset size, the target dimension table of the external data source may be directly stored in the head of the local cache queue, so as to realize that the target dimension table of the external data source is stored locally.
According to the technical scheme, in response to a query request of a fact table corresponding to a target dimension table, query is performed in a local cache queue, and whether the target dimension table exists is determined; if not, determining whether the external query condition is met according to the target request quantity and the timer time of the target dimension table, and if so, generating a dimension table external query command; based on a preset distributed processing engine, according to an access strategy of an external data source, sending a dimension table external query command to the external data source so as to query a target dimension table. By the method, the local storage and the external data source can be effectively utilized, the performance problem of the external data source existing in the association of the dimension table is improved, and the data processing efficiency is improved.
Embodiment 2
FIG. 2A is a flow chart of a Redis-based Flink dimension table lookup provided by embodiment two of the present invention; FIG. 2B is a flowchart of a local cache module according to a second embodiment of the present invention; FIG. 2C is a flow chart of batch access to external data sources according to a second embodiment of the present invention; FIG. 2D is a flowchart of an asynchronous query module according to a second embodiment of the present invention. Based on the above embodiments, this embodiment provides a preferred example in which the local cache module, the batch access external data source module and the asynchronous query module interact to perform a Redis-based Flink dimension table (i.e., the target dimension table) query.
As shown in FIG. 2A, the Redis-based Flink dimension table query method may include the following procedures:
If a query request associating the fact table with a dimension table is detected, the local cache module is queried first to determine whether the local cache hits the association key. If it hits, the associated data is returned directly. If it misses, then based on the batch collection logic of the batch access external data source module, a connection to the external data source is established when the configured quantity threshold or time threshold is reached, that is, the dimension table association data is acquired in batches.
If the batch collection logic reaches the batch trigger condition, the asynchronous access module determines, for each association request, whether the user has enabled asynchronous access. If asynchronous access is not enabled, the external data source is accessed synchronously and the associated external data is obtained synchronously. If asynchronous access is enabled, the association request is added to the asynchronous queue, the external data source is accessed asynchronously, and the associated data is finally obtained and processed through an asynchronous callback method, that is, by waiting for the callback.
As shown in FIG. 2B, the local cache is an LRU-based queue. The purpose of the cache is to reduce the system's accesses to the external data source and so improve performance: data is obtained directly from local memory, the cache is checked first, and on a cache hit the external data source is not queried. The local cache module may perform the following process:
If an association dimension table query is detected, it is determined whether the local cache queue holds the corresponding key. If it does, the local cache queue is reordered and the dimension table association data is returned. If it does not, the external data source is looked up to obtain the association data, and it is determined whether the local cache queue is full. If the queue is full, the least recently used entry is deleted according to LRU; the association data obtained from the external data source is then inserted into the cache queue, the local cache queue is reordered, and finally the dimension table association data is returned.
It should be noted that initializing the local cache module mainly involves two parameters, namely lookup.cache.max-rows (the cache queue size) and lookup.cache.ttl (the cache expiration time), and the local cache module is initialized according to the user's settings. When the data is not found in the cache, the data queried from the external data source needs to be written into the cache: if the cache queue is not full, the data is inserted at the head of the queue; if the cache queue has reached the configured size, the data at the tail of the queue is deleted and the new data is inserted at the head, reordering the queue. When the data is found in the cache, it only needs to be moved to the head of the queue, and the dimension table association data is returned.
Furthermore, to ensure that the data in the cache is updated in time, a TTL expiration time needs to be set: data is forcibly invalidated once the TTL has elapsed after it was written into the cache, and invalidated data must be queried from the external data source again, which ensures data accuracy. Dimension table information is updated at different frequencies in different scenarios: order information, for example, may be created and changed in real time, whereas information such as age is updated slowly. The cache size and expiration time therefore need to be defined according to the characteristics of the dimension table, and the cache may also be left disabled, so as to strike a balance between data accuracy and system throughput.
As shown in FIG. 2C, the main performance cost of associating fact table data with dimension table data lies in establishing the connection to the external data source, and in high-traffic fact table scenarios this network overhead is incurred even more frequently. For this scenario, the present invention proposes a batch access external data source module. It overcomes the shortcoming of the existing approach, in which a connection to the external data source is established once for every association request of the fact table data and network consumption is therefore large. Using the micro-batch idea, a batch of data is accumulated, and once the threshold is reached a single connection is established to request the data, which greatly reduces network overhead and improves throughput. The process specifically includes the following:
If an association dimension table query is detected, it is determined whether the number of requests awaiting association has reached its threshold and whether the timer time has reached its threshold. If the timer time has not reached its threshold or the number of requests awaiting association has not reached its threshold, the request commands are not sent and the process returns to waiting. If the timer time has reached its threshold and the number of requests awaiting association has also reached its threshold, the batch of request commands can be packaged and sent to the external data source over a single network connection, the association dimension table information is obtained in batches, and the results are returned to the query requests.
It should be noted that the batch access external data source module can be initialized according to two configured parameters: a threshold for the batch request quantity (the batch size) and a threshold for the batch request time.
Optionally, when a data request for the association dimension table arrives, it can be determined whether the quantity threshold (30 pieces of data by default) has been reached. If the quantity is insufficient, the request continues to wait to be batched; if the quantity reaches the threshold, the connection to the external data source is established directly to query the data.
Optionally, in order to prevent data from going without output for a long time when the data volume is small, a timer (10 seconds by default) is set; whenever the timer of a period expires, a connection is established and the query is performed, regardless of whether the number of association requests has reached the threshold.
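The two optional triggers just described, a full batch or an expired timer, can be combined in a small stateful collector. The following is a minimal sketch under stated assumptions: the class MicroBatcher, the send_batch callable, the on_tick method and the injectable clock are hypothetical names for this example, and the defaults of 30 requests and 10 seconds are taken from the text above.

```python
import time


class MicroBatcher:
    """Accumulate association requests and flush on a full batch or an expired timer."""

    def __init__(self, send_batch, batch_size=30, interval_seconds=10.0,
                 clock=time.monotonic):
        self.send_batch = send_batch  # called with the list of pending keys
        self.batch_size = batch_size
        self.interval_seconds = interval_seconds
        self.clock = clock
        self._pending = []
        self._window_start = clock()

    def add(self, association_key):
        """Queue one request; flush immediately once the quantity threshold is reached."""
        self._pending.append(association_key)
        if len(self._pending) >= self.batch_size:
            self.flush()

    def on_tick(self):
        """Call periodically; flushes whatever is pending once the timer has expired."""
        if self.clock() - self._window_start >= self.interval_seconds:
            self.flush()

    def flush(self):
        """Send the pending requests as one batch and start a new timer period."""
        batch, self._pending = self._pending, []
        self._window_start = self.clock()
        if batch:
            self.send_batch(batch)
```

The timer path bounds the latency of a partially filled batch, while the quantity path bounds the size of a single transmission.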
Alternatively, a set of commands may be packed using the Pipeline function of Redis while guaranteeing order of commands, then sent to Redis once through the network, and then the results of the execution are returned in bulk.
It should be noted that the batch access external data source module of the present invention reduces network connection consumption; however, for different stream computing services, parameters such as the batch data size and the timer period still need to be tuned to the specific scenario in order to reduce data latency and improve link throughput.
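The micro-batching behaviour described above can be sketched in Python as follows. This is only an illustrative sketch, not the connector itself: it follows the optional variant in which a batch is flushed either when the quantity threshold is reached or when the timer period expires, and it assumes the timer starts with the first buffered request. The names `MicroBatcher`, `fetch_batch`, `submit`, `poll`, and `flush` are hypothetical; `fetch_batch` stands in for whatever single round trip reaches the external data source.

```python
import time
from typing import Callable, Dict, List, Optional


class MicroBatcher:
    """Buffers dimension-table keys and fetches them in one round trip."""

    def __init__(
        self,
        fetch_batch: Callable[[List[str]], Dict[str, str]],  # hypothetical batch call
        batch_size: int = 30,         # default quantity threshold given in the text
        batch_seconds: float = 10.0,  # default timer period given in the text
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_batch = fetch_batch
        self._batch_size = batch_size
        self._batch_seconds = batch_seconds
        self._clock = clock
        self._pending: List[str] = []
        self._window_start = clock()

    def submit(self, key: str) -> Optional[Dict[str, str]]:
        """Queue one key; return results only if this call triggered a flush."""
        if not self._pending:
            # Assumption: the timer starts with the first buffered request.
            self._window_start = self._clock()
        self._pending.append(key)
        return self.poll()

    def poll(self) -> Optional[Dict[str, str]]:
        """Flush when the quantity threshold or the timer period is reached."""
        full = len(self._pending) >= self._batch_size
        expired = self._clock() - self._window_start >= self._batch_seconds
        if self._pending and (full or expired):
            return self.flush()
        return None

    def flush(self) -> Dict[str, str]:
        """Send all pending keys through a single call to the data source."""
        keys, self._pending = self._pending, []
        return self._fetch_batch(keys)
```

In a real job, `poll` would be driven by a periodic timer, and `fetch_batch` could be a function that queues one read per key on a Redis pipeline and executes the pipeline once. The clock is injectable so that the timer behaviour can be checked without waiting.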
As shown in Fig. 2D, in stream computing, dimension table association accesses the external data source through synchronous calls by default: the next request is not processed until the result of the current request has been returned, which blocks the Flink thread and affects throughput. The asynchronous query module provided by the present invention uses an asynchronous client to process multiple requests concurrently through Flink's asynchronous interface, Async I/O, thereby reducing thread blocking. The module specifically operates as follows:
If an association dimension table query is detected, the request is placed into an asynchronous queue, an asynchronous request is initiated using a thread pool, and a callback method is registered. When the callback method is invoked successfully, the result is returned to the Flink framework. Finally, the corresponding completed request is taken out of the queue, processed, and sent downstream.
Optionally, a request for which the asynchronous query mode is enabled may first be placed into the asynchronous queue, which preserves the order of requests. After the asynchronous request is initiated, an asynchronous callback method is registered. The callback method then handles the request whose asynchronous call succeeded and passes the returned result to the Flink framework. Finally, the associated request can be taken out of the asynchronous queue and marked as completed, and the retrieved message is sent downstream.
It should be noted that the asynchronous access module of the present invention implements asynchronous queries in a customized manner, mainly using the Flink Async I/O interface, a thread pool, and related components. It can process multiple requests and responses concurrently and can send multiple requests to the external data source in succession without blocking between consecutive requests, thereby improving link throughput.
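The queue, thread pool, and callback flow described above can be illustrated with a small Python sketch. It is a stand-in written with the standard library only and does not use the Flink Async I/O API; the names `OrderedAsyncLookup`, `lookup`, and `emit` are hypothetical, and error handling for failed lookups is omitted. Requests run concurrently, but results are emitted downstream in arrival order, comparable to the ordered mode of Flink's asynchronous I/O.

```python
import threading
from collections import deque
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Deque, Tuple


class OrderedAsyncLookup:
    """Runs lookups concurrently but emits results in submission order."""

    def __init__(
        self,
        lookup: Callable[[str], str],      # hypothetical blocking query function
        emit: Callable[[str, str], None],  # hypothetical downstream sink
        max_workers: int = 4,
    ) -> None:
        self._lookup = lookup
        self._emit = emit
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._queue: Deque[Tuple[str, Future]] = deque()
        self._lock = threading.Lock()

    def submit(self, key: str) -> None:
        """Start the request on the thread pool and register the callback."""
        future = self._pool.submit(self._lookup, key)
        with self._lock:
            self._queue.append((key, future))
        # If the future is already done, the callback runs immediately.
        future.add_done_callback(self._on_done)

    def _on_done(self, _future: Future) -> None:
        """Drain completed requests from the head of the queue, in order."""
        with self._lock:
            while self._queue and self._queue[0][1].done():
                key, finished = self._queue.popleft()
                self._emit(key, finished.result())

    def close(self) -> None:
        """Wait for all in-flight requests and their callbacks to finish."""
        self._pool.shutdown(wait=True)
```

A request that finishes early simply waits in the queue until every request ahead of it has completed, so no caller blocks while ordering is still preserved. An unordered variant would emit each result directly from the callback.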
According to this technical scheme, the present invention takes the reduction of network overhead as its starting point and designs a Redis-based Flink dimension table connector comprising the local cache module, the batch access external data source module, and the asynchronous query module. Compared with other Flink connectors, it greatly improves association efficiency and significantly improves link throughput when fact tables are associated with dimension tables in stream computing, and compared with existing schemes it offers a considerable advantage in processing speed for high-traffic real-time computing scenarios.
Example III
FIG. 3 is a block diagram of a data processing apparatus according to a third embodiment of the present invention; the data processing device provided by the embodiment of the invention can execute the data processing method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method; the data processing apparatus may be implemented in hardware and/or software and configured in a device having data processing functions.
As shown in fig. 3, the data processing apparatus specifically includes:
the first determining module 301 is configured to determine whether a target dimension table exists by querying in a local cache queue in response to a query request for a target dimension table corresponding to a fact table;
a second determining module 302, configured to determine whether an external query condition is satisfied according to the target request number and the timer time of the target dimension table if not, and if yes, generate a dimension table external query command;
the sending module 303 is configured to send a dimension table external query command to an external data source according to an access policy of the external data source based on a preset distributed processing engine, so as to perform a target dimension table query.
According to the technical scheme, in response to a query request of a fact table corresponding to a target dimension table, query is performed in a local cache queue, and whether the target dimension table exists is determined; if not, determining whether the external query condition is met according to the target request quantity and the timer time of the target dimension table, and if so, generating a dimension table external query command; based on a preset distributed processing engine, according to an access strategy of an external data source, sending a dimension table external query command to the external data source so as to query a target dimension table. By the method, the local storage and the external data source can be effectively utilized, the performance problem of the external data source existing in the association of the dimension table is improved, and the data processing efficiency is improved.
Further, the device is also used for:
if the target dimension table exists in the local cache queue, determining whether the target dimension table in the local cache queue is invalid according to the preset dimension table expiration time;
if not, responding to the query request according to the target dimension table, and updating the position of the target dimension table in the local cache queue to be the head position.
Further, the device is also used for:
determining whether the local cache queue reaches a preset size, if so, deleting an original dimension table positioned at the tail of the local cache queue to obtain an updated cache queue;
and inserting the target dimension table obtained by the query at the head of the updated cache queue, so as to store the target dimension table from the external data source locally.
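The local cache queue behaviour described above (expiration check on a hit, promotion to the head, eviction from the tail when the preset size is reached) can be sketched as follows. This is a minimal illustration under assumptions: the class name `DimensionCache`, its methods, and the default values for `max_size` and `ttl_seconds` are hypothetical and are not taken from the original text.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional, Tuple


class DimensionCache:
    """Local cache queue: most recently used entry at the head, with expiry."""

    def __init__(
        self,
        max_size: int = 1000,        # assumed preset queue size
        ttl_seconds: float = 60.0,   # assumed dimension table expiration time
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: "OrderedDict[Hashable, Tuple[Any, float]]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        """Return a live entry and move it to the head; None on miss or expiry."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # entry has expired, treat as a miss
            return None
        self._entries.move_to_end(key, last=False)  # promote to head
        return value

    def put(self, key: Hashable, value: Any) -> None:
        """Insert at the head, evicting the tail entry if the queue is full."""
        if key in self._entries:
            del self._entries[key]
        elif len(self._entries) >= self._max_size:
            self._entries.popitem(last=True)  # drop the entry at the tail
        self._entries[key] = (value, self._clock())
        self._entries.move_to_end(key, last=False)
```

On a miss, the caller would query the external data source and then call `put` with the result, so that later requests for the same dimension data are served locally until the entry expires or is evicted.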
Further, the second determining module 302 is specifically configured to:
if the target request number of the target dimension table is larger than the preset number threshold and the current time reaches the preset timer time, the external query condition is determined to be met.
Further, the sending module 303 is specifically configured to:
ordering and packaging external query commands of the dimension table by utilizing the Pipeline function of Redis to generate packaging commands;
and uniformly transmitting the packaging command to the external data source through network connection with the external data source.
Further, the preset distributed processing engine is a Flink processing engine; the access policy of the external data source is a synchronous access policy or an asynchronous access policy.
Further, the sending module 303 is further configured to:
determining an asynchronous query command with a corresponding access strategy being an asynchronous access strategy from the external query command of the dimension table to generate an asynchronous query request, and placing the asynchronous query request into a preset asynchronous queue;
and calling a preset asynchronous thread pool to initiate an asynchronous request so as to send the external query command of the dimension table to an external data source.
Example IV
Fig. 4 is a schematic structural diagram of an electronic device 10 according to a fourth embodiment of the present invention, which may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 4, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the data processing method.
In some embodiments, the data processing method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. One or more of the steps of the data processing method described above may be performed when the computer program is loaded into RAM 13 and executed by processor 11. Alternatively, in other embodiments, the processor 11 may be configured to perform the data processing method in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of the flows shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved; the present invention imposes no limitation in this respect.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method of data processing, comprising:
responding to a query request of a fact table corresponding to a target dimension table, querying in a local cache queue, and determining whether the target dimension table exists;
if not, determining whether the external query condition is met according to the target request quantity and the timer time of the target dimension table, and if so, generating a dimension table external query command;
based on a preset distributed processing engine, according to an access strategy of an external data source, sending a dimension table external query command to the external data source so as to query a target dimension table.
2. The method of claim 1, wherein after querying in the local cache queue to determine whether the target dimension table exists, further comprising:
if the target dimension table exists in the local cache queue, determining whether the target dimension table in the local cache queue is invalid according to the preset dimension table expiration time;
if not, responding to the query request according to the target dimension table, and updating the position of the target dimension table in the local cache queue to be the head position.
3. The method of claim 1, wherein after sending the external query command of the dimension table to the external data source for the target dimension table query, further comprising:
determining whether the local cache queue reaches a preset size, if so, deleting an original dimension table positioned at the tail of the local cache queue to obtain an updated cache queue;
and inserting the target dimension table obtained by the query at the head of the updated cache queue, so as to store the target dimension table from the external data source locally.
4. The method of claim 1, wherein determining whether the external query condition is satisfied based on the target number of requests and the timer time of the target dimension table comprises:
if the target request number of the target dimension table is larger than the preset number threshold and the current time reaches the preset timer time, the external query condition is determined to be met.
5. The method of claim 1, wherein sending the external query command of the dimension table to the external data source comprises:
ordering and packaging external query commands of the dimension table by utilizing the Pipeline function of Redis to generate packaging commands;
and uniformly transmitting the packaging command to the external data source through network connection with the external data source.
6. The method of claim 1, wherein the preset distributed processing engine is a Flink processing engine; the access policy of the external data source is a synchronous access policy or an asynchronous access policy.
7. The method of claim 1, wherein sending the dimension table external query command to the external data source according to the access policy to the external data source comprises:
determining an asynchronous query command with a corresponding access strategy being an asynchronous access strategy from the external query command of the dimension table to generate an asynchronous query request, and placing the asynchronous query request into a preset asynchronous queue;
and calling a preset asynchronous thread pool to initiate an asynchronous request so as to send the external query command of the dimension table to an external data source.
8. A data processing apparatus, comprising:
the first determining module is used for responding to the query request of the fact table corresponding to the target dimension table, querying in a local cache queue and determining whether the target dimension table exists or not;
the second determining module is used for determining whether the external query condition is met according to the target request quantity of the target dimension table and the timer time if not, and generating a dimension table external query command if yes;
and the sending module is used for sending the external query command of the dimension table to the external data source according to the access strategy of the external data source based on the preset distributed processing engine so as to query the target dimension table.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program for execution by the at least one processor to enable the at least one processor to perform the data processing method of any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions for causing a processor to implement the data processing method of any one of claims 1-7 when executed.
CN202311674462.7A 2023-12-07 2023-12-07 Data processing method, device, equipment and storage medium Pending CN117472913A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311674462.7A CN117472913A (en) 2023-12-07 2023-12-07 Data processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117472913A true CN117472913A (en) 2024-01-30

Family

ID=89639763

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311674462.7A Pending CN117472913A (en) 2023-12-07 2023-12-07 Data processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117472913A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination