CN117194524A - Offline index data processing method, device, equipment and storage medium - Google Patents

Publication number
CN117194524A
Authority
CN
China
Prior art keywords
offline
index
data
processing configuration
piece
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311238672.1A
Other languages
Chinese (zh)
Inventor
梁永健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp, CCB Finetech Co Ltd
Priority to CN202311238672.1A
Publication of CN117194524A

Abstract

The application relates to the technical field of big data, and particularly discloses a method, a device, equipment and a storage medium for processing offline index data. The method comprises the following steps: acquiring offline data in at least one offline data source to be processed, wherein the offline data in the at least one offline data source is data stored in a partitioning mode according to time information, and each offline data source corresponds to one offline index; determining a processing configuration table of an offline index corresponding to the offline data, wherein the processing configuration table comprises at least one piece of processing configuration information, and each piece of processing configuration information comprises identifiers of the offline index corresponding to the offline data under different statistical dimensions and time windows; traversing each piece of processing configuration information in the processing configuration table, and determining an index value of an offline index corresponding to each piece of processing configuration information from the offline data; and storing the index value of the offline index corresponding to each piece of processing configuration information into a database. By adopting the method, the calculation pressure of the index data can be reduced.

Description

Offline index data processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of big data technologies, and in particular, to a method, an apparatus, a device, and a storage medium for processing offline index data.
Background
With the development of information technology, big data processing is increasingly widely applied. Real-time computing and offline computing are two common big data computing scenarios. Real-time computing offers strong timeliness and a fast data processing response. Offline computing supports data covering long periods, but its computation takes a long time and its timeliness is poor.
In the related art, index calculation scenarios that require real-time responses on big data involve statistical analysis of indexes across different dimensions and different time windows. Because real-time computing cannot quickly process long-period data, an incremental cumulative index calculation scheme is generally used for the statistical analysis.
However, although the incremental cumulative scheme addresses real-time computing performance and response latency under large data volumes and high concurrency, each index must accumulate data over the corresponding period before the final statistical result is available, which places heavy computational pressure on index calculation.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a method, apparatus, device, and storage medium for processing offline index data, which can reduce the calculation pressure of the index data.
In a first aspect, the present application provides a method for processing offline index data. The method comprises the following steps:
acquiring offline data in at least one offline data source to be processed, wherein the offline data in the at least one offline data source is data stored in a partitioning mode according to time information, and each offline data source corresponds to one offline index;
determining a processing configuration table of an offline index corresponding to the offline data, wherein the processing configuration table comprises at least one piece of processing configuration information, and each piece of processing configuration information comprises identifications of the offline index corresponding to the offline data under different statistical dimensions and time windows;
traversing each piece of processing configuration information in the processing configuration table, and determining an index value of an offline index corresponding to each piece of processing configuration information from the offline data;
and storing the index value of the offline index corresponding to each piece of processing configuration information into a database.
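The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the record fields (`user`, `amount`), the mapping `ALGORITHMS`, the meaning assigned to index identifiers 2 and 34, and the in-memory dictionary standing in for the database are all hypothetical.

```python
# Hypothetical offline records; "dt" is the time partition key.
OFFLINE_DATA = [
    {"dt": "20230501", "user": "User A", "amount": 100.0},
    {"dt": "20230501", "user": "User A", "amount": 50.0},
    {"dt": "20230502", "user": "User A", "amount": 30.0},
]

# Hypothetical mapping from an offline index identifier to a statistic.
ALGORITHMS = {
    "2": lambda rows: sum(r["amount"] for r in rows),  # assumed: total amount
    "34": lambda rows: len(rows),                      # assumed: record count
}

# Processing configuration table: one entry per piece of configuration information.
CONFIG_TABLE = [
    {"dimension": "User A", "window": "20230501", "index_ids": ["2", "34"]},
    {"dimension": "User A", "window": "20230502", "index_ids": ["2"]},
]


def process(offline_data, config_table, algorithms):
    """Traverse the configuration table and store one value per index."""
    database = {}  # stands in for the database of the final step
    for cfg in config_table:
        # Filter the offline data by statistical dimension and time window.
        rows = [
            r for r in offline_data
            if r["user"] == cfg["dimension"] and r["dt"] == cfg["window"]
        ]
        for index_id in cfg["index_ids"]:
            key = (index_id, cfg["dimension"], cfg["window"])
            database[key] = algorithms[index_id](rows)
    return database


RESULT = process(OFFLINE_DATA, CONFIG_TABLE, ALGORITHMS)
```

Keying each stored value by identifier, dimension and time window keeps the values independent of one another, which is what allows them to be combined later on request.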
In one embodiment, the determining, from the offline data, an index value of an offline index corresponding to each piece of processing configuration information includes:
determining a structured query instruction corresponding to the identifier of the offline index in each piece of processing configuration information, wherein the structured query instruction is used for calling a statistical algorithm corresponding to the identifier of the offline index;
and calling a statistical algorithm corresponding to the structured query instruction according to the statistical dimension and the time window corresponding to each piece of processing configuration information, and determining an index value of the offline index corresponding to each piece of processing configuration information from the offline data.
In one embodiment, the storing the index value of the offline index corresponding to each piece of processing configuration information in the database includes:
respectively generating intermediate index data corresponding to each piece of processing configuration information according to a preset index storage format, wherein each intermediate index data comprises index values of corresponding offline indexes;
and storing the intermediate index data corresponding to each piece of processing configuration information into a database.
In one embodiment, the intermediate index data further includes at least one of the following data: the dimension value corresponding to the offline index, the time window corresponding to the offline index, the identifier of the offline index and the query condition corresponding to the offline index.
In one embodiment, after the storing the index value of the offline index corresponding to each piece of processing configuration information in the database, the method further includes:
receiving a query request sent by a terminal device, wherein the query request comprises an identifier of an offline index to be queried, a dimension value corresponding to the offline index to be queried and query time information;
acquiring intermediate index data corresponding to the offline index from the database according to the identification of the offline index to be queried, the dimension value corresponding to the offline index to be queried and the query time information;
fusing the intermediate index data corresponding to the offline index to be queried to obtain synthetic data corresponding to the offline index to be queried;
and feeding back the synthesized data corresponding to the offline index to be queried to the terminal equipment as a query result.
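The query and fusion steps above can be illustrated with a short sketch. It assumes that intermediate index data is stored per day and that the queried index is additive, so fusion is a plain sum; the application does not specify the fusion rule. The names `INTERMEDIATE`, `day_windows` and `query_index` are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical intermediate index data keyed by
# (index identifier, dimension value, daily time window).
INTERMEDIATE = {
    ("2", "User A", "20230501"): 150.0,
    ("2", "User A", "20230502"): 30.0,
    ("2", "User A", "20230503"): 20.0,
}


def day_windows(start, end):
    """Yield every daily window label from start to end, inclusive."""
    current = datetime.strptime(start, "%Y%m%d")
    last = datetime.strptime(end, "%Y%m%d")
    while current <= last:
        yield current.strftime("%Y%m%d")
        current += timedelta(days=1)


def query_index(store, index_id, dimension_value, start, end):
    """Fetch the stored daily values in the range and fuse them by summation."""
    parts = [
        store[(index_id, dimension_value, day)]
        for day in day_windows(start, end)
        if (index_id, dimension_value, day) in store
    ]
    return sum(parts)
```

A call such as `query_index(INTERMEDIATE, "2", "User A", "20230501", "20230502")` reads two stored values instead of rescanning the underlying offline data.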
In one embodiment, the determining the processing configuration table of the offline indicator corresponding to the offline data includes:
and determining a processing configuration table of the offline index corresponding to the offline data according to the processing type of the offline index, wherein the processing type of the offline index is divided according to a statistical algorithm corresponding to the offline index.
In one embodiment, the processing types include a combined processing type and an individual processing type, each piece of processing configuration information of the processing configuration table corresponding to the combined processing type includes an identifier of a plurality of offline indicators, and each piece of processing configuration information of the processing configuration table corresponding to the individual processing type includes an identifier of an offline indicator.
In a second aspect, the application further provides a device for processing the offline index data. The device comprises:
the acquisition module is used for acquiring offline data in offline data sources to be processed, wherein the offline data in the offline data sources are data stored in a partitioning mode according to time information, and each offline data source corresponds to one offline index;
the determining module is used for determining a processing configuration table of the offline index corresponding to the offline data, wherein the processing configuration table comprises at least one piece of processing configuration information, and each piece of processing configuration information comprises the identification of the offline index corresponding to the offline data under different statistical dimensions and time windows; traversing each piece of processing configuration information in the processing configuration table, and determining an index value of an offline index corresponding to each piece of processing configuration information from the offline data;
and the storage module is used for storing the index value of the offline index corresponding to each piece of processing configuration information into a database.
In one embodiment, the determining module is specifically configured to determine a structured query instruction corresponding to an identifier of an offline indicator in each piece of processing configuration information, where the structured query instruction is configured to invoke a statistical algorithm corresponding to the identifier of the offline indicator; and calling a statistical algorithm corresponding to the structured query instruction according to the statistical dimension and the time window corresponding to each piece of processing configuration information, and determining an index value of the offline index corresponding to each piece of processing configuration information from the offline data.
In one embodiment, the storage module is specifically configured to generate, according to a preset index storage format, intermediate index data corresponding to each piece of processing configuration information, where each intermediate index data includes an index value of a corresponding offline index; and storing the intermediate index data corresponding to each piece of processing configuration information into a database.
In one embodiment, the intermediate index data further includes at least one of the following data: the dimension value corresponding to the offline index, the time window corresponding to the offline index, the identifier of the offline index and the query condition corresponding to the offline index.
In one embodiment, the processing device of the offline indicator data further includes:
the query module is used for receiving a query request sent by the terminal equipment, wherein the query request comprises an identifier of an offline index to be queried, a dimension value corresponding to the offline index to be queried and query time information; acquiring intermediate index data corresponding to the offline index from the database according to the identification of the offline index to be queried, the dimension value corresponding to the offline index to be queried and the query time information; fusing the intermediate index data corresponding to the offline index to be queried to obtain synthetic data corresponding to the offline index to be queried; and feeding back the synthesized data corresponding to the offline index to be queried to the terminal equipment as a query result.
In one embodiment, the determining module is specifically configured to determine, according to a processing type of the offline indicator, a processing configuration table of the offline indicator corresponding to the offline data, where the processing type of the offline indicator is divided according to a statistical algorithm corresponding to the offline indicator.
In one embodiment, the processing types include a combined processing type and an individual processing type, each piece of processing configuration information of the processing configuration table corresponding to the combined processing type includes an identifier of a plurality of offline indicators, and each piece of processing configuration information of the processing configuration table corresponding to the individual processing type includes an identifier of an offline indicator.
In a third aspect, the present application also provides a computer device. The computer device includes a memory storing a computer program and a processor executing the method for processing offline indicator data according to the first aspect.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program that is executed by a processor to perform the method for processing offline indicator data according to the first aspect.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which is executed by a processor to perform the method for processing offline indicator data according to the first aspect.
In the method, apparatus, device and storage medium for processing offline index data described above, offline data is first acquired from at least one offline data source to be processed, where the offline data in the at least one offline data source is stored in partitions according to time information and each offline data source corresponds to one offline index. Second, a processing configuration table of the offline index corresponding to the offline data is determined, where the processing configuration table includes at least one piece of processing configuration information and each piece includes identifiers of the offline index corresponding to the offline data under different statistical dimensions and time windows. Third, each piece of processing configuration information in the processing configuration table is traversed, and the index value of the corresponding offline index is determined from the offline data. Finally, the index value of the offline index corresponding to each piece of processing configuration information is stored in a database. Because the method calculates the index values of the offline index under different statistical dimensions and time windows and stores them as intermediate states, indexes can be fused across statistical dimensions and time windows when needed, which avoids frequent recalculation of index data and greatly reduces the computational pressure of index calculation.
Drawings
FIG. 1 is an application environment diagram of a method for processing offline index data according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for processing offline index data according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating another method for processing offline index data according to an embodiment of the present application;
FIG. 4 is a flowchart illustrating a method for processing offline index data according to an embodiment of the present application;
FIG. 5 is a flowchart illustrating a method for processing offline index data according to an embodiment of the present application;
FIG. 6 is a block diagram of an offline index data processing device according to an embodiment of the present application;
FIG. 7 is an internal structure diagram of a computer device according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The method for processing offline index data provided by the embodiment of the application can be applied to the application environment shown in fig. 1, in which the terminal 102 communicates with the server 104 via a network. A data storage system may store the data that the server 104 needs to process. The data storage system may be integrated on the server 104 or may be located on a cloud or other network server. When an offline index needs to be processed, the server 104 may obtain offline data in at least one offline data source to be processed, where the offline data in the at least one offline data source is stored in partitions according to time information and each offline data source corresponds to one offline index. Next, the server 104 determines a processing configuration table of the offline index corresponding to the offline data, where the processing configuration table includes at least one piece of processing configuration information, and each piece of processing configuration information includes identifiers of the offline index corresponding to the offline data under different statistical dimensions and time windows. The server 104 then traverses each piece of processing configuration information in the processing configuration table and determines the index value of the offline index corresponding to each piece of processing configuration information from the offline data. Finally, the server 104 stores the index value of the offline index corresponding to each piece of processing configuration information in the database. When the terminal device needs the index value of an offline index, the server 104 obtains the intermediate index data from the database according to the query request and fuses the intermediate index data into the synthesized data corresponding to the offline index, so as to support real-time index calculation.
The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, and portable wearable devices, where the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices, and the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster of multiple servers.
In one embodiment, as shown in fig. 2, a method for processing offline index data is provided. The method is described using its application to the server in fig. 1 as an example, and includes S201-S204:
S201, acquiring offline data in at least one offline data source to be processed, wherein the offline data in the at least one offline data source is data stored in partitions according to time information.
In the present application, when statistics on offline indexes are required, the server may first acquire the offline data in at least one offline data source to be processed.
It should be understood that the embodiment of the present application does not limit the offline data sources. In some embodiments, the offline data sources may be stored in partitions according to time information, where the time information may be the time slice in which the data was collected. For example, the offline data of different offline data sources may be partitioned by day, by week, or by month.
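As an illustration of time-based partitioning, the sketch below derives a partition key from a record's collection time. The daily, weekly and monthly granularities, the key formats, and the `collected_at` field are assumptions made for the example; the application does not prescribe them.

```python
from datetime import datetime


def partition_key(collected_at, granularity):
    """Return the partition label for a collection timestamp."""
    if granularity == "day":
        return collected_at.strftime("%Y%m%d")
    if granularity == "week":
        # ISO week numbering; the label format is an assumption.
        iso_year, iso_week, _ = collected_at.isocalendar()
        return f"{iso_year}W{iso_week:02d}"
    if granularity == "month":
        return collected_at.strftime("%Y%m")
    raise ValueError(f"unsupported granularity: {granularity}")


def partition_records(records, granularity):
    """Group records into partitions keyed by their time partition label."""
    partitions = {}
    for record in records:
        key = partition_key(record["collected_at"], granularity)
        partitions.setdefault(key, []).append(record)
    return partitions
```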
In some embodiments, different offline data sources may be given different identifiers. An identifier may be a field name of the offline data source, and each field name may denote the internal interface corresponding to that offline data source, with each field name corresponding to the number of one internal interface.
By way of example, Table 1 is a schematic table of an offline data source according to an embodiment of the present application. As shown in Table 1, each offline data source may be represented by a field name with a corresponding field description. Table 1 defines 200 field names; in actual use, field names corresponding to offline data sources can be added or deleted as required.
TABLE 1
Field name | Field description
Transmission time (X year, month, day, hour and minute) | Data generation time (X year, month, day, hour and minute)
F1 | Offline data source field 1
F2 | Offline data source field 2
F3 | Offline data source field 3
... | ...
F200 | Offline data source field 200
dt | Date of data
In some embodiments, each offline data source corresponds to an offline indicator. When one offline data source corresponds to a plurality of offline indexes, the offline data source can be split and stored into a plurality of offline data sources, and each split offline data source corresponds to one offline index respectively.
By way of example, Table 2 is a schematic of splitting an offline data source according to an embodiment of the present application. As shown in Table 2, when a service data source needs to correspond to a plurality of offline indexes, it may be split into a login data source, a subscription data source, a transfer data source, a payment data source and a query data source, which are stored separately and each correspond to one offline index.
TABLE 2
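The split of a service data source into per-index data sources might look like the following sketch. The `event_type` field used to route each record is hypothetical; the application does not state how records are assigned to the split sources.

```python
from collections import defaultdict


def split_source(records, type_field="event_type"):
    """Split one service data source into one data source per record type."""
    sources = defaultdict(list)
    for record in records:
        sources[record[type_field]].append(record)
    return dict(sources)


# Hypothetical service records covering three of the five types named above.
SERVICE_RECORDS = [
    {"event_type": "login", "user": "User A", "dt": "20230501"},
    {"event_type": "transfer", "user": "User A", "dt": "20230501"},
    {"event_type": "login", "user": "User B", "dt": "20230501"},
    {"event_type": "payment", "user": "User A", "dt": "20230502"},
]

SPLIT_SOURCES = split_source(SERVICE_RECORDS)
```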
In the present application, the above offline data source holds data that has already undergone near-real-time or online extract-transform-load (ETL) processing. If the offline data source is the data source of an end-of-day interface, the offline data needs to be converted into the internal interface format by ETL before it is stored.
S202, determining a processing configuration table of an offline index corresponding to the offline data, wherein the processing configuration table comprises at least one piece of processing configuration information, and each piece of processing configuration information comprises identifiers of the offline index corresponding to the offline data under different statistical dimensions and time windows.
In this step, after the server obtains the offline data in the at least one offline data source to be processed, a processing configuration table of the offline index corresponding to the offline data may be determined.
It should be appreciated that the statistical dimension and the time window are used to filter the offline data. The time window selects the offline data to be counted by time; for example, the offline data of each year, each month or each day can be selected. The statistical dimension selects the offline data to be counted by target object; for example, the offline data of a target user or a target account can be selected.
In the present application, filtering the offline data by statistical dimension and time window amounts to using the processing configuration table of the offline index to divide the offline data by statistical dimension and time window, so that an index value can be calculated for each division.
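A minimal sketch of this filtering follows. It assumes that each record carries a `dt` partition label in `YYYYMMDD` form, so that a yearly, monthly or daily window can be expressed as a prefix of that label; the function and field names are hypothetical.

```python
def filter_offline_data(rows, dimension_field, dimension_value, window):
    """Keep rows for one target object whose dt label falls in the window.

    The window is a prefix of the dt label: "2023" selects a year,
    "202305" a month and "20230501" a single day.
    """
    return [
        row for row in rows
        if row[dimension_field] == dimension_value and row["dt"].startswith(window)
    ]


# Hypothetical offline rows.
ROWS = [
    {"user": "User A", "dt": "20230501", "amount": 100.0},
    {"user": "User A", "dt": "20230502", "amount": 30.0},
    {"user": "User A", "dt": "20230601", "amount": 5.0},
    {"user": "User B", "dt": "20230501", "amount": 70.0},
]
```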
It should be understood that, in the embodiment of the present application, there is no limitation on how to determine the processing configuration table of the offline indicator corresponding to the offline data, and in some embodiments, the server may determine the processing configuration table of the offline indicator corresponding to the offline data according to the processing type of the offline indicator, where the processing type of the offline indicator is divided according to the statistical algorithm corresponding to the offline indicator.
The processing types include a combined processing type and an individual processing type. Each piece of processing configuration information in the processing configuration table corresponding to the combined processing type includes the identifiers of a plurality of offline indexes, and each piece of processing configuration information in the processing configuration table corresponding to the individual processing type includes the identifier of one offline index.
It should be understood that the embodiments of the present application do not limit how the combined processing type and the individual processing type are divided; they may be divided according to received configuration indication information.
Illustratively, the summation algorithm, the maximum algorithm, the minimum algorithm, the counting algorithm and the like are simple algorithms, and may be assigned to the combined processing type by the configuration indication information.
By way of example, Table 3 is a processing configuration table for the combined processing type provided in an embodiment of the present application. As shown in Table 3, under one statistical dimension and one time window, a piece of processing configuration information includes the identifiers of a plurality of offline indexes. The corresponding offline data can be filtered based on the statistical dimension and the time window in the processing configuration information, and the index values of the plurality of offline indexes are then determined from the filtered offline data based on the plurality of offline index identifiers in the processing configuration information.
TABLE 3
Statistical dimension | Time window | Offline index identifier
User A | 20230501 | 2,34,56
User A | 20230502 | 2,34
Account number 1 | 20230503 | 17,18
For example, computationally heavier algorithms, such as an algorithm that counts the occurrences of identical content, are inconvenient to combine and may be assigned to the individual processing type by the configuration indication information.
Illustratively, Table 4 is a processing configuration table for the individual processing type provided in an embodiment of the present application. As shown in Table 4, under one statistical dimension and one time window, a piece of processing configuration information includes the identifier of one offline index. The corresponding offline data can be filtered based on the statistical dimension and the time window in the processing configuration information, and the index value of the offline index is then determined from the filtered offline data based on the identifier of the offline index in the processing configuration information.
TABLE 4
Statistical dimension | Time window | Offline index identifier
User A | 20230501 | 4
User A | 20230502 | 4
Account number 1 | 20230503 | 8
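The two kinds of configuration table can be represented with one row parser, sketched below using the rows of Tables 3 and 4. The function name, the dictionary layout and the rule that an individual-type row must carry exactly one identifier are illustrative assumptions.

```python
def parse_config_row(dimension, window, index_ids, processing_type):
    """Turn one configuration row into a dict with a list of index identifiers."""
    ids = [part.strip() for part in index_ids.split(",") if part.strip()]
    if not ids:
        raise ValueError("a configuration row needs at least one index identifier")
    if processing_type == "individual" and len(ids) != 1:
        raise ValueError("an individual-type row carries exactly one identifier")
    return {
        "dimension": dimension,
        "window": window,
        "index_ids": ids,
        "type": processing_type,
    }


# Rows of Table 3 (combined processing type).
COMBINED_TABLE = [
    parse_config_row("User A", "20230501", "2,34,56", "combined"),
    parse_config_row("User A", "20230502", "2,34", "combined"),
    parse_config_row("Account number 1", "20230503", "17,18", "combined"),
]

# Rows of Table 4 (individual processing type).
INDIVIDUAL_TABLE = [
    parse_config_row("User A", "20230501", "4", "individual"),
    parse_config_row("User A", "20230502", "4", "individual"),
    parse_config_row("Account number 1", "20230503", "8", "individual"),
]
```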
S203, traversing each piece of processing configuration information in the processing configuration table, and determining an index value of the offline index corresponding to each piece of processing configuration information from the offline data.
In this step, after the server determines the processing configuration table of the offline indicator corresponding to the offline data, each piece of processing configuration information in the processing configuration table may be traversed, and the index value of the offline indicator corresponding to each piece of processing configuration information may be determined from the offline data.
It should be understood that, in the embodiment of the present application, how to determine, from the offline data, the index value of the offline indicator corresponding to each piece of processing configuration information is not limited, and in some embodiments, the server may first determine a structured query instruction corresponding to the identifier of the offline indicator in each piece of processing configuration information, where the structured query instruction is used to invoke a statistical algorithm corresponding to the identifier of the offline indicator. And then, the server can call a statistical algorithm corresponding to the structured query instruction according to the statistical dimension and the time window corresponding to each piece of processing configuration information, and determine the index value of the offline index corresponding to each piece of processing configuration information from the offline data.
The structured query instruction may be a Structured Query Language (SQL) instruction written in SQL syntax. The SQL instruction may invoke a statistical algorithm through that syntax.
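One way to picture the link between an offline index identifier and its SQL instruction is a lookup table of parameterized queries. The sketch below uses an in-memory SQLite database purely for illustration; the table name `transfer`, its columns, and the meaning assigned to identifiers 2 and 34 are hypothetical and do not come from the application.

```python
import sqlite3

# Hypothetical mapping from offline index identifier to an SQL instruction.
SQL_BY_INDEX_ID = {
    "2": "SELECT COALESCE(SUM(amount), 0) FROM transfer WHERE user_id = ? AND dt = ?",
    "34": "SELECT COUNT(*) FROM transfer WHERE user_id = ? AND dt = ?",
}


def build_demo_db():
    """Create an in-memory table standing in for one offline data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transfer (user_id TEXT, amount REAL, dt TEXT)")
    conn.executemany(
        "INSERT INTO transfer VALUES (?, ?, ?)",
        [
            ("User A", 100.0, "20230501"),
            ("User A", 50.0, "20230501"),
            ("User A", 30.0, "20230502"),
        ],
    )
    return conn


def index_value(conn, index_id, dimension_value, window):
    """Run the SQL instruction for one index under a dimension and time window."""
    sql = SQL_BY_INDEX_ID[index_id]
    return conn.execute(sql, (dimension_value, window)).fetchone()[0]
```

Here the aggregate functions `SUM` and `COUNT` play the role of the statistical algorithm that the instruction invokes, and the statistical dimension and time window are passed as query parameters.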
It should be appreciated that the statistical algorithm described above is one that supports offline batch processing, and may be either a general statistical algorithm or a non-general statistical algorithm. For the summation, maximum, minimum and counting algorithms and the like, the calculation method of the real-time index architecture can be converted directly into a general statistical algorithm supporting offline batch processing. For the averaging and variance algorithms and the like, non-general statistical algorithms suited to offline batch processing may be generated.
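A common reason averages and variances need separate handling is that, unlike sums or counts, they cannot be obtained by adding per-window results together. One well-known workaround, shown here only as an assumption and not as the algorithm of the application, is to store a count, a sum and a sum of squares for each window and derive the statistic after merging. The class name `Partial` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Mergeable partial aggregate for one window."""
    count: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    @classmethod
    def from_values(cls, values):
        values = list(values)
        return cls(len(values), sum(values), sum(v * v for v in values))

    def merge(self, other):
        """Combine two windows by adding their components."""
        return Partial(
            self.count + other.count,
            self.total + other.total,
            self.total_sq + other.total_sq,
        )

    def mean(self):
        return self.total / self.count

    def variance(self):
        """Population variance computed as E[x^2] - (E[x])^2."""
        mean = self.mean()
        return self.total_sq / self.count - mean * mean
```

The sum-of-squares form is simple but can lose precision when values are large relative to their spread, so a production system might prefer a numerically stabler merge.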
In some embodiments, the above-mentioned statistical algorithm may be generated and stored in advance by the server, and after receiving the structured query instruction, the corresponding statistical algorithm may be invoked to calculate the corresponding index value based on the indication information in the structured query instruction.
For example, the server may traverse each piece of processing configuration information and filter out the corresponding offline data by the statistical dimension and the time window. Referring to the first piece of processing configuration information in the processing configuration table in Table 3, if the statistical dimension is user A and the time window is 20230501, the offline data related to user A on May 1, 2023 is filtered out, and the corresponding structured query instruction is then determined based on the identifier 4 of the offline index in the first piece of processing configuration information. Finally, the index value of the offline index is determined from the filtered offline data by the statistical algorithm called by the structured query instruction.
It should be understood that, if the processing configuration table is of the merged processing type, then after the offline data is filtered out by the statistical dimension and the time window, the structured query instructions corresponding to the identifiers of the plurality of offline indexes in each piece of processing configuration information may be determined, and the index values of the plurality of offline indexes may be determined from the offline data through the plurality of structured query instructions, respectively.
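For the merged processing type, one plausible arrangement is to filter the offline data once per configuration row and then apply several statistical algorithms to the same filtered rows. The sketch below assumes this; the identifiers `ID1` to `ID3`, the record fields and the `ALGORITHMS` mapping are hypothetical.

```python
# Hypothetical statistical algorithms keyed by offline-index identifier.
ALGORITHMS = {
    "ID1": lambda amounts: sum(amounts),
    "ID2": lambda amounts: max(amounts, default=None),
    "ID3": lambda amounts: len(amounts),
}


def process_merged_row(offline_rows, dimension, window, indicator_ids):
    """Filter once, then compute every index listed in the configuration row."""
    amounts = [r["amount"] for r in offline_rows
               if r["user"] == dimension and r["dt"] == window]
    return {ind: ALGORITHMS[ind](amounts) for ind in indicator_ids}


offline_rows = [
    {"user": "A", "dt": "20230501", "amount": 10.0},
    {"user": "A", "dt": "20230501", "amount": 5.0},
    {"user": "B", "dt": "20230501", "amount": 3.0},
]
merged_values = process_merged_row(
    offline_rows, "A", "20230501", ["ID1", "ID2", "ID3"])
```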
S204, storing the index value of the offline index corresponding to each piece of processing configuration information into a database.
In this step, after the server determines the index value of the offline index corresponding to each piece of processing configuration information from the offline data, the index value of the offline index corresponding to each piece of processing configuration information may be stored in the database.
It should be understood that the embodiments of the present application do not limit how the index value of the offline index corresponding to each piece of processing configuration information is stored in the database. In some embodiments, the server may generate, according to a preset index storage format, intermediate index data corresponding to each piece of processing configuration information, where each piece of intermediate index data includes the index value of the corresponding offline index. The server may then store the intermediate index data corresponding to each piece of processing configuration information in the database.
The intermediate index data further includes at least one of the following: the dimension value corresponding to the offline index, the time window corresponding to the offline index, the identifier of the offline index, and the query condition corresponding to the offline index.
In some embodiments, the intermediate index data may be stored in an index storage format corresponding to a data warehouse (Hive) table.
By way of example, Table 5 shows an index storage format provided in an embodiment of the present application. As shown in Table 5, the intermediate index data stored in the index storage format may include a dimension value (KEY), a time window, an identifier of the offline index, an index result, a bucket value, and so forth.
TABLE 5
The dimension value represents the statistical dimension of the offline index and may include, for example, the user's number, the user's identification, an account name, an access address, and the like. The time window is the time window over which the index value is computed and may be divided at a granularity of day, week, month, and so on. The identifier of the offline index may be, for example, ID1 or ID5, and may also serve as a partition of the Hive table to facilitate insertion and deletion of indexes. The index result is the corresponding index value. The bucket value may be a query condition corresponding to the offline index; for example, a bucket value from 0 to 99 may be determined from the KEY, so that it can be used as a query condition when data is exported.
In some embodiments, after the index value of the offline index corresponding to each piece of processing configuration information is stored in the database, the server may receive a query request sent by the terminal device, where the query request includes an identifier of the offline index to be queried, a dimension value corresponding to the offline index to be queried, and query time information. Then, the server acquires the intermediate index data corresponding to the offline index from the database according to the identifier of the offline index to be queried, the dimension value corresponding to the offline index to be queried and the query time information, and fuses the queried intermediate index data to obtain synthesized data corresponding to the offline index to be queried. Finally, the server feeds back the synthesized data corresponding to the offline index to be queried to the terminal device as a query result.
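A record in this storage format might be modelled as in the sketch below. The field names, the `IntermediateIndex` class and the use of CRC32 to derive a bucket value from the KEY are assumptions for illustration; the application only states that a value from 0 to 99 may be determined from the KEY.

```python
import zlib
from dataclasses import dataclass

NUM_BUCKETS = 100  # bucket values 0-99, as in the description


def bucket_for(key: str) -> int:
    # CRC32 is an assumed choice of stable hash; any deterministic
    # function of the KEY would serve the same purpose.
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS


@dataclass(frozen=True)
class IntermediateIndex:
    key: str           # dimension value (KEY)
    time_window: str   # e.g. a day such as "20230501"
    indicator_id: str  # identifier of the offline index
    value: float       # index result
    bucket: int        # bucket value used as a query condition on export


def make_record(key, time_window, indicator_id, value):
    return IntermediateIndex(key, time_window, indicator_id, value,
                             bucket_for(key))


record = make_record("userA", "20230501", "ID1", 15.0)
```

Because the bucket depends only on the KEY, all records for one dimension value land in the same bucket, which lets an export job scan one bucket at a time.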
According to the present application, the offline data and the offline index are split by a time-slicing technique, so that the offline index becomes a lightweight real-time index that does not need to be recalculated frequently, which greatly reduces the offline statistics pressure. In addition, the processing configuration information in the processing configuration table splits the calculation of the offline index values into multiple tasks, each of which calculates only the index value for one statistical dimension and time window; the calculation pressure of the offline index values can thus be reduced by trading space for time.
According to the method for processing the offline index data, the offline data in at least one offline data source to be processed is firstly obtained, the offline data in the at least one offline data source are stored in a partitioning mode according to time information, and each offline data source corresponds to one offline index. And secondly, determining a processing configuration table of the offline index corresponding to the offline data, wherein the processing configuration table comprises at least one piece of processing configuration information, and each piece of processing configuration information comprises the identification of the offline index corresponding to the offline data under different statistical dimensions and time windows. And traversing each piece of processing configuration information in the processing configuration table, and determining an index value of the offline index corresponding to each piece of processing configuration information from the offline data. And finally, storing the index value of the offline index corresponding to each piece of processing configuration information into a database. According to the processing method of the offline index data, index values of the offline index under different statistical dimensions and time windows are calculated, and the index values of the offline index under the different statistical dimensions and time windows are stored as intermediate states, so that index fusion is conveniently carried out based on the different statistical dimensions and time windows when needed, frequent calculation of the index data is avoided, and calculation pressure of the index data is greatly reduced.
How to query the offline indicator is described below. Fig. 3 is a flow chart of another method for processing offline index data according to an embodiment of the present application, as shown in fig. 3, the method for processing offline index data includes S301-S304:
S301, receiving a query request sent by the terminal equipment.
The query request comprises an identifier of an offline index to be queried, a dimension value corresponding to the offline index to be queried and query time information.
In the present application, when an offline index value needs to be queried, the terminal device may send a query request to the server, so that the server obtains the synthesized data corresponding to the offline index by using the index values of the offline index corresponding to different pieces of processing configuration information.
S302, acquiring intermediate index data corresponding to the offline index from a database according to the identification of the offline index to be queried, the dimension value corresponding to the offline index to be queried and the query time information.
S303, fusing the intermediate index data corresponding to the queried offline indexes to obtain the synthesized data corresponding to the offline indexes to be queried.
In some embodiments, the fusing of the intermediate index data may be fusing the real-time index data with the intermediate index data according to a fusion type.
It should be understood that the embodiments of the present application are not limited in how data fusion is performed, and in some embodiments, different fusion methods may be used according to different fusion types.
The fusion types include three types: an independent variable that is independent of time, an independent variable that is related to time, and an independent variable that is related to time and has a dependent variable. An independent variable that is independent of time may be, for example, the total consumption amount; an independent variable that is related to time may be, for example, the total number of times in a certain period of time; and an independent variable that is related to time and has a dependent variable may be, for example, the number of occurrences of the same value as the current value.
For example, if the fusion type of the offline index value is an independent variable that is independent of time, the consumption amount N1 accumulated as of the current time point T1 in the corresponding statistical dimension may be determined as the real-time index data. The real-time index data N1 and all the intermediate index data Nt of consumption amounts before T1 are then accumulated to obtain the total consumption amount.
For example, if the fusion type of the offline index value is an independent variable related to time, such as the total number of transactions in three months, the accumulated value X1 of the current month (M1) as of the current time T1 may be determined by real-time calculation in the corresponding statistical dimension. Correspondingly, the intermediate index data needs to provide the value Nt of month M1, the accumulated value X2 of the previous month M2, and the accumulated value X3 of the month before that, M3. The current value is then X1+X2+X3. If the month rolls over and a new value Xx is stored in real time, the current value becomes Xx+X1+X2.
For example, if the fusion type of the offline index value is an independent variable that is related to time and has a dependent variable, such as the number of occurrences of the same value as the current value, a result of the form (Y1, count, Y2, count) or of the form (Y1, Y2, Y3, Y1, Y2) may be saved. In terms of storage space, the first scheme is better than the second when a certain value of Y appears very frequently. Otherwise, the entries may be sorted by frequency or by most recent occurrence, so that the most frequent or most recent Yx is moved to the first position, and an elimination mechanism may then be designed to bound the stored data; for example, N1 = (Y1, count, Y2, count) may be defined.
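The first two fusion types can be sketched as below, as one interpretation of the description rather than the application's own code. The function names and the `"YYYYMM"` keying of monthly values are assumptions; the real-time part is taken to be already folded into the current month's entry.

```python
def fuse_time_independent(realtime_value, intermediate_values):
    """Total = real-time part plus every stored intermediate value."""
    return realtime_value + sum(intermediate_values)


def fuse_rolling_months(monthly_values, current_month, months=3):
    """Sum the latest `months` calendar months, current month included.

    monthly_values maps "YYYYMM" to the accumulated value for that month.
    """
    year, month = int(current_month[:4]), int(current_month[4:])
    total = 0
    for _ in range(months):
        total += monthly_values.get(f"{year:04d}{month:02d}", 0)
        # Step back one calendar month, wrapping across the year boundary.
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return total


total_consumption = fuse_time_independent(12.5, [100.0, 50.0])

monthly = {"202303": 4, "202304": 6, "202305": 2}
three_month_count = fuse_rolling_months(monthly, "202305")

# Crossing into a new month: the oldest month drops out of the window.
monthly["202306"] = 1
after_rollover = fuse_rolling_months(monthly, "202306")
```

After the rollover the window covers the new month and the two before it, matching the Xx+X1+X2 form given in the text.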
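The value-and-count scheme with an elimination mechanism might look like the sketch below. The choice of keeping only the most frequent entries is an assumption; the description also allows ordering by most recent occurrence, and `fuse_value_counts` and `capacity` are hypothetical names.

```python
from collections import Counter


def fuse_value_counts(stored_counts, new_values, capacity=3):
    """Merge stored (value, count) pairs with new values, then evict.

    Only the `capacity` most frequent values are kept, so the stored
    state stays bounded at the cost of forgetting rare values.
    """
    counts = Counter(stored_counts)
    counts.update(new_values)
    return dict(counts.most_common(capacity))


stored = {"Y1": 5, "Y2": 2}
fused = fuse_value_counts(stored, ["Y1", "Y3", "Y3", "Y3", "Y4"], capacity=3)

# Number of occurrences of the same value as the current value "Y3".
same_as_current = fused.get("Y3", 0)
```

Eviction makes the counts of dropped values unrecoverable, so `capacity` would need to be chosen with the expected number of distinct values in mind.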
S304, the synthesized data corresponding to the offline index to be queried is fed back to the terminal equipment as a query result.
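Steps S301 to S304 can be summarised in a short sketch. An in-memory list stands in for the database, summation stands in for the fusion step, and `QueryRequest`, `handle_query` and the record fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRequest:
    indicator_id: str  # identifier of the offline index to be queried
    key: str           # dimension value
    start: str         # query time range, inclusive, "YYYYMMDD"
    end: str


def handle_query(store, request):
    # S302: fetch the intermediate index data matching the request.
    rows = [r for r in store
            if r["indicator_id"] == request.indicator_id
            and r["key"] == request.key
            and request.start <= r["time_window"] <= request.end]
    # S303: fuse the intermediate data (a plain sum is assumed here).
    fused = sum(r["value"] for r in rows)
    # S304: return the synthesized data as the query result.
    return {"indicator_id": request.indicator_id,
            "key": request.key,
            "value": fused}


store = [
    {"indicator_id": "ID1", "key": "userA", "time_window": "20230501", "value": 15.0},
    {"indicator_id": "ID1", "key": "userA", "time_window": "20230502", "value": 7.0},
    {"indicator_id": "ID1", "key": "userB", "time_window": "20230501", "value": 3.0},
    {"indicator_id": "ID5", "key": "userA", "time_window": "20230501", "value": 2.0},
]
# S301: the request as it would arrive from the terminal device.
result = handle_query(store, QueryRequest("ID1", "userA", "20230501", "20230502"))
```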
How to determine the index value of the offline index corresponding to each piece of processing configuration information is described below. Fig. 4 is a flowchart of another method for processing offline index data according to an embodiment of the present application. As shown in fig. 4, the method for processing offline index data includes S401-S405:
S401, obtaining offline data in at least one offline data source to be processed, wherein the offline data in the at least one offline data source are data stored in a partitioned mode according to time information, and each offline data source corresponds to one offline index.
S402, determining a processing configuration table of an offline index corresponding to the offline data, wherein the processing configuration table comprises at least one piece of processing configuration information, and each piece of processing configuration information comprises identifiers of the offline index corresponding to the offline data under different statistical dimensions and time windows.
S403, traversing each piece of processing configuration information in the processing configuration table, and determining a structured query instruction corresponding to the identifier of the offline indicator in each piece of processing configuration information, wherein the structured query instruction is used for calling a statistical algorithm corresponding to the identifier of the offline indicator.
S404, according to the statistical dimension and the time window corresponding to each piece of processing configuration information, a statistical algorithm corresponding to the structured query instruction is called, and an index value of the offline index corresponding to each piece of processing configuration information is determined from the offline data.
S405, storing index values of the offline indexes corresponding to each piece of processing configuration information into a database.
Fig. 5 is a flowchart of another method for processing offline index data according to an embodiment of the present application, as shown in fig. 5, the method for processing offline index data includes S501-S510:
S501, obtaining offline data in at least one offline data source to be processed, wherein the offline data in the at least one offline data source are data stored in a partitioned mode according to time information, and each offline data source corresponds to one offline index.
S502, determining a processing configuration table of the offline index corresponding to the offline data according to the processing type of the offline index, wherein the processing type of the offline index is divided according to a statistical algorithm corresponding to the offline index.
The processing configuration table comprises at least one piece of processing configuration information, and each piece of processing configuration information comprises the identification of the offline index corresponding to the offline data under different statistical dimensions and time windows.
The processing types comprise a combined processing type and an independent processing type, each piece of processing configuration information of the processing configuration table corresponding to the combined processing type comprises a plurality of identifiers of offline indexes, and each piece of processing configuration information of the processing configuration table corresponding to the independent processing type comprises an identifier of one offline index.
S503, traversing each piece of processing configuration information in the processing configuration table, and determining a structured query instruction corresponding to the identifier of the offline indicator in each piece of processing configuration information, wherein the structured query instruction is used for calling a statistical algorithm corresponding to the identifier of the offline indicator.
S504, according to the statistical dimension and the time window corresponding to each piece of processing configuration information, a statistical algorithm corresponding to the structured query instruction is called, and an index value of the offline index corresponding to each piece of processing configuration information is determined from the offline data.
S505, respectively generating intermediate index data corresponding to each piece of processing configuration information according to a preset index storage format, wherein each intermediate index data comprises index values of corresponding offline indexes.
The intermediate index data further includes at least one of the following: the dimension value corresponding to the offline index, the time window corresponding to the offline index, the identifier of the offline index, and the query condition corresponding to the offline index.
S506, storing the intermediate index data corresponding to each piece of processing configuration information into a database.
S507, receiving a query request sent by the terminal equipment, wherein the query request comprises an identifier of an offline index to be queried, a dimension value corresponding to the offline index to be queried and query time information.
S508, obtaining intermediate index data corresponding to the offline index from the database according to the identification of the offline index to be queried, the dimension value corresponding to the offline index to be queried and the query time information.
S509, fusing the intermediate index data corresponding to the queried offline indexes to obtain the synthetic data corresponding to the offline indexes to be queried.
S510, feeding the synthesized data corresponding to the offline index to be queried back to the terminal equipment as a query result.
According to the method for processing the offline index data, the offline data in at least one offline data source to be processed is firstly obtained, the offline data in the at least one offline data source are stored in a partitioning mode according to time information, and each offline data source corresponds to one offline index. And secondly, determining a processing configuration table of the offline index corresponding to the offline data, wherein the processing configuration table comprises at least one piece of processing configuration information, and each piece of processing configuration information comprises the identification of the offline index corresponding to the offline data under different statistical dimensions and time windows. And traversing each piece of processing configuration information in the processing configuration table, and determining an index value of the offline index corresponding to each piece of processing configuration information from the offline data. And finally, storing the index value of the offline index corresponding to each piece of processing configuration information into a database. According to the processing method of the offline index data, index values of the offline index under different statistical dimensions and time windows are calculated, and the index values of the offline index under the different statistical dimensions and time windows are stored as intermediate states, so that index fusion is conveniently carried out based on the different statistical dimensions and time windows when needed, frequent calculation of the index data is avoided, and calculation pressure of the index data is greatly reduced.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in the sequence indicated by the arrows, these steps are not necessarily performed in that sequence. Unless explicitly stated herein, the execution order of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiments of the present application also provide an offline index data processing device for implementing the offline index data processing method described above. The solution provided by the device is implemented in a manner similar to that described for the method, so for the specific limitations in the one or more embodiments of the offline index data processing device provided below, reference may be made to the limitations of the offline index data processing method above, which are not repeated here.
In one embodiment, as shown in fig. 6, there is provided a processing apparatus 600 for offline indicator data, including: an acquisition module 601, a determination module 602, a storage module 603, and a query module 604, wherein:
the obtaining module 601 is configured to obtain offline data in offline data sources to be processed, where the offline data in the offline data sources is data stored in a partition according to time information, and each offline data source corresponds to one offline index;
the determining module 602 is configured to determine a processing configuration table of offline indicators corresponding to the offline data, where the processing configuration table includes at least one piece of processing configuration information, and each piece of processing configuration information includes an identifier of the offline indicator corresponding to the offline data under different statistical dimensions and time windows; traversing each piece of processing configuration information in the processing configuration table, and determining an index value of an offline index corresponding to each piece of processing configuration information from the offline data;
the storage module 603 is configured to store an index value of the offline index corresponding to each piece of processing configuration information into the database.
In one embodiment, the determining module 602 is specifically configured to determine a structured query instruction corresponding to an identifier of an offline indicator in each piece of processing configuration information, where the structured query instruction is configured to invoke a statistical algorithm corresponding to the identifier of the offline indicator; and calling a statistical algorithm corresponding to the structured query instruction according to the statistical dimension and the time window corresponding to each piece of processing configuration information, and determining an index value of the offline index corresponding to each piece of processing configuration information from the offline data.
In one embodiment, the storage module 603 is specifically configured to generate, according to a preset index storage format, intermediate index data corresponding to each piece of processing configuration information, where each intermediate index data includes an index value of a corresponding offline index; and storing the intermediate index data corresponding to each piece of processing configuration information into a database.
In one embodiment, the intermediate index data further includes at least one of the following: the dimension value corresponding to the offline index, the time window corresponding to the offline index, the identifier of the offline index, and the query condition corresponding to the offline index.
In one embodiment, the processing apparatus 600 for offline indicator data further includes:
the query module 604 is configured to receive a query request sent by a terminal device, where the query request includes an identifier of an offline indicator to be queried, a dimension value corresponding to the offline indicator to be queried, and query time information; acquiring intermediate index data corresponding to the offline index from a database according to the identification of the offline index to be queried, the dimension value corresponding to the offline index to be queried and the query time information; fusing the intermediate index data corresponding to the inquired offline indexes to obtain the synthetic data corresponding to the offline indexes to be inquired; and feeding the synthesized data corresponding to the offline index to be queried back to the terminal equipment as a query result.
In one embodiment, the determining module 602 is specifically configured to determine, according to a processing type of the offline indicator, a processing configuration table of the offline indicator corresponding to the offline data, where the processing type of the offline indicator is divided according to a statistical algorithm corresponding to the offline indicator.
In one embodiment, the processing types include a merged processing type and an independent processing type, each piece of processing configuration information of the processing configuration table corresponding to the merged processing type includes an identifier of a plurality of offline indexes, and each piece of processing configuration information of the processing configuration table corresponding to the independent processing type includes an identifier of one offline index.
The modules in the above offline index data processing device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is for storing data. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a method of processing offline indicator data.
It will be appreciated by those skilled in the art that the structure shown in FIG. 7 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and a processor, where the memory stores a computer program, and the processor implements the method for processing offline indicator data described above when executing the computer program.
In one embodiment, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the method of processing offline indicator data described above.
In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method of processing offline indicator data described above.
It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use, and processing of the related data need to comply with the related regulations.
Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to be within the scope of this description.
The foregoing examples illustrate only a few embodiments of the application; although they are described in specific detail, they should not be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and modifications without departing from the concept of the application, all of which fall within the scope of protection of the application. Accordingly, the scope of protection of the application shall be determined by the appended claims.

Claims (11)

1. A method for processing offline index data, the method comprising:
acquiring offline data in at least one offline data source to be processed, wherein the offline data in the at least one offline data source is data stored in a partitioning mode according to time information, and each offline data source corresponds to one offline index;
determining a processing configuration table of an offline index corresponding to the offline data, wherein the processing configuration table comprises at least one piece of processing configuration information, and each piece of processing configuration information comprises identifications of the offline index corresponding to the offline data under different statistical dimensions and time windows;
traversing each piece of processing configuration information in the processing configuration table, and determining an index value of an offline index corresponding to each piece of processing configuration information from the offline data;
and storing the index value of the offline index corresponding to each piece of processing configuration information into a database.
2. The method of claim 1, wherein determining an index value of an offline index corresponding to each piece of processing configuration information from the offline data comprises:
determining a structured query instruction corresponding to the identifier of the offline index in each piece of processing configuration information, wherein the structured query instruction is used for calling a statistical algorithm corresponding to the identifier of the offline index;
and calling a statistical algorithm corresponding to the structured query instruction according to the statistical dimension and the time window corresponding to each piece of processing configuration information, and determining an index value of the offline index corresponding to each piece of processing configuration information from the offline data.
3. The method according to claim 1, wherein storing the index value of the offline index corresponding to each piece of processing configuration information in the database includes:
respectively generating intermediate index data corresponding to each piece of processing configuration information according to a preset index storage format, wherein each intermediate index data comprises index values of corresponding offline indexes;
and storing the intermediate index data corresponding to each piece of processing configuration information into a database.
4. A method according to claim 3, wherein the intermediate index data further comprises at least one of the following data: the dimension value corresponding to the offline index, the time window corresponding to the offline index, the identifier of the offline index and the query condition corresponding to the offline index.
5. The method of claim 4, wherein after storing the index value of the offline index corresponding to each piece of processing configuration information in the database, the method further comprises:
receiving a query request sent by a terminal device, wherein the query request comprises an identifier of an offline index to be queried, a dimension value corresponding to the offline index to be queried and query time information;
acquiring intermediate index data corresponding to the offline index to be queried from the database according to the identifier of the offline index to be queried, the dimension value corresponding to the offline index to be queried and the query time information;
fusing the intermediate index data corresponding to the offline index to be queried to obtain synthetic data corresponding to the offline index to be queried;
and feeding back the synthesized data corresponding to the offline index to be queried to the terminal device as a query result.
6. The method of claim 1, wherein the determining the processing configuration table of the offline index corresponding to the offline data comprises:
and determining a processing configuration table of the offline index corresponding to the offline data according to the processing type of the offline index, wherein the processing type of the offline index is divided according to a statistical algorithm corresponding to the offline index.
7. The method of claim 6, wherein the processing types include a merged processing type and an individual processing type, each piece of processing configuration information of the processing configuration table corresponding to the merged processing type includes identifiers of a plurality of offline indexes, and each piece of processing configuration information of the processing configuration table corresponding to the individual processing type includes an identifier of one offline index.
8. An apparatus for processing offline index data, the apparatus comprising:
the acquisition module is used for acquiring offline data in offline data sources to be processed, wherein the offline data in the offline data sources are data stored in a partitioning mode according to time information, and each offline data source corresponds to one offline index;
the determining module is used for determining a processing configuration table of the offline index corresponding to the offline data, wherein the processing configuration table comprises at least one piece of processing configuration information, and each piece of processing configuration information comprises identifiers of the offline index corresponding to the offline data under different statistical dimensions and time windows; and for traversing each piece of processing configuration information in the processing configuration table, and determining an index value of an offline index corresponding to each piece of processing configuration information from the offline data;
and the storage module is used for storing the index value of the offline index corresponding to each piece of processing configuration information into a database.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
11. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
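
The method of claims 1 to 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the application: the table names `offline_data` and `intermediate_index`, the index identifiers `txn_amount_sum` and `txn_count`, the `branch` dimension, the one-day time window, and the use of SQLite are all hypothetical. Fusion at query time is done by summation, which is only valid for additive indexes such as sums and counts.

```python
import sqlite3

# Hypothetical mapping from offline index identifier to the SQL aggregate
# that implements its statistical algorithm (claim 2).
SQL_BY_INDEX = {"txn_amount_sum": "SUM(amount)", "txn_count": "COUNT(*)"}
# Dimensions are interpolated into SQL, so only whitelisted names are accepted.
ALLOWED_DIMENSIONS = {"branch"}


def build_db():
    """Create an in-memory database with sample date-partitioned offline data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE offline_data (dt TEXT, branch TEXT, amount REAL)")
    conn.execute(
        "CREATE TABLE intermediate_index "
        "(index_id TEXT, dimension TEXT, dim_value TEXT, time_window TEXT, value REAL)"
    )
    conn.executemany(
        "INSERT INTO offline_data VALUES (?, ?, ?)",
        [
            ("2023-09-20", "A", 100.0), ("2023-09-20", "B", 40.0),
            ("2023-09-21", "A", 60.0),
            ("2023-09-22", "A", 10.0), ("2023-09-22", "B", 5.0),
        ],
    )
    return conn


def process(conn, config_table):
    """Traverse the configuration table and store intermediate index data."""
    for cfg in config_table:
        dim = cfg["dimension"]
        if dim not in ALLOWED_DIMENSIONS:
            raise ValueError(f"unsupported dimension: {dim}")
        agg = SQL_BY_INDEX[cfg["index_id"]]
        # One aggregate query per piece of configuration, limited to one partition.
        rows = conn.execute(
            f"SELECT {dim}, {agg} FROM offline_data WHERE dt = ? GROUP BY {dim}",
            (cfg["window"],),
        ).fetchall()
        # Each stored record carries the identifier, dimension value and window
        # alongside the index value (claims 3 and 4).
        conn.executemany(
            "INSERT INTO intermediate_index VALUES (?, ?, ?, ?, ?)",
            [(cfg["index_id"], dim, dv, cfg["window"], val) for dv, val in rows],
        )


def query(conn, index_id, dim_value, start, end):
    """Fuse stored daily values over [start, end] by summing them (claim 5)."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(value), 0) FROM intermediate_index "
        "WHERE index_id = ? AND dim_value = ? AND time_window BETWEEN ? AND ?",
        (index_id, dim_value, start, end),
    ).fetchone()
    return total


conn = build_db()
config_table = [
    {"index_id": index_id, "dimension": "branch", "window": day}
    for index_id in SQL_BY_INDEX
    for day in ("2023-09-20", "2023-09-21", "2023-09-22")
]
process(conn, config_table)
three_day_amount_a = query(conn, "txn_amount_sum", "A", "2023-09-20", "2023-09-22")
```

Because each daily value is computed once and stored, a query over a longer period only reads and combines a handful of intermediate records instead of rescanning the offline data, which is the reduction in calculation pressure the abstract refers to.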
CN202311238672.1A 2023-09-22 2023-09-22 Offline index data processing method, device, equipment and storage medium Pending CN117194524A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311238672.1A CN117194524A (en) 2023-09-22 2023-09-22 Offline index data processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117194524A (en) 2023-12-08

Family

ID=89001551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311238672.1A Pending CN117194524A (en) 2023-09-22 2023-09-22 Offline index data processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117194524A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination