CN117951175A

CN117951175A - Data processing method, device, electronic equipment and storage medium

Info

Publication number: CN117951175A
Application number: CN202311870584.3A
Authority: CN
Inventors: 黄波
Original assignee: Mashang Consumer Finance Co Ltd
Current assignee: Mashang Consumer Finance Co Ltd
Priority date: 2023-12-29
Filing date: 2023-12-29
Publication date: 2024-04-30

Abstract

The application provides a data processing method, a data processing device, electronic equipment and a storage medium. The method comprises the following steps: acquiring a historical query record of a target object for a data set to be processed, wherein the historical query record comprises sub-query records corresponding to the target object at the historical query times, and each sub-query record records the target data dimensions to which the data queried by the target object at that historical query time belongs; performing dimension aggregation on the target data dimensions in the historical query record to obtain N target dimension groups, wherein N is an integer greater than 1; and dividing the data in the data set to be processed based on the N target dimension groups to obtain N target data sets corresponding to the N target dimension groups. The application can effectively divide the data.

Description

Data processing method, device, electronic equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a data processing method, a data processing device, an electronic device, and a storage medium.

Background

The online analytical processing (OLAP, Online Analytical Processing) system is the most important application of the data warehouse system. It is specially designed to support complex analysis operations and focuses on decision support for decision makers and senior management. It can rapidly and flexibly perform complex query processing on large volumes of data according to the requirements of analysts, and it provides the query results to decision makers so that they can accurately grasp the business conditions of the enterprise, understand the requirements of objects and formulate a correct scheme.

In the related art, for data division of a data set, the data division is generally performed directly on the data set according to data dimensions, so that the data set obtained by division cannot accurately reflect the historical query habit of a user, and the accuracy of data division is low.

Disclosure of Invention

The embodiment of the application provides a data processing method, a data processing device, electronic equipment and a storage medium, which can effectively improve the accuracy of data division.

The technical scheme of the embodiment of the application is realized as follows:

The embodiment of the application provides a data processing method, which comprises the following steps:

Acquiring a historical query record of a target object aiming at a data set to be processed, wherein the historical query record comprises sub-query records respectively corresponding to the target object at the time of the historical query;

the sub-query records are used for recording target data dimensions of the data queried by the target object at the historical query time;

Performing dimension aggregation on target data dimensions in the history query record to obtain N target dimension groups, wherein N is an integer greater than 1;

Dividing data in the data set to be processed into N target dimension groups based on the N target dimension groups to obtain N target data sets corresponding to the N target dimension groups, wherein one target dimension group corresponds to one target data set.

An embodiment of the present application provides a data processing apparatus, including:

The acquisition module is used for acquiring a historical query record of a target object aiming at a data set to be processed, wherein the historical query record comprises sub-query records corresponding to the target object at the historical query time; the sub-query records are used for recording target data dimensions of the data queried by the target object at the historical query time;

The dimension aggregation module is used for carrying out dimension aggregation on the target data dimensions in the history query record to obtain N target dimension groups, wherein N is an integer greater than 1;

The data dividing module is used for dividing data in the data set to be processed into the N target dimension groups based on the N target dimension groups to obtain N target data sets corresponding to the N target dimension groups, wherein one target dimension group corresponds to one target data set.

An embodiment of the present application provides an electronic device, including:

a memory for storing computer executable instructions or computer programs;

a processor for implementing the data processing method provided by the embodiment of the application when executing the computer executable instructions or the computer programs stored in the memory.

The embodiment of the application provides a computer readable storage medium, which stores computer executable instructions for realizing the data processing method provided by the embodiment of the application when being executed by a processor.

Embodiments of the present application provide a computer program product comprising a computer program or computer-executable instructions stored in a computer-readable storage medium. The processor of the electronic device reads the computer-executable instructions from the computer-readable storage medium, and executes the computer-executable instructions, so that the electronic device performs the data processing method according to the embodiment of the present application.

The embodiment of the application has the following beneficial effects:

The method comprises the steps of carrying out dimension aggregation on each target data dimension in a history query record by acquiring the history query record of a target object aiming at a data set to be processed to obtain at least one target dimension group, carrying out data division on the data set to be processed based on the target dimension group to obtain target data sets corresponding to each target dimension group, and responding to a data query request of the target object through the target data sets. In this way, the history query records include sub-query records corresponding to the target objects at each history query time, and at least one target dimension group is obtained by dimension aggregation of the target data dimensions in the history query records, so that the obtained target dimension group can accurately reflect the historical data query requirements of the target objects, and the data set to be processed is subjected to data division through the target dimension group to obtain the target data set corresponding to each target dimension group, so that the obtained target data set can accurately reflect the historical data query requirements of the target objects under the target dimension group, and the accuracy of data division is effectively improved.

Drawings

FIG. 1 is a schematic diagram of a data processing system according to an embodiment of the present application;

FIG. 2 is a schematic diagram of an electronic device for data processing according to an embodiment of the present application;

FIG. 3 is a flowchart illustrating a data processing method according to an embodiment of the present application;

FIG. 4 is a second flow chart of a data processing method according to an embodiment of the present application;

FIG. 5 is a flowchart illustrating a data processing method according to an embodiment of the present application;

FIG. 6 is a flowchart of a data processing method according to an embodiment of the present application.

Detailed Description

The present application will be further described in detail with reference to the accompanying drawings, for the purpose of making the objects, technical solutions and advantages of the present application more apparent, and the described embodiments should not be construed as limiting the present application, and all other embodiments obtained by those skilled in the art without making any inventive effort are within the scope of the present application.

In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.

In the following description, the terms "first", "second", "third" and the like are merely used to distinguish similar objects and do not represent a specific ordering of the objects, it being understood that the "first", "second", "third" may be interchanged with a specific order or sequence, as permitted, to enable embodiments of the application described herein to be practiced otherwise than as illustrated or described herein.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.

Before describing embodiments of the present application in further detail, the terms and terminology involved in the embodiments of the present application will be described, and the terms and terminology involved in the embodiments of the present application will be used in the following explanation.

1) Online analytical processing (OLAP, Online Analytical Processing): a software technology that enables analysts to observe information from various aspects rapidly, consistently and interactively, in order to gain a deep understanding of the data. It has the FASMI (Fast Analysis of Shared Multidimensional Information) characteristic, that is, fast analysis of shared multidimensional information. Here F is Fast, meaning that the system can respond to most of the user's analysis requirements within seconds; A is Analysis, meaning that the user can define new specialized calculations without programming as part of the analysis and report them in the manner the user desires; M is Multidimensional, meaning that a multidimensional view and analysis of the data is provided; I is Information, meaning that information can be obtained in time and large volumes of information can be managed. The online analytical processing system is the main application of the data warehouse system; it is specially used for supporting complex analysis operations and focuses on decision support for decision makers and senior management. OLAP can carry out complex query processing on large volumes of data according to the requirements of analysts, and it provides the query results to decision makers in an intuitive form so that they can accurately grasp the business conditions of the enterprise (company), understand the requirements of objects and formulate correct schemes.

2) Data dimension: fields of the statistics dimension, such as time, place, merchandise, etc.

3) Measurement: fields of statistical data results such as sales, number of transactions, etc.

4) Data Cube (Data Cube): is a technical architecture for data analysis and indexing. The method is a processor for Big Data (Big Data), and can index the metadata in real time by any multiple keywords. After the metadata is analyzed through the data cube, the query and retrieval efficiency of the data can be greatly improved. The data cube is arranged on the data storage layer and the database system, and after the data cube is analyzed, the services such as data query and retrieval can be greatly increased, and the system platform can have the advantages of real-time data storage, real-time query result transmission and the like.

In the implementation of the embodiments of the present application, the applicant found that the related art has the following problems:

In the related art, for data division of a data set, the data is directly divided according to data dimensions, so that the data set obtained by division cannot accurately reflect the historical query habit of a user, and the accuracy of data division is low.

Embodiments of the present application provide a data processing method, apparatus, electronic device, computer readable storage medium, and computer program product, which can effectively improve accuracy of data partitioning, and an exemplary application of the data processing system provided by the embodiments of the present application is described below.

With reference to fig. 1, fig. 1 is a schematic architecture diagram of a data processing system 100 according to an embodiment of the present application, where a terminal (a terminal 400 is shown in an exemplary manner) is connected to a server 200 through a network 300, where the network 300 may be a wide area network or a local area network, or a combination of the two.

The terminal 400 is configured to display a target data set at a graphical interface 410-1 (graphical interface 410-1 is shown as an example) for use by a user using a client 410. The terminal 400 and the server 200 are connected to each other through a wired or wireless network.

In some embodiments, the server 200 may be a stand-alone physical server, a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (Content Delivery Network, CDN), and basic cloud computing services such as big data and artificial intelligence platforms. The terminal 400 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart television, a smart watch, a car terminal, etc. The electronic device provided by the embodiment of the application can be implemented as a terminal or a server. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in the embodiment of the present application.

In some embodiments, the server 200 obtains a historical query record of the target object for the data set to be processed, performs dimension aggregation on each target data dimension in the historical query record to obtain at least one target dimension group, performs data partitioning on the data set to be processed based on the target dimension group to obtain a target data set corresponding to each target dimension group, and sends the target data set to the terminal 400.

In other embodiments, the terminal 400 obtains a historical query record of the target object for the data set to be processed, performs dimension aggregation on each target data dimension in the historical query record to obtain at least one target dimension group, performs data partitioning on the data set to be processed based on the target dimension group to obtain a target data set corresponding to each target dimension group, and sends the target data set to the server 200.

In other embodiments, the embodiments of the present application may be implemented by means of cloud technology (Cloud Technology), which refers to a hosting technology that unifies a series of resources such as hardware, software and networks in a wide area network or a local area network, so as to implement the calculation, storage, processing and sharing of data.

The cloud technology is a generic term of network technology, information technology, integration technology, management platform technology, application technology and the like based on cloud computing business model application, can form a resource pool, and is flexible and convenient as required. Cloud computing technology will become an important support. Background services of technical network systems require a large amount of computing and storage resources.

Referring to fig. 2, fig. 2 is a schematic structural diagram of an electronic device 500 for data processing according to an embodiment of the present application, where the electronic device 500 shown in fig. 2 may be the server 200 or the terminal 400 in fig. 1, and the electronic device 500 shown in fig. 2 includes: at least one processor 430, a memory 450, at least one network interface 420. The various components in electronic device 500 are coupled together by bus system 440. It is understood that the bus system 440 is used to enable connected communication between these components. The bus system 440 includes a power bus, a control bus, and a status signal bus in addition to the data bus. But for clarity of illustration the various buses are labeled in fig. 2 as bus system 440.

The processor 430 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor (for example, a microprocessor or any conventional processor), a digital signal processor (DSP, Digital Signal Processor), another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.

Memory 450 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard drives, optical drives, and the like. Memory 450 optionally includes one or more storage devices physically remote from processor 430.

Memory 450 includes volatile memory or nonvolatile memory, and may also include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM, Read Only Memory) and the volatile memory may be a random access memory (RAM, Random Access Memory). The memory 450 described in embodiments of the present application is intended to comprise any suitable type of memory.

In some embodiments, memory 450 is capable of storing data to support various operations, examples of which include programs, modules and data structures, or subsets or supersets thereof, as exemplified below.

An operating system 451, including system programs such as a framework layer, a core library layer and a driver layer, for implementing various basic system services and handling hardware-based tasks;

A network communication module 452 for accessing other electronic devices via one or more (wired or wireless) network interfaces 420, where exemplary network interfaces 420 include Bluetooth, wireless fidelity (WiFi, Wireless Fidelity), universal serial bus (USB, Universal Serial Bus), and the like.

In some embodiments, the data processing apparatus provided in the embodiments of the present application may be implemented in software. Fig. 2 shows the data processing apparatus 455 stored in the memory 450, which may be software in the form of a program, a plug-in, or the like, and which includes the following software modules: the acquisition module 4551, the dimension aggregation module 4552, and the data partitioning module 4553. These modules are logical, and thus may be arbitrarily combined or further split according to the functions implemented. The functions of the respective modules will be described hereinafter.

In other embodiments, the data processing apparatus provided in the embodiments of the present application may be implemented in hardware. By way of example, it may be a processor in the form of a hardware decoding processor that is programmed to perform the data processing method provided in the embodiments of the present application; for example, the processor in the form of a hardware decoding processor may use one or more application-specific integrated circuits (ASIC, Application Specific Integrated Circuit), DSPs, programmable logic devices (PLD, Programmable Logic Device), complex programmable logic devices (CPLD, Complex Programmable Logic Device), field-programmable gate arrays (FPGA, Field-Programmable Gate Array), or other electronic components.

In some embodiments, the terminal or the server may implement the data processing method provided by the embodiments of the present application by running a computer program or computer executable instructions. For example, the computer program may be a native program (e.g., a dedicated data processing program) or a software module in an operating system, e.g., a data processing module that may be embedded in any program (e.g., an instant messaging client, an album program, an electronic map client, a navigation client); for example, a Native Application (APP) may be used, i.e. a program that needs to be installed in an operating system to be run. In general, the computer programs described above may be any form of application, module or plug-in.

The data processing method provided by the embodiment of the present application will be described in conjunction with exemplary applications and implementations of the server or the terminal provided by the embodiment of the present application.

Referring to fig. 3, fig. 3 is a schematic flow chart of a data processing method according to an embodiment of the present application, which will be described with reference to steps 101 to 104 shown in fig. 3. The data processing method according to an embodiment of the present application may be implemented by a server or a terminal alone, or by a server and a terminal cooperatively; the following description takes implementation by a server alone as an example.

In step 101, a historical query record of a target object for a set of data to be processed is obtained.

In some embodiments, the history query record includes a sub-query record corresponding to the target object at the history query time.

In some embodiments, the sub-query record is configured to record a target data dimension to which the data queried by the target object at the historical query time belongs.

By way of example, referring to Table 1 below, table 1 below is a schematic table of historical query records provided by embodiments of the present application.

TABLE 1 schematic Table of historical query records provided by embodiments of the application

Sub-query record id	Target data dimension
T1	A1, A3, A8
T2	A10, A14, A18
T3	A1, A3, A8, A9
T4	A2, A5
T5	A10, A14, A18
T6	A2, A5
T7	A1, A3, A8
T8	A4, A7

Referring to table 1, the history query records shown in table 1 above include a sub-query record T1 corresponding to the target object at the history query time A1, a sub-query record T2 corresponding to the history query time A2, a sub-query record T3 corresponding to the history query time A3, a sub-query record T4 corresponding to the history query time A4, a sub-query record T5 corresponding to the history query time A5, a sub-query record T6 corresponding to the history query time A6, a sub-query record T7 corresponding to the history query time A7, and a sub-query record T8 corresponding to the history query time A8.

Referring to table 1 above, taking the sub-query record T1 in table 1 above as an example, the sub-query record T1 is used for recording the target data dimension A1, the target data dimension A3 and the target data dimension A8 to which the data queried by the target object at the historical query time A1 belongs.

Referring to table 1, taking the sub-query record T8 in table 1 as an example, the sub-query record T8 is used for recording the target data dimension A4 and the target data dimension A7 to which the data queried by the target object at the historical query time A8 belongs.
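The historical query record of table 1 can be sketched in Python as a mapping from sub-query record id to the set of target data dimensions it records. This is a minimal illustration only; the names `HISTORY` and `records_by_dimension` are assumptions introduced here and do not appear in the application.

```python
from collections import defaultdict

# Hypothetical encoding of table 1:
# sub-query record id -> target data dimensions queried at that time.
HISTORY = {
    "T1": {"A1", "A3", "A8"},
    "T2": {"A10", "A14", "A18"},
    "T3": {"A1", "A3", "A8", "A9"},
    "T4": {"A2", "A5"},
    "T5": {"A10", "A14", "A18"},
    "T6": {"A2", "A5"},
    "T7": {"A1", "A3", "A8"},
    "T8": {"A4", "A7"},
}


def records_by_dimension(history):
    """Invert the history: dimension -> ids of the sub-query records that mention it."""
    index = defaultdict(set)
    for record_id, dimensions in history.items():
        for dimension in dimensions:
            index[dimension].add(record_id)
    return dict(index)


index = records_by_dimension(HISTORY)
print(sorted(index["A1"]))  # ['T1', 'T3', 'T7']
```

The inverted view (dimension to record ids) is convenient for the later aggregation step, where the sub-query records of several dimensions are compared with each other.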

In step 102, dimension aggregation is performed on the target data dimensions in the history query record, so as to obtain N target dimension groups.

In some embodiments, N is an integer greater than 1, and the number of target data dimensions in each target dimension group is less than the total number of target data dimensions.

As an example, the target data dimensions A1, A2, A3, A4, A5, A6, A7, A8, A9, and the like in the history query record shown in table 1 above are dimension-aggregated to obtain a target dimension group Q1{ A1, A2, A3}, a target dimension group Q2{ A4, A5, A6, A7}, and a target dimension group Q3{ A8, A9}, where N=3.

In the above example, the number of target data dimensions in the target dimension group Q1 is smaller than the total number of target data dimensions, the number of target data dimensions in the target dimension group Q2 is smaller than the total number of target data dimensions, and the number of target data dimensions in the target dimension group Q3 is smaller than the total number of target data dimensions.

In some embodiments, the number of the target data dimensions in the target dimension group is smaller than the total number of the target data dimensions, so that the obtained target dimension group excludes the dimension group comprising the total number of the target data dimensions, thereby effectively saving the number of the target dimension group and effectively improving the algorithm execution efficiency.

In some embodiments, step 102 may be implemented as follows: if the total number of target data dimensions in the history query record is less than or equal to 2, each target data dimension in the history query record is taken as one target dimension group; if the total number of target data dimensions in the history query record is greater than 2, M-2 dimension aggregation processes are performed on the target data dimensions, traversing i from 1 to M-2, where the ith dimension aggregation process yields the ith reference dimension group; the reference dimension groups obtained by the M-2 dimension aggregation processes are then combined to determine the target dimension groups.

In some embodiments, the number of target data dimensions used in the two adjacent dimension aggregation processes satisfies a preset rule, and the ith reference dimension group includes i+1 target data dimensions, where i is greater than or equal to 1 and less than or equal to M-2, and M is used to indicate the total number of the target data dimensions.

A 1st dimension aggregation process is performed on the target data dimensions to obtain at least one 1st reference dimension group, where each 1st reference dimension group includes 2 target data dimensions. For example, the 1st dimension aggregation is performed on the target data dimensions A1, A2, A3, A4, A5, A6, A7, A8, A9, and the like in the history query record shown in table 1 above, to obtain a 1st reference dimension group M1{ A1, A2}, a 1st reference dimension group M2{ A4, A5}, a 1st reference dimension group M3{ A8, A9}, and the like.

A 2nd dimension aggregation process is performed on the target data dimensions to obtain at least one 2nd reference dimension group, where each 2nd reference dimension group includes 3 target data dimensions. For example, the 2nd dimension aggregation is performed on the target data dimensions A1, A2, A3, A4, A5, A6, A7, A8, A9, and the like in the history query record shown in table 1 above, to obtain a 2nd reference dimension group M1{ A1, A2, A3}, a 2nd reference dimension group M2{ A4, A5, A3}, a 2nd reference dimension group M3{ A8, A9, A3}, and the like.

In some embodiments, the number of the reference dimension groups obtained by each dimension aggregation process is at least one; the ith dimension aggregation process may be implemented as follows: performing ith dimension combination on the target data dimension to obtain dimension combinations, wherein one dimension combination comprises i+1 target data dimensions; for each dimension combination, if sub-query records corresponding to all the target data dimensions in the dimension combination are overlapped, determining the dimension combination as a candidate dimension group; and combining the candidate dimension groups obtained by the ith dimension aggregation, and determining at least one reference dimension group obtained by the ith dimension aggregation.
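The pass structure described above (M-2 passes, the ith pass working on combinations of i+1 target data dimensions and keeping those whose sub-query records overlap) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: the names `HISTORY`, `overlapping_groups` and `reference_groups` are hypothetical, every candidate dimension group is treated directly as a reference dimension group, and the later merging of reference dimension groups into target dimension groups is not attempted.

```python
from itertools import combinations

# Hypothetical encoding of table 1: record id -> queried target data dimensions.
HISTORY = {
    "T1": {"A1", "A3", "A8"},
    "T2": {"A10", "A14", "A18"},
    "T3": {"A1", "A3", "A8", "A9"},
    "T4": {"A2", "A5"},
    "T5": {"A10", "A14", "A18"},
    "T6": {"A2", "A5"},
    "T7": {"A1", "A3", "A8"},
    "T8": {"A4", "A7"},
}


def overlapping_groups(history, size):
    """Groups of `size` dimensions that occur together in at least one record."""
    dims = sorted({d for ds in history.values() for d in ds})
    return [
        set(combo)
        for combo in combinations(dims, size)
        if any(set(combo) <= ds for ds in history.values())
    ]


def reference_groups(history):
    """Map pass number i to the groups (i + 1 dimensions each) kept in that pass."""
    dims = {d for ds in history.values() for d in ds}
    m = len(dims)  # M: total number of target data dimensions
    if m <= 2:
        # Small case: each dimension is its own group (key 0 is an
        # assumption of this sketch, not a pass number from the text).
        return {0: [{d} for d in sorted(dims)]}
    # Passes i = 1 .. M-2 use i + 1 dimensions, so the group holding
    # all M dimensions is never generated.
    return {i: overlapping_groups(history, i + 1) for i in range(1, m - 1)}


groups = reference_groups(HISTORY)
print({i: len(g) for i, g in groups.items() if g})  # {1: 11, 2: 5, 3: 1}
```

Enumerating every combination grows quickly with M; the sketch favors readability over efficiency.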

With the above example in mind, the candidate dimension sets corresponding to the history query records shown in table 1 are shown in table 2 below, and table 2 below is a schematic representation of the candidate dimension sets provided in an embodiment of the present application.

TABLE 2 schematic tables of candidate dimension groups obtained by the 2 nd dimension aggregation process provided by the embodiments of the present application

Candidate dimension group	Sub-query record id intersection	Number of occurrences (dimension parameter value)
A1, A3	T1, T3, T7	3
A1, A8	T1, T3, T7	3
A1, A9	T3	1
A2, A5	T4, T6	2
A3, A8	T1, T3, T7	3
A3, A9	T3	1
A4, A7	T8	1
A8, A9	T3	1
A10, A14	T2, T5	2
A10, A18	T2, T5	2
A14, A18	T2, T5	2
......	......	......

Continuing the above example and referring to table 1 and table 2 above, when the ith dimension combination is { A1, A3}, the sub-query records corresponding to the target data dimension A1 in the ith dimension combination are the sub-query record T1, the sub-query record T3 and the sub-query record T7; the sub-query records corresponding to the target data dimension A3 in the ith dimension combination are the sub-query record T1, the sub-query record T3 and the sub-query record T7. This shows that the sub-query records corresponding to the target data dimensions A1 and A3 in the ith dimension combination overlap (they are at least partially identical), so the ith dimension combination { A1, A3} is determined as the ith candidate dimension group.
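The rows of table 2 can be recomputed from table 1 with a short Python sketch. The names `HISTORY` and `candidate_groups` are assumptions made for illustration, as is the convention that dimension names have the form "A" followed by a number (used only to sort them in the order of the table).

```python
from itertools import combinations

# Hypothetical encoding of table 1: record id -> queried target data dimensions.
HISTORY = {
    "T1": {"A1", "A3", "A8"},
    "T2": {"A10", "A14", "A18"},
    "T3": {"A1", "A3", "A8", "A9"},
    "T4": {"A2", "A5"},
    "T5": {"A10", "A14", "A18"},
    "T6": {"A2", "A5"},
    "T7": {"A1", "A3", "A8"},
    "T8": {"A4", "A7"},
}


def candidate_groups(history, size):
    """Return {dimension combination: ids of the records containing all of it}."""
    dimensions = sorted(
        {d for dims in history.values() for d in dims},
        key=lambda name: int(name[1:]),  # assumes names such as "A10"
    )
    candidates = {}
    for combo in combinations(dimensions, size):
        shared = {rid for rid, dims in history.items() if set(combo) <= dims}
        if shared:  # keep only combinations whose sub-query records overlap
            candidates[combo] = shared
    return candidates


pairs = candidate_groups(HISTORY, 2)
for combo, shared in pairs.items():
    # combination, record id intersection, number of occurrences
    print(combo, sorted(shared), len(shared))
```

A pair such as ("A1", "A2") is absent from the result because no sub-query record contains both dimensions, which is why it does not appear in table 2 either.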

With the above example in mind, the candidate dimension set obtained by the 3 rd dimension aggregation process corresponding to the history query record shown in table 1 is shown in table 3 below, and table 3 below is a schematic table of the 3 rd candidate dimension set provided in the embodiment of the present application.

TABLE 3 schematic tables of candidate dimension groups obtained by the 3 rd dimension aggregation process provided by the embodiments of the present application

With the above example, referring to table 1 and table 3, when the ith dimension combination is { A1, A3, A8}, the sub-query records corresponding to the target data dimension A1 in the ith dimension combination are the sub-query record T1, the sub-query record T3, and the sub-query record T7; sub-query records corresponding to the target data dimension A3 in the ith dimension combination are a sub-query record T1, a sub-query record T3 and a sub-query record T7; sub-query records corresponding to the target data dimension A8 in the ith dimension combination are a sub-query record T1, a sub-query record T3 and a sub-query record T7; then it is explained that sub-query records corresponding to the target data dimensions A1, A3 and A8 in the ith dimension combination are at least partially identical, and then the ith dimension combination { A1, A3, A8} is determined as the ith candidate dimension group.
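The reasoning for { A1, A3, A8} can be mirrored in Python by looking up the sub-query records of each dimension separately and intersecting them. This is a hedged sketch rather than a reconstruction of table 3: the names `HISTORY`, `records_for` and `shared_records` are hypothetical, and the result is derived only from the data in table 1.

```python
from itertools import combinations

# Hypothetical encoding of table 1: record id -> queried target data dimensions.
HISTORY = {
    "T1": {"A1", "A3", "A8"},
    "T2": {"A10", "A14", "A18"},
    "T3": {"A1", "A3", "A8", "A9"},
    "T4": {"A2", "A5"},
    "T5": {"A10", "A14", "A18"},
    "T6": {"A2", "A5"},
    "T7": {"A1", "A3", "A8"},
    "T8": {"A4", "A7"},
}


def records_for(dimension, history):
    """Ids of the sub-query records that mention one dimension."""
    return {rid for rid, dims in history.items() if dimension in dims}


def shared_records(combo, history):
    """Intersection of the record-id sets of every dimension in the combination."""
    return set.intersection(*(records_for(d, history) for d in combo))


dimensions = sorted(
    {d for dims in HISTORY.values() for d in dims},
    key=lambda name: int(name[1:]),  # assumes names such as "A10"
)
triples = {}
for combo in combinations(dimensions, 3):
    shared = shared_records(combo, HISTORY)
    if shared:  # a non-empty intersection makes the combination a candidate
        triples[combo] = shared

print(sorted(triples[("A1", "A3", "A8")]))  # ['T1', 'T3', 'T7']
```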

In some embodiments, the aforementioned i-th dimension combination of the target data dimension, to obtain a dimension combination, may be implemented as follows: if i=1, for each target data dimension, performing dimension combination on the target data dimension and other target data dimensions respectively to obtain a first dimension combination; if i is more than or equal to 2, aiming at the obtained first dimension combination, carrying out dimension combination on the first dimension combination and the target data dimension to obtain a second dimension combination; and performing de-duplication processing on each first dimension combination and each second dimension combination to obtain the dimension combination.

Continuing the above example and referring to table 1, if i = 1, the target data dimension A1 is combined with each of the other target data dimensions A2, A3, A4, A5, A6, and so on, to obtain the first dimension combinations { A1, A2}, { A1, A3}, { A1, A4}, and so on, corresponding to the target data dimension A1.

Likewise, if i = 1, the target data dimension A2 is combined with each of the other target data dimensions A1, A3, A4, A5, A6, and so on, to obtain the first dimension combinations { A2, A1}, { A2, A3}, { A2, A4}, and so on, corresponding to the target data dimension A2.

Continuing the above example, de-duplication is performed on the first dimension combination { A2, A1} corresponding to the target data dimension A2 and the first dimension combination { A1, A2} corresponding to the target data dimension A1, to obtain a single dimension combination { A1, A2}.

Continuing the above example, if i = 2, the first dimension combination { A1, A2} is combined with each of the remaining target data dimensions to obtain the second dimension combinations { A1, A2, A3}, { A1, A2, A4}, { A1, A2, A5}, and so on, corresponding to the first dimension combination { A1, A2}.

Continuing the above example, if i = 3, the second dimension combination { A1, A2, A3} is combined with each of the remaining target data dimensions to obtain the dimension combinations { A1, A2, A3, A4}, { A1, A2, A3, A5}, and so on, corresponding to the second dimension combination { A1, A2, A3}.
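The round-by-round combination and de-duplication described above can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the function name `next_combinations` is hypothetical, only four dimensions are used, and de-duplication is done by storing each combination as an unordered `frozenset`.

```python
def next_combinations(dimensions, previous=None):
    """Return the de-duplicated dimension combinations for one round.

    `previous` is None for round i = 1; otherwise it holds the
    combinations produced by the preceding round (hypothetical API).
    """
    if previous is None:
        # Round i = 1: pair every dimension with every other dimension.
        raw = [frozenset((a, b)) for a in dimensions for b in dimensions if a != b]
    else:
        # Round i >= 2: extend each earlier combination by one new dimension.
        raw = [combo | {d} for combo in previous for d in dimensions if d not in combo]
    # frozenset ignores order, so {A2, A1} and {A1, A2} collapse to one entry.
    return set(raw)


dims = ["A1", "A2", "A3", "A4"]
pairs = next_combinations(dims)            # round i = 1
triples = next_combinations(dims, pairs)   # round i = 2
print(sorted(sorted(c) for c in pairs))
print(sorted(sorted(c) for c in triples))
```

With four dimensions the first round yields 6 distinct pairs and the second round 4 distinct triples, because every ordering of the same dimensions is counted once.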

In some embodiments, the determining the at least one reference dimension group obtained by the ith dimension aggregation processing in combination with the candidate dimension group obtained by the ith dimension aggregation processing may be implemented as follows: for each candidate dimension group, acquiring dimension parameter values of the candidate dimension group, wherein the dimension parameter values are used for indicating the number of the same sub-query records in the sub-query records corresponding to the candidate dimension group; comparing the dimension parameter value with a dimension parameter threshold value to obtain a comparison result; and if the comparison result indicates that the dimension parameter value is greater than or equal to the dimension parameter threshold value, determining the candidate dimension group as the reference dimension group.

In some embodiments, the dimension parameter value is used to indicate the number of identical sub-query records in the sub-query records corresponding to the candidate dimension group.

With the above example in mind, referring to table 2 above, the reference dimension sets corresponding to the second dimension combinations in table 2 above are shown in table 4 below, and table 4 below is a schematic representation of the reference dimension sets provided in the embodiments of the present application.

Table 4 schematic tables of reference dimension groups provided by embodiments of the present application

Reference dimension group	Query record id intersection	Number of occurrences (dimension parameter value)
A1、A3	T1、T3、T7	3
A1、A8	T1、T3、T7	3
A2、A5	T4、T6	2
A3、A8	T1、T3、T7	3
A10、A14	T2、T5	2
A10、A18	T2、T5	2
A14、A18	T2、T5	2
......	......	......
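The threshold comparison that turns a candidate dimension group into a reference dimension group can be sketched as follows. The helper names are hypothetical, the sub-query records are those of the running example, and the dimension parameter threshold of 2 is an assumed value chosen to agree with table 4.

```python
def dimension_parameter_value(group, records_by_dimension):
    """Count the sub-query records shared by every dimension in the group."""
    shared = set.intersection(*(set(records_by_dimension[d]) for d in group))
    return len(shared)


def is_reference_group(group, records_by_dimension, threshold):
    """A candidate group qualifies when its value reaches the threshold."""
    return dimension_parameter_value(group, records_by_dimension) >= threshold


# Sub-query records per target data dimension (subset of the running example).
records_by_dimension = {
    "A1": ["T1", "T3", "T7"],
    "A3": ["T1", "T3", "T7"],
    "A8": ["T1", "T3", "T7"],
    "A9": ["T3"],
    "A2": ["T4", "T6"],
    "A5": ["T4", "T6"],
}
THRESHOLD = 2  # assumed dimension parameter threshold

for group in [("A1", "A3"), ("A2", "A5"), ("A1", "A9")]:
    value = dimension_parameter_value(group, records_by_dimension)
    print(group, value, is_reference_group(group, records_by_dimension, THRESHOLD))
```

Under these assumptions (A1, A3) and (A2, A5) are kept, while (A1, A9) shares only T3 and is rejected.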

With the above example in mind, see table 3 above, the reference dimension sets in table 3 above are shown in table 5 below, and table 5 below is a schematic representation of the reference dimension sets provided by embodiments of the present application.

Table 5 schematic table of reference dimension groups provided by embodiments of the present application

In some embodiments, determining the target dimension groups in combination with the reference dimension groups obtained by the M-2 dimension aggregation processes may be implemented as follows: acquiring the dimension parameter value of each reference dimension group, where the dimension parameter value indicates the number of identical sub-query records among the sub-query records corresponding to the reference dimension group; and performing the following processing for each reference dimension group in descending order of dimension parameter value: if none of the target data dimensions in the reference dimension group is recorded in the dimension combination list, determining the reference dimension group as a target dimension group.

For example, referring to table 6 below, table 6 below is a schematic table 1 of the reference dimension sets sorted in reverse order provided by embodiments of the present application.

TABLE 6 schematic Table 1 of the reference dimension groups ordered in reverse order provided by an embodiment of the application

For example, referring to Table 6 above, currently traverse to (X1, X3, X8), where the dimension combination list is empty, add (X1, X3, X8) to the dimension combination list, where the dimension combination list is [ (X1, X3, X8) ]; continuing to traverse to (X2, X4 and X10), checking that 3 fields (X2, X4 and X10) are not contained in the dimension combination list, and adding the 3 fields to the dimension combination list, wherein the dimension combination list is [ (X1, X3 and X8), (X2, X4 and X10) ]; continuing to traverse to (X2, X5 and X10), checking that 2 fields of X2 and X10 exist in the dimension combination list, and continuing to traverse without any operation; continuing to traverse to (X5, X6 and X9), checking that the dimension combination list has no (X5, X6 and X9) fields, and adding the 3 fields to the dimension combination list, wherein the dimension combination list is [ (X1, X3 and X8), (X2, X4 and X10), (X5, X6 and X9) ].
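The traversal above amounts to a greedy pass that keeps a group only when none of its fields has been used. A minimal sketch, assuming the groups arrive already sorted in descending order of dimension parameter value (the function name is hypothetical):

```python
def select_disjoint_groups(sorted_groups):
    """Greedy pass: keep a group only if it shares no field with kept groups."""
    combination_list = []
    used = set()
    for group in sorted_groups:
        if used.isdisjoint(group):
            combination_list.append(group)
            used.update(group)
        # Otherwise at least one field is already recorded: skip the group.
    return combination_list


# Groups in the order in which the example traverses them.
groups = [
    ("X1", "X3", "X8"),
    ("X2", "X4", "X10"),
    ("X2", "X5", "X10"),
    ("X5", "X6", "X9"),
]
selected = select_disjoint_groups(groups)
print(selected)
```

Here (X2, X5, X10) is skipped because X2 and X10 are already recorded, and since it was skipped X5 stays free for (X5, X6, X9).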

By way of example, see table 7 below, table 7 below is a schematic table 2 of the reference dimension sets sorted in reverse order provided by embodiments of the present application.

TABLE 7 schematic Table 2 of the reference dimension sets ordered in reverse order provided by an embodiment of the application

Currently traversing to (X7, X11): the dimension combination list is [ (X1, X3, X8), (X2, X4, X10), (X5, X6, X9) ] and contains neither X7 nor X11, so the frequent 2-item set list is traversed downward from the current data to find the first piece of data that contains 1 field of the current frequent 2-item set and whose other field does not appear in the dimension combination list. The next piece of data is (X2, X7), which contains X7, but X2 already appears in the dimension combination list, so the traversal continues. The next piece of data is (X14, X18), which contains neither X7 nor X11, so the traversal continues. The next piece of data is (X11, X13), which contains X11, and X13 is not in the dimension combination list, so (X7, X11) and (X11, X13) are combined into (X7, X11, X13) and added to the dimension combination list, which becomes [ (X1, X3, X8), (X2, X4, X10), (X5, X6, X9), (X7, X11, X13) ]. Continuing to (X2, X7): X2 already appears in the dimension combination list, so the traversal continues. Continuing to (X14, X18): neither X14 nor X18 appears in the dimension combination list, so the frequent 2-item set list is again traversed downward from the current data. The next piece of data is (X11, X13), which contains neither X14 nor X18, so the traversal continues. The next piece of data is (X14, X15), which contains X14, and X15 is not in the dimension combination list, so (X14, X18) and (X14, X15) are combined into (X14, X15, X18) and added to the dimension combination list, which becomes [ (X1, X3, X8), (X2, X4, X10), (X5, X6, X9), (X7, X11, X13), (X14, X15, X18) ].

By way of example, see table 8 below, table 8 below is a schematic table 3 of the reverse ordered set of reference dimensions provided by an embodiment of the present application.

Table 8 schematic table 3 of the reverse ordered reference dimension sets provided by embodiments of the present application

Adding (X12, X19, X20) to the dimension combination list, and finally adding (X16, X17) to the dimension combination list. The dimension combination list is [ (X1, X3, X8), (X2, X4, X10), (X5, X6, X9), (X7, X11, X13), (X14, X15, X18), (X12, X19, X20), (X16, X17) ].

In some embodiments, if none of the target data dimensions in the reference dimension group is recorded in the dimension combination list, before determining the reference dimension group as the target dimension group, it may be further determined that none of the target data dimensions in the ith reference dimension group is recorded in the dimension combination list by: comparing the target data dimension with each target data dimension in the dimension combination list for each target data dimension in the reference dimension group to obtain a record comparison result; and if each record comparison result indicates that the dimension combination list does not have the same target data dimension as the target data dimension, determining that each target data dimension in the reference dimension group is not recorded in the dimension combination list.

In some embodiments, after the reference dimension group is determined as the target dimension group, the dimension combination list may be updated by adding the target data dimensions in the reference dimension group to the dimension combination list.

In this way, dimension aggregation is performed on each target data dimension in the history query record to obtain at least one target dimension group, so that each target data dimension in the target dimension group can accurately reflect the history data dimension query habit of the target object.

In step 103, based on the target dimension groups, dividing the data in the data set to be processed into the N target dimension groups, so as to obtain N target data sets corresponding to the N target dimension groups.

In some embodiments, the target data set is configured to respond to a data query request of the target object. The target dimension groups are in one-to-one correspondence with the target data sets.

In some embodiments, the dividing the data in the data set to be processed into the N target dimension groups based on the target dimension groups to obtain N target data sets corresponding to the N target dimension groups may be implemented as follows: for each of the N target dimension groups, determining target data dimensions included in the target dimension group as component data dimensions of the target dimension group, and determining data belonging to the component data dimensions in the data set to be processed as target data corresponding to the component data dimensions; if the number of the component data dimensions is equal to 1, carrying out data fusion on summary data corresponding to each target data in the data set to be processed and the target data to obtain a target data set corresponding to the target dimension group; if the number of the composition data dimensions is greater than 1, data aggregation is carried out on the target data to obtain at least one target data group, the target data in one target data group belong to different composition data dimensions, and a target data set corresponding to the target dimension group is determined based on the at least one target data group.

In some embodiments, the target data in the target data set is attributed to a different one of the constituent data dimensions. And the target data set corresponding to the target dimension group comprises a data group set corresponding to each target data group.

As an example, referring to table 9 below, table 9 below is a schematic table of a set of data to be processed provided by an embodiment of the present application.

Table 9 schematic table of the data set to be processed provided in the embodiment of the application

Continuing the above example, the target dimension groups are (A, B, C) and (D, E). The record id list in which each dimension value of each dimension appears is calculated, and the constructed inverted index table is shown in table 10 below, which is the inverted index table provided in the embodiment of the present application.

TABLE 10 inverted index Table provided by embodiments of the present application

In the above example, for the target dimension group (A, B, C), if the number of the component data dimensions of the target dimension group (A, B, C) is greater than 1, data belonging to the component data dimension a, the component data dimension B and the component data dimension C in the data set to be processed are respectively determined as target data corresponding to each component data dimension, and data aggregation is performed on each target data to obtain a target data group.

With the above example in mind, for the target dimension set (A, B, C), see table 11 below, table 11 below is a schematic table 1 of the target data set provided by embodiments of the present application.

Table 11 schematic table 1 of the target data set provided by the embodiment of the application

With the above example in mind, for the target dimension set (D, E), see table 12 below, table 12 below is a schematic table 2 of the target data set provided by embodiments of the present application.

Table 12 schematic table 2 of the target data set provided by the embodiment of the present application

In some embodiments, the determining, based on the at least one target data set, a target data set corresponding to the target dimension set may be implemented as follows: the following processing is performed for each of the target data sets: determining each target data in the target data set as the composition data of the target data set; and carrying out data fusion on summary data corresponding to each component data in the data set to be processed and the component data to obtain a target data set corresponding to the target data set.

With the above example in mind, for the target dimension set (A, B, C), see table 13 below, table 13 is a schematic table 1 of a target data set corresponding to the target data set provided in an embodiment of the present application.

Table 13 schematic table 1 of the target data set corresponding to the target data set provided in the embodiment of the application

Target data group	Summary data corresponding to composition data
(T₁,T₁,T₁)	{<1,5,70>}
(T₁,T₁,T₂)	{}
(T₁,T₂,T₁)	{<2,3,10>,<3,8,20>}
(T₁,T₂,T₂)	{}
(T₂,T₁,T₁)	{}
(T₂,T₁,T₂)	{<4,5,40>,<5,2,50>}
(T₂,T₂,T₁)	{}
(T₂,T₂,T₂)	{}

With the above example in mind, for the target dimension set (D, E), see table 14 below, table 14 is a schematic table 2 of a target data set corresponding to the target data set provided in an embodiment of the present application.

Table 14 schematic table 2 of the target data set corresponding to the target data set provided in the embodiment of the present application

Target data group	Summary data corresponding to composition data
(T₁,T₁)	{<1,5,70>}
(T₁,T₂)	{<3,8,20>,<4,5,40>}
(T₁,T₃)	{<5,2,50>}
(T₂,T₁)	{<2,3,10>}
(T₂,T₂)	{}
(T₂,T₃)	{}
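The construction of the target data sets for the groups (A, B, C) and (D, E) can be sketched as follows. This is an illustration under stated assumptions: the five rows are sample values chosen to be consistent with tables 13 and 14, each summary entry is taken to be the triple <record id, count, sum>, and the function name `build_target_data_set` is hypothetical.

```python
from itertools import product

# Assumed sample rows: record id -> dimension values plus count/sum metrics.
ROWS = {
    1: {"A": "T1", "B": "T1", "C": "T1", "D": "T1", "E": "T1", "count": 5, "sum": 70},
    2: {"A": "T1", "B": "T2", "C": "T1", "D": "T2", "E": "T1", "count": 3, "sum": 10},
    3: {"A": "T1", "B": "T2", "C": "T1", "D": "T1", "E": "T2", "count": 8, "sum": 20},
    4: {"A": "T2", "B": "T1", "C": "T2", "D": "T1", "E": "T2", "count": 5, "sum": 40},
    5: {"A": "T2", "B": "T1", "C": "T2", "D": "T1", "E": "T3", "count": 2, "sum": 50},
}


def build_target_data_set(rows, group):
    """Map every combination of dimension values to its summary data."""
    # Inverted index: dimension -> dimension value -> set of record ids.
    index = {d: {} for d in group}
    for rid, row in rows.items():
        for d in group:
            index[d].setdefault(row[d], set()).add(rid)
    cells = {}
    for values in product(*(sorted(index[d]) for d in group)):
        # Records of a cell are those present in every per-dimension id list.
        ids = set.intersection(*(index[d][v] for d, v in zip(group, values)))
        cells[values] = [(rid, rows[rid]["count"], rows[rid]["sum"]) for rid in sorted(ids)]
    return cells


abc = build_target_data_set(ROWS, ("A", "B", "C"))
de = build_target_data_set(ROWS, ("D", "E"))
for cell, summary in abc.items():
    print(cell, summary)
```

Empty cells are kept so that the output has the same 8 and 6 rows as tables 13 and 14; a real store would likely drop them.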

In some embodiments, following step 103 described above, the data query request may also be responded to by: responding to the data query request of the target object, and determining a reference data set meeting the data query request from the N target data sets; responding to the data query request of the target object based on the reference data set.

In some embodiments, the responding to the data query request of the target object based on the reference data set may be implemented as follows: if the number of the reference data sets is one, determining reference data meeting the data query request from the reference data sets, and sending the reference data to the target object; if the number of the reference data sets is a plurality of, determining reference data meeting the data query request from the reference data sets according to each reference data set; and carrying out data fusion on the reference data of the plurality of reference data sets to obtain fusion data, and sending the fusion data to the target object.

In some embodiments, the above-mentioned determining the reference data set satisfying the data query request from the N target data sets may be implemented as follows: analyzing the data query request to obtain an analysis result; if the analysis result indicates that the data query request does not carry the expected data dimension, determining a target data set with the largest request frequency as the reference data set; and if the analysis result indicates that the data query request carries the expected data dimension, determining a target data set comprising the expected data dimension as a reference data set meeting the data query request.

In some embodiments, if the number of the reference data sets is one, which indicates that only one target data set in each target data set can meet the query requirement of the data query request, then the reference data meeting the query requirement of the data query request can be determined directly from the reference data sets, and the reference data is sent to the target object.

Therefore, when the number of the reference data sets is one, it is indicated that only one target data set in each target data set can meet the query requirement of the data query request, then the reference data meeting the query requirement of the data query request can be directly determined from the reference data sets, and the reference data is sent to the target object, so that the data query efficiency is effectively improved.

In some embodiments, if the number of the reference data sets is multiple, it is indicated that there are multiple target data sets in each target data set that can satisfy the query requirement of the data query request, then the reference data that satisfies the query requirement of the data query request may be determined from each reference data set, and data fusion is performed on each reference data to obtain fusion data, and the fusion data is sent to the target object.

Therefore, when the number of the reference data sets is a plurality of, it is indicated that a plurality of target data sets exist in each target data set and can meet the query expectation of the data query request, then the reference data meeting the query expectation of the data query request can be respectively determined from each reference data set, data fusion is carried out on each reference data to obtain fusion data, and the fusion data is sent to the target object, so that the data query request can be effectively responded accurately.
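The selection of reference data sets and the response to a request can be sketched as follows. Several points are assumptions rather than statements of the method: the request is modelled as a mapping from expected data dimension to value, "data fusion" across several reference data sets is modelled as the intersection of their matching record ids, and `request_counts` is a hypothetical per-set request frequency. Empty cells are omitted from the sample data.

```python
def select_reference_sets(target_data_sets, request_counts, expected):
    """Pick the target data sets that can serve a request."""
    if not expected:
        # No expected dimension: fall back to the most frequently requested set.
        return [max(target_data_sets, key=lambda g: request_counts.get(g, 0))]
    return [g for g in target_data_sets if set(g) & set(expected)]


def matching_ids(cells, group, expected):
    """Union the record ids of every cell compatible with the request."""
    ids = set()
    for values, cell_ids in cells.items():
        if all(expected.get(d, v) == v for d, v in zip(group, values)):
            ids |= cell_ids
    return ids


def respond(target_data_sets, request_counts, expected):
    """Answer from one set directly, or fuse several by intersection (assumed)."""
    groups = select_reference_sets(target_data_sets, request_counts, expected)
    partial = [matching_ids(target_data_sets[g], g, expected) for g in groups]
    return set.intersection(*partial)


target_data_sets = {
    ("A", "B", "C"): {("T1", "T1", "T1"): {1}, ("T1", "T2", "T1"): {2, 3},
                      ("T2", "T1", "T2"): {4, 5}},
    ("D", "E"): {("T1", "T1"): {1}, ("T1", "T2"): {3, 4},
                 ("T1", "T3"): {5}, ("T2", "T1"): {2}},
}
request_counts = {("A", "B", "C"): 10, ("D", "E"): 4}  # hypothetical frequencies

print(respond(target_data_sets, request_counts, {"E": "T2"}))
print(respond(target_data_sets, request_counts, {"A": "T1", "D": "T1"}))
```

The first request touches only (D, E), so its answer comes from a single reference data set; the second spans both groups, so the two partial answers are fused.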

In this way, by acquiring a historical query record of a target object for a data set to be processed, dimension aggregation is performed on each target data dimension in the historical query record to obtain at least one target dimension group, data partitioning is performed on the data set to be processed based on the target dimension group to obtain target data sets corresponding to each target dimension group, and a data query request of the target object is responded through the target data sets. In this way, the history query records include sub-query records corresponding to the target objects at each history query time, and at least one target dimension group is obtained by dimension aggregation of the target data dimensions in the history query records, so that the obtained target dimension group can accurately reflect the historical data query requirements of the target objects, and the data set to be processed is subjected to data division through the target dimension group to obtain the target data set corresponding to each target dimension group, so that the obtained target data set can accurately reflect the historical data query requirements of the target objects under the target dimension group, and the accuracy of data division is effectively improved.

In the following, an exemplary application of an embodiment of the present application in an application scenario of actual data processing will be described.

In some embodiments, referring to fig. 4, fig. 4 is a second flowchart of a data processing method according to an embodiment of the present application, where the data processing method according to the embodiment of the present application may be implemented by steps 201 to 205 shown in fig. 4.

In step 201, OLAP query dimension history data is collected.

In some embodiments, the OLAP query records of data analysis users are collected; after the queries have accumulated to a certain volume, the query records are cleaned to generate a table containing query ids and query dimension fields. See table 15 below, which is a schematic table of the query dimension history data provided by the embodiments of the present application.

Table 15 schematic table of query dimension historical data provided by embodiments of the present application

Query record id	Dimension field
T1	A1、A3、A8
T2	A10、A14、A18
T3	A1、A3、A8、A9
T4	A2、A5
T5	A10、A14、A18
T6	A2、A5
T7	A1、A3、A8
T8	A4、A7
......	......

In step 202, the frequent item sets of the query dimensions are mined.

In some embodiments, the system needs a minimum support value; item sets whose number of occurrences is less than the minimum support are filtered out. Assume that the minimum support is 2.

The horizontal-format table 1 (the history query record described above) is converted into a vertical data format that lists, for each dimension field, the query record ids in which it appears. See table 16 below, which is a schematic table of the converted query dimension history data provided by an embodiment of the present application.

Table 16 Schematic table of converted query dimension historical data provided by an embodiment of the application

Dimension field	Query record id	Number of occurrences
A1	T1、T3、T7	3
A2	T4、T6	2
A3	T1、T3、T7	3
A4	T8	1
A5	T4、T6	2
A7	T8	1
A8	T1、T3、T7	3
A9	T3	1
A10	T2、T5	2
A14	T2、T5	2
A18	T2、T5	2
......	......	......
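The conversion from the horizontal format (query record id to dimension fields) to the vertical format (dimension field to query record ids) can be sketched as follows, using the eight query records listed in table 15. The function name `to_vertical` is hypothetical.

```python
# Horizontal format: query record id -> dimension fields.
history = {
    "T1": ["A1", "A3", "A8"],
    "T2": ["A10", "A14", "A18"],
    "T3": ["A1", "A3", "A8", "A9"],
    "T4": ["A2", "A5"],
    "T5": ["A10", "A14", "A18"],
    "T6": ["A2", "A5"],
    "T7": ["A1", "A3", "A8"],
    "T8": ["A4", "A7"],
}


def to_vertical(history):
    """Invert the mapping: dimension field -> list of query record ids."""
    vertical = {}
    for query_id, fields in history.items():
        for field in fields:
            vertical.setdefault(field, []).append(query_id)
    return vertical


vertical = to_vertical(history)
for field, ids in vertical.items():
    # The number of occurrences is the length of the id list.
    print(field, ids, len(ids))
```

The occurrence count of a field is simply the length of its id list, which is what the third column of table 16 records.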

Then the dimension fields are combined pairwise, the intersection of the query record ids of each pair is calculated, and the field combinations whose intersection is non-empty are selected, as shown in table 17 below, which is combination table 1 provided by the embodiment of the present application:

Table 17 Combination table 1 provided by the embodiment of the present application

Dimension field	Query record id intersection	Number of occurrences
A1、A3	T1、T3、T7	3
A1、A8	T1、T3、T7	3
A1、A9	T3	1
A2、A5	T4、T6	2
A3、A8	T1、T3、T7	3
A3、A9	T3	1
A4、A7	T8	1
A8、A9	T3	1
A10、A14	T2、T5	2
A10、A18	T2、T5	2
A14、A18	T2、T5	2
......	......	......

The combinations (A1, A9), (A3, A9), (A4, A7) and (A8, A9), whose number of occurrences is less than the minimum support of 2, are filtered out to obtain the frequent 2-item sets, as shown in table 18 below, which is combination table 2 provided by the embodiment of the present application:

Table 18 combination table 2 provided by the embodiment of the application

Dimension field	Query record id intersection	Number of occurrences
A1、A3	T1、T3、T7	3
A1、A8	T1、T3、T7	3
A2、A5	T4、T6	2
A3、A8	T1、T3、T7	3
A10、A14	T2、T5	2
A10、A18	T2、T5	2
A14、A18	T2、T5	2
......	......	......
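The pairwise intersection and support filtering that produce tables 17 and 18 can be sketched as follows, starting from the rows listed in table 16. The function name is hypothetical and the minimum support of 2 is the value assumed in the text.

```python
from itertools import combinations

MIN_SUPPORT = 2

# Vertical format from table 16: dimension field -> query record ids.
vertical = {
    "A1": {"T1", "T3", "T7"}, "A2": {"T4", "T6"}, "A3": {"T1", "T3", "T7"},
    "A4": {"T8"}, "A5": {"T4", "T6"}, "A7": {"T8"}, "A8": {"T1", "T3", "T7"},
    "A9": {"T3"}, "A10": {"T2", "T5"}, "A14": {"T2", "T5"}, "A18": {"T2", "T5"},
}


def pair_intersections(vertical):
    """Keep every pair of fields whose query record id lists overlap."""
    result = {}
    for a, b in combinations(vertical, 2):
        shared = vertical[a] & vertical[b]
        if shared:
            result[(a, b)] = shared
    return result


candidates = pair_intersections(vertical)  # corresponds to table 17
frequent_2 = {pair: ids for pair, ids in candidates.items()
              if len(ids) >= MIN_SUPPORT}  # corresponds to table 18
for pair, ids in frequent_2.items():
    print(pair, sorted(ids), len(ids))
```

For these rows the sketch finds 11 overlapping pairs, of which 7 reach the minimum support, matching the listed rows of tables 17 and 18.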

The frequent 2-item sets and the frequent 1-item sets are combined pairwise, the intersection of the query record ids of each combination is calculated, and the field combinations whose intersection is non-empty are selected, as shown in table 19 below, which is combination table 3 provided by the embodiment of the present application:

Table 19 the combination table 3 provided by the embodiment of the application

The item sets whose number of occurrences is less than the minimum support of 2 are filtered out to obtain all frequent 3-item sets, as shown in table 20 below, which is combination table 4 provided by the embodiment of the present application:

Table 20 Combination table 4 provided in the embodiment of the present application
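Extending the frequent 2-item sets to frequent 3-item sets can be sketched as follows. The function name is hypothetical, the inputs are the rows listed in tables 16 and 18, and the result only reflects those listed rows, not the elided ones.

```python
MIN_SUPPORT = 2

vertical = {
    "A1": {"T1", "T3", "T7"}, "A2": {"T4", "T6"}, "A3": {"T1", "T3", "T7"},
    "A4": {"T8"}, "A5": {"T4", "T6"}, "A7": {"T8"}, "A8": {"T1", "T3", "T7"},
    "A9": {"T3"}, "A10": {"T2", "T5"}, "A14": {"T2", "T5"}, "A18": {"T2", "T5"},
}
frequent_2 = {
    ("A1", "A3"): {"T1", "T3", "T7"}, ("A1", "A8"): {"T1", "T3", "T7"},
    ("A2", "A5"): {"T4", "T6"}, ("A3", "A8"): {"T1", "T3", "T7"},
    ("A10", "A14"): {"T2", "T5"}, ("A10", "A18"): {"T2", "T5"},
    ("A14", "A18"): {"T2", "T5"},
}


def extend_item_sets(frequent_k, vertical, min_support):
    """Add one field to each item set and keep the sufficiently frequent results."""
    result = {}
    for items, ids in frequent_k.items():
        for field, field_ids in vertical.items():
            if field in items:
                continue
            shared = ids & field_ids
            if len(shared) >= min_support:
                # frozenset keys merge (A1, A3)+A8 with (A1, A8)+A3, and so on.
                result[frozenset(items) | {field}] = shared
    return result


frequent_3 = extend_item_sets(frequent_2, vertical, MIN_SUPPORT)
for items, ids in frequent_3.items():
    print(sorted(items), sorted(ids), len(ids))
```

For the listed rows this yields two frequent 3-item sets, {A1, A3, A8} with 3 occurrences and {A10, A14, A18} with 2 occurrences.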

In step 203, segment dimension combinations are calculated from the frequent item sets.

In some embodiments, the dimension combination list is initially set to be empty, the frequent 3 item sets mined in the previous step are arranged in an inverted order according to the occurrence times, the frequent 3 item set list is traversed one by one, whether the current frequent 3 item set field appears in the dimension combination list is checked, if not, the current frequent 3 item set is added to the dimension combination list, and if so, the traversing is continued.

Assuming that the list of frequent 3-item sets after reverse order ordering is shown in the following table 21, table 21 is a reverse order ordering table 1 provided in the embodiment of the present application:

Table 21 the reverse order sort table 1 provided in the embodiment of the present application

Currently traversing to (X1, X3 and X8), wherein the dimension combination list is empty, adding (X1, X3 and X8) to the dimension combination list, and wherein the dimension combination list is [ (X1, X3 and X8) ]; continuing to traverse to (X2, X4 and X10), checking that 3 fields (X2, X4 and X10) are not contained in the dimension combination list, and adding the 3 fields to the dimension combination list, wherein the dimension combination list is [ (X1, X3 and X8), (X2, X4 and X10) ]; continuing to traverse to (X2, X5 and X10), checking that 2 fields of X2 and X10 exist in the dimension combination list, and continuing to traverse without any operation; continuing to traverse to (X5, X6 and X9), checking that the dimension combination list has no (X5, X6 and X9) fields, and adding the 3 fields to the dimension combination list, wherein the dimension combination list is [ (X1, X3 and X8), (X2, X4 and X10), (X5, X6 and X9) ].

In some embodiments, after the frequent 3-item set list has been traversed, the frequent 2-item sets are arranged in descending order of occurrence count and traversed one by one. For each frequent 2-item set, it is checked whether any of its fields already appears in the dimension combination list; if so, the traversal continues with the next piece of data. If not, the frequent 2-item set list is traversed downward from the current data to find a piece of data that contains 1 field of the current frequent 2-item set and whose other field does not appear in the dimension combination list; if such data is found, the two are combined into a 3-tuple dimension combination, which is added to the dimension combination list, and if not, the traversal continues downward. Assume that the list of frequent 2-item sets in descending order is as shown in table 22 below, which is reverse order sorting table 2 provided in the embodiment of the present application:

Table 22 reverse order table 2 provided by the embodiment of the present application

Dimension field	Number of occurrences
X7、X11	25000
X2、X7	22000
X14、X18	21890
X11、X13	21000
X14、X15	18000

Currently traversing to (X7, X11): the dimension combination list is [ (X1, X3, X8), (X2, X4, X10), (X5, X6, X9) ] and contains neither X7 nor X11, so the frequent 2-item set list is traversed downward from the current data to find the first piece of data that contains 1 field of the current frequent 2-item set and whose other field does not appear in the dimension combination list. The next piece of data is (X2, X7), which contains X7, but X2 already appears in the dimension combination list, so the traversal continues. The next piece of data is (X14, X18), which contains neither X7 nor X11, so the traversal continues. The next piece of data is (X11, X13), which contains X11, and X13 is not in the dimension combination list, so (X7, X11) and (X11, X13) are combined into (X7, X11, X13) and added to the dimension combination list, which becomes [ (X1, X3, X8), (X2, X4, X10), (X5, X6, X9), (X7, X11, X13) ]. Continuing to (X2, X7): X2 already appears in the dimension combination list, so the traversal continues. Continuing to (X14, X18): neither X14 nor X18 appears in the dimension combination list, so the frequent 2-item set list is again traversed downward from the current data. The next piece of data is (X11, X13), which contains neither X14 nor X18, so the traversal continues. The next piece of data is (X14, X15), which contains X14, and X15 is not in the dimension combination list, so (X14, X18) and (X14, X15) are combined into (X14, X15, X18) and added to the dimension combination list, which becomes [ (X1, X3, X8), (X2, X4, X10), (X5, X6, X9), (X7, X11, X13), (X14, X15, X18) ].
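The pass over the frequent 2-item sets can be sketched as follows. The function name is hypothetical, and two details are assumptions: a pair for which no partner is found is left for the later frequent 1-item step, and the fields of a merged 3-tuple are ordered by their numeric suffix purely for display.

```python
def merge_frequent_pairs(sorted_pairs, combination_list):
    """Merge unused pairs that share one field into 3-tuple combinations."""
    used = {field for group in combination_list for field in group}
    result = list(combination_list)
    for i, pair in enumerate(sorted_pairs):
        if not used.isdisjoint(pair):
            continue  # a field of this pair is already recorded
        for other in sorted_pairs[i + 1:]:
            shared = set(pair) & set(other)
            extra = set(other) - set(pair)
            # Need exactly one common field and an unused remaining field.
            if len(shared) == 1 and used.isdisjoint(extra):
                merged = tuple(sorted(set(pair) | extra, key=lambda f: int(f[1:])))
                result.append(merged)
                used.update(merged)
                break
        # Assumption: with no partner found, the pair is not added here.
    return result


pairs = [("X7", "X11"), ("X2", "X7"), ("X14", "X18"), ("X11", "X13"), ("X14", "X15")]
start = [("X1", "X3", "X8"), ("X2", "X4", "X10"), ("X5", "X6", "X9")]
combined = merge_frequent_pairs(pairs, start)
print(combined)
```

Run on the pairs of table 22, the sketch adds (X7, X11, X13) and (X14, X15, X18), in line with the traversal described above.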

In some embodiments, after the frequent 3-item sets and the frequent 2-item sets have been traversed, if some dimension fields have still not been added to the dimension combination list, the remaining dimension fields are arranged in descending order of their occurrence counts in the frequent 1-item sets and added to the dimension combination list in groups of 3 dimensions. If two dimensions are left at the end, they are added to the dimension combination list as a new dimension combination; if one dimension is left, it is added to the last dimension combination, which then forms a group of 4 dimensions.

Assuming that X12, X16, X17, X19, X20 still remain, the frequent 1-item set is arranged in reverse order as in Table 23 below, table 23 below is a reverse order sort Table 3 provided by an embodiment of the present application:

Table 23 Reverse order sorting table 3 provided by the embodiment of the present application

Dimension field	Number of occurrences
X20	23000
X12	21000
X19	20000
X16	19888
X17	18000
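The grouping of the remaining dimension fields can be sketched as follows. The function name is hypothetical; the handling of a single leftover field (merged into the preceding group to form 4 dimensions) is one reading of the rule and should be treated as an assumption, and fields inside a group are ordered by numeric suffix only for display.

```python
def group_remaining(counts, size=3):
    """Group leftover fields by descending occurrence count, `size` at a time."""
    ordered = sorted(counts, key=counts.get, reverse=True)
    groups = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    # Assumed rule: a single leftover field joins the previous group.
    if len(groups) > 1 and len(groups[-1]) == 1:
        groups[-2].extend(groups.pop())
    return [tuple(sorted(g, key=lambda f: int(f[1:]))) for g in groups]


remaining = {"X20": 23000, "X12": 21000, "X19": 20000, "X16": 19888, "X17": 18000}
leftover_groups = group_remaining(remaining)
print(leftover_groups)
```

With the five fields of table 23 the first three form one group and the last two form a second group.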

In some embodiments, referring to fig. 5, fig. 5 is a third flowchart of the data processing method according to an embodiment of the present application, and step 203 shown in fig. 4 may be implemented by steps 301 to 304 shown in fig. 5.

In step 301, a list of dimension combinations is set to be empty.

In step 302, the frequent 3 item set is traversed to add dimension combinations.

In step 303, the frequent 2 item set is traversed to add dimension combinations.

In step 304, the frequent 1 item set is traversed to add dimension combinations.

In step 204, a segment cube is computed and stored.

An inverted index is constructed: the original data is traversed, and for each dimension value of each dimension the data id list and the list size are recorded, where the list size is the length of the data id list of the dimension value. Assume that there are 5 dimensions, A, B, C, D and E, and that the divided dimension combinations are (A, B, C) and (D, E). The raw data are shown in table 24 below, which is a schematic table of the raw data provided by the embodiment of the present application:

Table 24 schematic table of raw data provided by embodiments of the present application

Wherein A, B, C, D, E is the dimension field, count and sum are the metrics field, representing the number of times and the sum, such as the number of purchases, the total amount of purchases.

The record id list where the dimension value of each dimension is located is calculated, and the constructed inverted index table is shown in the following table 25, where the table 25 is the inverted index table provided by the embodiment of the present application:

Table 25 an inverted index Table provided by an embodiment of the present application
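The construction of the inverted index can be sketched as follows. The contents of Tables 24 and 25 are not reproduced in this text, so the sample rows below are reconstructed from the record id lists and the <id, count, sum> triples that appear in Tables 26 to 30; the value labels `a1`, `b1` and so on, and the function name `build_inverted_index`, are placeholders chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

DIMENSIONS = ("A", "B", "C", "D", "E")

# Sample rows reconstructed from Tables 26 to 30 (an assumption, see above).
RAW = {
    1: {"A": "a1", "B": "b1", "C": "c1", "D": "d1", "E": "e1", "count": 5, "sum": 70},
    2: {"A": "a1", "B": "b2", "C": "c1", "D": "d2", "E": "e1", "count": 3, "sum": 10},
    3: {"A": "a1", "B": "b2", "C": "c1", "D": "d1", "E": "e2", "count": 8, "sum": 20},
    4: {"A": "a2", "B": "b1", "C": "c2", "D": "d1", "E": "e2", "count": 5, "sum": 40},
    5: {"A": "a2", "B": "b1", "C": "c2", "D": "d1", "E": "e3", "count": 2, "sum": 50},
}


def build_inverted_index(rows, dimensions) -> Dict[Tuple[str, str], Set[int]]:
    """Map every (dimension, value) pair to the set of record ids containing it."""
    index = defaultdict(set)
    for record_id, row in rows.items():
        for dim in dimensions:
            index[(dim, row[dim])].add(record_id)
    return dict(index)


index = build_inverted_index(RAW, DIMENSIONS)
for (dim, value), ids in sorted(index.items()):
    # Columns: dimension, dimension value, record id list, list size
    print(dim, value, sorted(ids), len(ids))
```

The list size printed in the last column is simply the length of the record id list.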

In some embodiments, the cube shell segments are computed. For each dimension combination, the full set of cells of the 2-dimensional combination is computed first, and then the full set of cells of the 3-dimensional combination. For example, to compute the (A, B, C) dimension combination segment, the (A, B) segment is computed first: all dimension values of A and B are combined pairwise, and the record id lists are intersected according to the inverted index table of the previous step, as shown in Table 26 below, which is the inverted index table after conversion provided by an embodiment of the present application:

Table 26 Inverted index table after conversion provided by an embodiment of the present application

Cell unit	Intersection set	Record id list	List size
(T₁,T₁)	{1,2,3}∩{1,4,5}	{1}	1
(T₁,T₂)	{1,2,3}∩{2,3}	{2,3}	2
(T₂,T₁)	{4,5}∩{1,4,5}	{4,5}	2
(T₂,T₂)	{4,5}∩{2,3}	{}	0

Then the (A, B, C) segment is computed: the dimension value combinations in the above table are combined with the dimension values of C, and the record id lists are again intersected according to the inverted index table, as shown in Table 27 below, which is inverted index table one after intersection provided by an embodiment of the present application:

Table 27 Inverted index table one after intersection provided by an embodiment of the present application

Cell unit	Intersection set	Record id list	List size
(T₁,T₁,T₁)	{1}∩{1,2,3}	{1}	1
(T₁,T₁,T₂)	{1}∩{4,5}	{}	0
(T₁,T₂,T₁)	{2,3}∩{1,2,3}	{2,3}	2
(T₁,T₂,T₂)	{2,3}∩{4,5}	{}	0
(T₂,T₁,T₁)	{4,5}∩{1,2,3}	{}	0
(T₂,T₁,T₂)	{4,5}∩{4,5}	{4,5}	2
(T₂,T₂,T₁)	{}∩{1,2,3}	{}	0
(T₂,T₂,T₂)	{}∩{4,5}	{}	0
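The step-by-step intersection used for Tables 26 and 27 can be sketched as follows. The record id lists are those shown in the intersection columns of the two tables; the value labels `a1`, `b1`, `c1` and so on are placeholders for the dimension values of A, B and C, and `fragment_cells` is a hypothetical helper name.

```python
from typing import Dict, Optional, Sequence, Set, Tuple

# Inverted index entries for A, B and C, as used in Tables 26 and 27.
INDEX: Dict[str, Dict[str, Set[int]]] = {
    "A": {"a1": {1, 2, 3}, "a2": {4, 5}},
    "B": {"b1": {1, 4, 5}, "b2": {2, 3}},
    "C": {"c1": {1, 2, 3}, "c2": {4, 5}},
}


def fragment_cells(index, dims: Sequence[str]) -> Dict[Tuple[str, ...], Set[int]]:
    """Extend cells one dimension at a time, intersecting record id sets."""
    cells: Dict[Tuple[str, ...], Optional[Set[int]]] = {(): None}
    for dim in dims:
        extended = {}
        for cell, ids in cells.items():
            for value, value_ids in index[dim].items():
                # The first dimension copies its list; later ones intersect.
                extended[cell + (value,)] = set(value_ids) if ids is None else ids & value_ids
        cells = extended
    return cells


ab_cells = fragment_cells(INDEX, ("A", "B"))        # corresponds to Table 26
abc_cells = fragment_cells(INDEX, ("A", "B", "C"))  # corresponds to Table 27
for cell, ids in abc_cells.items():
    print(cell, sorted(ids), len(ids))
```

Empty cells are kept here so that the output lines up with the tables; a real store could omit them.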

In the same way, the (D, E) segment is computed, as shown in Table 28 below, which is inverted index table two after intersection provided by an embodiment of the present application:

Table 28 Inverted index table two after intersection provided by an embodiment of the present application

Cell unit	Intersection set	Record id list	List size
(T₁,T₁)	{1,3,4,5}∩{1,2}	{1}	1
(T₁,T₂)	{1,3,4,5}∩{3,4}	{3,4}	2
(T₁,T₃)	{1,3,4,5}∩{5}	{5}	1
(T₂,T₁)	{2}∩{1,2}	{2}	1
(T₂,T₂)	{2}∩{3,4}	{}	0
(T₂,T₃)	{2}∩{5}	{}	0

After the computation is completed, the segment cubes are stored. In addition to the dimension value combinations, a stored segment cube keeps the two metric values count and sum from the raw data, because an OLAP query may request not only the count but also the average (avg), the sum (sum) and so on, all of which are computed from the count and sum metrics. Each record in a record id list is stored in the format <id, count, sum>. The finally stored segment cubes are as follows:

The segment cube of the dimension combination (A, B, C) is shown in Table 29 below, which is schematic table one of the dimension combination segment cube provided by an embodiment of the present application:

Table 29 Schematic table one of the dimension combination segment cube provided by an embodiment of the present application

Cell unit	Record id list
(T₁,T₁,T₁)	{<1,5,70>}
(T₁,T₁,T₂)	{}
(T₁,T₂,T₁)	{<2,3,10>,<3,8,20>}
(T₁,T₂,T₂)	{}
(T₂,T₁,T₁)	{}
(T₂,T₁,T₂)	{<4,5,40>,<5,2,50>}
(T₂,T₂,T₁)	{}
(T₂,T₂,T₂)	{}

The segment cube of the dimension combination (D, E) is shown in Table 30 below, which is schematic table two of the dimension combination segment cube provided by an embodiment of the present application:

Table 30 Schematic table two of the dimension combination segment cube provided by an embodiment of the present application

Cell unit	Record id list
(T₁,T₁)	{<1,5,70>}
(T₁,T₂)	{<3,8,20>,<4,5,40>}
(T₁,T₃)	{<5,2,50>}
(T₂,T₁)	{<2,3,10>}
(T₂,T₂)	{}
(T₂,T₃)	{}
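Attaching the count and sum metrics to the record id lists can be sketched as follows, using the (D, E) cells of Table 28 and the <id, count, sum> triples of Table 30. The labels `d1`, `e1` and so on are placeholders for the dimension values, and `attach_measures` is a hypothetical helper name.

```python
from typing import Dict, List, Set, Tuple

# Per-record (count, sum) metrics, read from the triples in Tables 29 and 30.
MEASURES: Dict[int, Tuple[int, int]] = {1: (5, 70), 2: (3, 10), 3: (8, 20), 4: (5, 40), 5: (2, 50)}

# Record id lists of the (D, E) cells, as in Table 28.
DE_CELLS: Dict[Tuple[str, str], Set[int]] = {
    ("d1", "e1"): {1}, ("d1", "e2"): {3, 4}, ("d1", "e3"): {5},
    ("d2", "e1"): {2}, ("d2", "e2"): set(), ("d2", "e3"): set(),
}


def attach_measures(cells, measures) -> Dict[Tuple[str, ...], List[Tuple[int, int, int]]]:
    """Store each record of a cell as an (id, count, sum) triple."""
    return {
        cell: [(record_id, *measures[record_id]) for record_id in sorted(ids)]
        for cell, ids in cells.items()
    }


de_cube = attach_measures(DE_CELLS, MEASURES)
for cell, records in de_cube.items():
    print(cell, records)
```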

In some embodiments, referring to FIG. 6, which is a flowchart of a data processing method according to an embodiment of the present application, step 204 shown in FIG. 4 may be implemented by steps 401 to 403 shown in FIG. 6.

In step 401, the raw data is scanned to construct a dimension value inverted index.

In step 402, a combination of dimension values for each segment cube is calculated.

In step 403, each segment cube data is stored.

In step 205, an OLAP query is computed.

After the dimension combination segment cubes have been computed, they can be used for the user's OLAP queries. If all dimension fields of the user's OLAP query fall within one segment cube, for example the query <A, B, C, count>, the (A, B, C) segment cube is queried directly. If the dimensions of the OLAP query do not fall within a single segment cube, for example the query <A, B, E, sum>, a cross-segment query is needed.

In some embodiments, for an intra-segment query, the corresponding dimension value combination is taken directly from the segment cube according to the dimension values of the query, and the metric is computed from it. For example, for the query <T₁,T₁,T₁,count>, the record id list of the dimension values <T₁,T₁,T₁> is {<1,5,70>}, and therefore <T₁,T₁,T₁,count> = 5.

As another example, for the query <T₁,T₁,T₁,avg>, all dimension value combinations of <T₁,T₁,T₁> are taken:

(T₁,T₁,T₁)

{<1,5,70>}

<T₁,T₁,T₁,avg> = sum/count = 70/5 = 14.
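An intra-segment query over the stored (A, B, C) segment cube can be sketched as follows. The cube contents follow Table 29; the labels `a1`, `b1`, `c1` and so on are placeholders for the dimension values, `query_cell` is a hypothetical helper, and summing the per-record count and sum values when a cell holds several records is an assumption of this sketch.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[int, int, int]  # (id, count, sum)

# The (A, B, C) segment cube of Table 29.
ABC_CUBE: Dict[Tuple[str, str, str], List[Record]] = {
    ("a1", "b1", "c1"): [(1, 5, 70)],
    ("a1", "b1", "c2"): [],
    ("a1", "b2", "c1"): [(2, 3, 10), (3, 8, 20)],
    ("a1", "b2", "c2"): [],
    ("a2", "b1", "c1"): [],
    ("a2", "b1", "c2"): [(4, 5, 40), (5, 2, 50)],
    ("a2", "b2", "c1"): [],
    ("a2", "b2", "c2"): [],
}


def query_cell(cube, cell, metric: str) -> Optional[float]:
    """Answer a count, sum or avg query from one cell of a segment cube."""
    records = cube.get(cell, [])
    total_count = sum(count for _id, count, _sum in records)
    total_sum = sum(value for _id, _count, value in records)
    if metric == "count":
        return total_count
    if metric == "sum":
        return total_sum
    if metric == "avg":
        # avg is derived from the stored sum and count; undefined for empty cells
        return total_sum / total_count if total_count else None
    raise ValueError(f"unsupported metric: {metric}")


print(query_cell(ABC_CUBE, ("a1", "b1", "c1"), "count"))  # 5
print(query_cell(ABC_CUBE, ("a1", "b1", "c1"), "avg"))    # 14.0
```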

In some embodiments, for a cross-segment query, the query is first split into sub-queries according to its dimension values, with one sub-query per segment cube. Within each segment cube, the record id lists of all matching dimension value combinations are merged into a union; the unions from the different segment cubes are then intersected, and the metric value is computed from the result.

For example, for the queries <T₁,T₁,T₁,count> and <T₁,T₁,T₁,sum>, the query is divided into two sub-queries, <T₁,T₁,T₁> and <T₁,T₁>.

All dimension value combinations of <T₁,T₁,T₁> are:

(T₁,T₁,T₁)

{<1,5,70>}

All dimension value combinations of <T₁,T₁> are:

(T₁,T₁)	{<1,5,70>}
(T₂,T₁)	{<2,3,10>}

Taking the union gives:

(T₁,T₁)

{<1,5,70>,<2,3,10>}

Intersecting <T₁,T₁,T₁> and <T₁,T₁> yields:

(T₁,T₁,T₁,T₁,T₁)

{<1,5,70>}

<T₁,T₁,T₁,count> = 5;

<T₁,T₁,T₁,sum> = 70.
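The cross-segment procedure (split, union within each segment cube, intersect across segment cubes, then aggregate) can be sketched as follows. Because this text renders every dimension value with the same label, the sketch assumes that the example query constrains A and B in the (A, B, C) segment and E in the (D, E) segment, in line with the <A, B, E, sum> example; the labels `a1`, `b1`, `e1` and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, int, int]  # (id, count, sum)

ABC_CUBE: Dict[Tuple[str, ...], List[Record]] = {
    ("a1", "b1", "c1"): [(1, 5, 70)],
    ("a1", "b2", "c1"): [(2, 3, 10), (3, 8, 20)],
    ("a2", "b1", "c2"): [(4, 5, 40), (5, 2, 50)],
}
DE_CUBE: Dict[Tuple[str, ...], List[Record]] = {
    ("d1", "e1"): [(1, 5, 70)],
    ("d1", "e2"): [(3, 8, 20), (4, 5, 40)],
    ("d1", "e3"): [(5, 2, 50)],
    ("d2", "e1"): [(2, 3, 10)],
}
SEGMENTS = [(("A", "B", "C"), ABC_CUBE), (("D", "E"), DE_CUBE)]


def matching_records(cube, dims, condition) -> Dict[int, Tuple[int, int]]:
    """Union the records of every cell that agrees with the sub-query."""
    matched = {}
    for cell, records in cube.items():
        values = dict(zip(dims, cell))
        if all(values[dim] == value for dim, value in condition.items()):
            for record_id, count, total in records:
                matched[record_id] = (count, total)
    return matched


def cross_segment_query(segments, condition) -> Dict[str, int]:
    result = None
    for dims, cube in segments:
        # Sub-query: the part of the condition that this segment cube covers.
        sub_query = {dim: value for dim, value in condition.items() if dim in dims}
        if not sub_query:
            continue
        matched = matching_records(cube, dims, sub_query)
        # Intersect the unions of the different segment cubes by record id.
        result = matched if result is None else {r: m for r, m in result.items() if r in matched}
    result = result or {}
    return {
        "count": sum(count for count, _total in result.values()),
        "sum": sum(total for _count, total in result.values()),
    }


answer = cross_segment_query(SEGMENTS, {"A": "a1", "B": "b1", "E": "e1"})
print(answer)  # {'count': 5, 'sum': 70}
```

Empty cells are omitted from the two cube literals for brevity, which does not change the result.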

In this way, the user's historical OLAP query records are collected, frequent item sets are mined from the query records, the commonly used dimension combinations are computed, the segment cubes are split according to those dimension combinations, and finally the segment cubes are used to compute the user's OLAP query requests. This addresses the drawbacks of the traditional approach of storing a complete cube and achieves the following effects. First, storing the cube in segments according to the commonly used dimension combinations greatly saves cube storage space. Second, the metric values stored in a segment cube are pre-computed aggregate values, and computing the user's OLAP queries from these aggregate values brings a qualitative leap in computing performance. Third, because the cube is stored in segments according to the commonly used dimension combinations, a large number of query requests within 3 dimensions can be computed inside a single segment cube, which greatly improves computing performance. Finally, even when a multi-dimensional query has to be answered across segment cubes, each segment cube stores pre-computed aggregate values, so the query is much faster than the traditional computation that scans the full data.

It will be appreciated that the embodiments of the present application involve related data such as the data set to be processed. When the embodiments of the present application are applied to specific products or technologies, user permission or consent needs to be obtained, and the collection, use and processing of the related data need to comply with the relevant laws, regulations and standards of the relevant countries and regions.

The following continues with an exemplary structure of the data processing device 455 provided by embodiments of the present application, implemented as software modules. In some embodiments, as shown in FIG. 2, the software modules of the data processing device 455 stored in the memory 450 may include: an acquiring module 4551, configured to acquire a historical query record of a target object for a data set to be processed, where the historical query record includes sub-query records corresponding to the target object at historical query times, and the sub-query records are used to record the target data dimensions to which the data queried by the target object at the historical query times belong; a dimension aggregation module 4552, configured to perform dimension aggregation on the target data dimensions in the historical query record to obtain N target dimension groups, where N is an integer greater than 1; and a data dividing module 4553, configured to divide the data in the data set to be processed according to the N target dimension groups to obtain N target data sets corresponding to the N target dimension groups, where one target dimension group corresponds to one target data set.

In the above solution, the dimension aggregation module 4552 is further configured to: if the number of target data dimensions in the historical query record is less than or equal to 2, take one target data dimension in the historical query record as one target dimension group; if the number of target data dimensions in the historical query record is greater than 2, perform M-2 dimension aggregation processes on the target data dimensions to obtain reference dimension groups, where the numbers of target data dimensions used in two adjacent dimension aggregation processes satisfy a preset rule and M indicates the total number of target data dimensions; and determine the target dimension groups in combination with the reference dimension groups obtained by the M-2 dimension aggregation processes.

In the above solution, the dimension aggregation module 4552 is further configured to perform an ith dimension combination on the target data dimensions to obtain a dimension combination, where one dimension combination includes i+1 target data dimensions; for each dimension combination, if sub-query records corresponding to the target data dimension in the dimension combination are overlapped, determining the dimension combination as a candidate dimension group; and combining the candidate dimension groups obtained by the ith dimension aggregation, and determining at least one reference dimension group obtained by the ith dimension aggregation.

In the above solution, the dimension aggregation module 4552 is further configured to: if i = 1, combine each target data dimension with the other target data dimensions to obtain first dimension combinations; if i is greater than or equal to 2, combine each obtained first dimension combination with the target data dimensions to obtain second dimension combinations; and de-duplicate the first dimension combinations and the second dimension combinations to obtain the dimension combinations.

In the above solution, the dimension aggregation module 4552 is further configured to obtain, for each candidate dimension group, a dimension parameter value of the candidate dimension group, where the dimension parameter value is used to indicate the number of identical sub-query records in the sub-query records corresponding to the candidate dimension group; comparing the dimension parameter value with a dimension parameter threshold value to obtain a comparison result; and if the comparison result indicates that the dimension parameter value is greater than or equal to the dimension parameter threshold value, determining the candidate dimension group as the reference dimension group.
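The selection of reference dimension groups for one aggregation round can be sketched as follows. The sketch treats each sub-query record as the set of dimensions it queried and takes the dimension parameter value of a group to be the number of sub-query records that contain every dimension of the group; it enumerates groups with `itertools.combinations` instead of the incremental combine-and-de-duplicate procedure described above. The function name, the sample records and the threshold are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def reference_dimension_groups(
    query_records: List[FrozenSet[str]], size: int, threshold: int
) -> Dict[Tuple[str, ...], int]:
    """Return groups of `size` dimensions whose shared-record count meets the threshold."""
    dims = sorted(set().union(*query_records))
    groups = {}
    for combo in combinations(dims, size):
        # Dimension parameter value: sub-query records shared by all dimensions.
        shared = sum(1 for record in query_records if set(combo) <= record)
        if shared == 0:
            continue  # no overlapping sub-query records: not a candidate group
        if shared >= threshold:
            groups[combo] = shared
    return groups


# Hypothetical historical sub-query records (dimensions queried each time).
records = [
    frozenset({"A", "B", "C"}), frozenset({"A", "B"}), frozenset({"A", "B", "C"}),
    frozenset({"D", "E"}), frozenset({"D", "E"}), frozenset({"A", "D"}),
]
pairs = reference_dimension_groups(records, size=2, threshold=2)
triples = reference_dimension_groups(records, size=3, threshold=2)
print(pairs)
print(triples)
```

In this sample, the pair (A, D) is a candidate because one record contains both dimensions, but it falls below the threshold of 2 and is therefore not kept as a reference dimension group.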

In the above solution, the dimension aggregation module 4552 is further configured to obtain a dimension parameter value of each reference dimension group, where the dimension parameter value indicates the number of identical sub-query records among the sub-query records corresponding to the reference dimension group; and to perform the following processing on each reference dimension group in turn, in descending order of the dimension parameter values: if none of the target data dimensions in the reference dimension group is recorded in the dimension combination list, determine the reference dimension group as a target dimension group.

In the above solution, the data processing device 455 further includes a comparison module, configured to compare, for each target data dimension in the reference dimension group, the target data dimension with each target data dimension in the dimension combination list to obtain record comparison results; and, if every record comparison result indicates that the dimension combination list contains no target data dimension identical to the target data dimension, determine that none of the target data dimensions in the reference dimension group is recorded in the dimension combination list.

In the above aspect, the data dividing module 4553 is further configured to determine, for each of the N target dimension groups, a target data dimension included in the target dimension group as a constituent data dimension of the target dimension group, and determine data belonging to the constituent data dimension in the data set to be processed as target data corresponding to the constituent data dimension; if the number of the component data dimensions is equal to 1, carrying out data fusion on summary data corresponding to each target data in the data set to be processed and the target data to obtain a target data set corresponding to the target dimension group; if the number of the composition data dimensions is greater than 1, data aggregation is carried out on the target data to obtain at least one target data group, the target data in one target data group belong to different composition data dimensions, and a target data set corresponding to the target dimension group is determined based on the at least one target data group.

In the above solution, the data dividing module 4553 is further configured to perform the following processing for each target data group: determining each target data in the target data group as composition data of the target data group; and performing data fusion on the composition data and the summary data corresponding to each composition data in the data set to be processed, to obtain a target data set corresponding to the target data group.

In the above aspect, the data processing apparatus 455 further includes: the response module is used for responding to the data query request of the target object and determining a reference data set meeting the data query request from the N target data sets; responding to the data query request of the target object based on the reference data set.

In the above solution, the response module is further configured to: if there is one reference data set, determine reference data satisfying the data query request from the reference data set and send the reference data to the target object; if there are a plurality of reference data sets, determine, for each reference data set, reference data satisfying the data query request from that reference data set; and perform data fusion on the reference data of the plurality of reference data sets to obtain fused data, and send the fused data to the target object.

In the above solution, the response module is further configured to parse the data query request to obtain a parsing result; if the parsing result indicates that the data query request does not carry an expected data dimension, determine the target data set with the highest request frequency as the reference data set; and if the parsing result indicates that the data query request carries an expected data dimension, determine a target data set that includes the expected data dimension as a reference data set satisfying the data query request.
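The selection of reference data sets described above can be sketched as follows. Each target data set is identified here by the tuple of dimensions of its target dimension group; the function name, the request frequencies and the treatment of a request with several expected dimensions (every target data set containing at least one of them is selected) are assumptions of this sketch.

```python
from typing import Dict, List, Sequence, Tuple

DimensionGroup = Tuple[str, ...]


def select_reference_sets(
    target_sets: List[DimensionGroup],
    request_frequency: Dict[DimensionGroup, int],
    expected_dims: Sequence[str],
) -> List[DimensionGroup]:
    """Pick the target data sets used to answer a data query request."""
    if not expected_dims:
        # No expected data dimension: fall back to the most requested set.
        return [max(target_sets, key=lambda group: request_frequency.get(group, 0))]
    wanted = set(expected_dims)
    return [group for group in target_sets if wanted & set(group)]


target_sets = [("A", "B", "C"), ("D", "E")]
frequency = {("A", "B", "C"): 120, ("D", "E"): 45}  # hypothetical request counts

print(select_reference_sets(target_sets, frequency, []))               # [('A', 'B', 'C')]
print(select_reference_sets(target_sets, frequency, ["D"]))            # [('D', 'E')]
print(select_reference_sets(target_sets, frequency, ["A", "B", "E"]))  # both sets
```

When more than one set is returned, the response would be produced by fusing the reference data of the selected sets, as in the cross-segment case.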

Embodiments of the present application provide a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, cause the processor to perform a data processing method provided by embodiments of the present application, for example, the data processing method shown in FIG. 3.

In some embodiments, the computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, an optical disk, or CD-ROM; it may also be any of various devices that include one or any combination of the above memories.

In some embodiments, computer-executable instructions may be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, in the form of programs, software modules, scripts, or code, and they may be deployed in any form, including as stand-alone programs or as modules, components, subroutines, or other units suitable for use in a computing environment.

As an example, computer-executable instructions may, but need not, correspond to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a Hyper Text Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).

As an example, computer-executable instructions may be deployed to be executed on one electronic device or on multiple electronic devices located at one site or distributed across multiple sites and interconnected by a communication network.

In summary, the embodiment of the application has the following beneficial effects:

(1) A historical query record of a target object for a data set to be processed is acquired, dimension aggregation is performed on the target data dimensions in the historical query record to obtain at least one target dimension group, the data set to be processed is divided based on the target dimension groups to obtain a target data set corresponding to each target dimension group, and data query requests of the target object are answered through the target data sets. Because the historical query record includes the sub-query records of the target object at each historical query time, the target dimension groups obtained by dimension aggregation accurately reflect the historical data query requirements of the target object. Dividing the data set to be processed by the target dimension groups therefore yields target data sets that accurately reflect the historical data query requirements of the target object under each target dimension group, which effectively improves the accuracy of data division.

(2) The user's historical OLAP query records are collected, frequent item sets are mined from the query records, the commonly used dimension combinations are computed, the segment cubes are split according to those dimension combinations, and finally the segment cubes are used to compute the user's OLAP query requests. This addresses the drawbacks of the traditional approach of storing a complete cube and achieves the following effects. First, storing the cube in segments according to the commonly used dimension combinations greatly saves cube storage space. Second, the metric values stored in a segment cube are pre-computed aggregate values, and computing the user's OLAP queries from these aggregate values brings a qualitative leap in computing performance. Third, because the cube is stored in segments according to the commonly used dimension combinations, a large number of query requests within 3 dimensions can be computed inside a single segment cube, which greatly improves computing performance. Finally, even when a multi-dimensional query has to be answered across segment cubes, each segment cube stores pre-computed aggregate values, so the query is much faster than the traditional computation that scans the full data.

(3) Dimension aggregation is performed on the target data dimensions in the historical query record to obtain at least one target dimension group, so that the target data dimensions in each target dimension group accurately reflect the historical dimension query habits of the target object.

(4) The number of target data dimensions in a target dimension group is smaller than the total number of target data dimensions, so the obtained target dimension groups exclude any dimension group that contains all of the target data dimensions, which effectively reduces the number of target dimension groups and effectively improves the execution efficiency of the algorithm.

(5) When there are multiple reference data sets, this indicates that several of the target data sets can meet the query requirement of the data query request. In this case, the reference data meeting the query requirement can be determined from each reference data set, data fusion is performed on the reference data to obtain fused data, and the fused data is sent to the target object, so that the data query request is answered accurately.

(6) When there is one reference data set, this indicates that only one of the target data sets can meet the query requirement of the data query request. In this case, the reference data meeting the query requirement can be determined directly from that reference data set and sent to the target object, which effectively improves data query efficiency.

The foregoing is merely exemplary embodiments of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims

1. A method of data processing, the method comprising:

Acquiring a historical query record of a target object aiming at a data set to be processed, wherein the historical query record comprises sub-query records corresponding to the target object at the historical query time; the sub-query records are used for recording target data dimensions of the data queried by the target object at the historical query time;

2. The method of claim 1, wherein dimension aggregating the target data dimensions in the historical query records to obtain N target dimension groups comprises:

if the number of the target data dimensions in the history query record is less than or equal to 2, taking one target data dimension in the history query record as one target dimension group;

if the number of the target data dimensions in the history query record is greater than 2, performing M-2 dimension aggregation processes on the target data dimensions to obtain reference dimension groups; the numbers of the target data dimensions used in two adjacent dimension aggregation processes satisfy a preset rule; M is used to indicate the total number of target data dimensions;

and determining the target dimension group in combination with the reference dimension groups obtained by the M-2 dimension aggregation processes.

3. The method according to claim 2, wherein the number of reference dimension groups obtained by each dimension aggregation process is at least one; the ith dimension aggregation process includes:

Performing ith dimension combination on the target data dimension to obtain dimension combinations, wherein one dimension combination comprises i+1 target data dimensions; i is an integer greater than or equal to 1;

For each dimension combination, if sub-query records corresponding to the target data dimension in the dimension combination are overlapped, determining the dimension combination as a candidate dimension group;

And combining the candidate dimension groups obtained by the ith dimension aggregation, and determining at least one reference dimension group obtained by the ith dimension aggregation.

4. A method according to claim 3, wherein said performing an ith dimension combination on said target data dimensions to obtain a dimension combination comprises:

if i=1, performing dimension combination on each target data dimension and other target data dimensions to obtain a first dimension combination;

if i is greater than or equal to 2, for each obtained first dimension combination, performing dimension combination on the first dimension combination and the target data dimensions to obtain a second dimension combination;

and performing de-duplication processing on the first dimension combination and the second dimension combination to obtain the dimension combination.

5. The method according to claim 3, wherein the determining, in combination with the candidate dimension groups obtained by the ith dimension aggregation process, at least one reference dimension group obtained by the ith dimension aggregation process comprises:

for each candidate dimension group, acquiring dimension parameter values of the candidate dimension group, wherein the dimension parameter values are used for indicating the number of the same sub-query records in the sub-query records corresponding to the candidate dimension group;

Comparing the dimension parameter value with a dimension parameter threshold value to obtain a comparison result;

And if the comparison result indicates that the dimension parameter value is greater than or equal to the dimension parameter threshold value, determining the candidate dimension group as the reference dimension group.

6. The method of claim 2, wherein the determining the target dimension group in combination with the reference dimension groups obtained by the M-2 dimension aggregation processes comprises:

Acquiring a dimension parameter value of each reference dimension group, wherein the dimension parameter value of one reference dimension group is used for indicating the number of the same sub-query records in the sub-query records corresponding to the reference dimension group;

sorting the reference dimension groups obtained by the M-2 dimension aggregation processes according to the size of the dimension parameter values, and selecting reference dimension groups in turn from the reference dimension groups obtained by the M-2 dimension aggregation processes;

for the selected reference dimension group, if none of the target data dimensions in the reference dimension group is recorded in the dimension combination list, determining the reference dimension group as a target dimension group; the dimension combination list is used for recording the determined target dimension group.

7. The method according to claim 1, wherein the dividing the data in the data set to be processed into the N target dimension groups based on the target dimension groups, to obtain N target data sets corresponding to the N target dimension groups, includes:

For each of the N target dimension groups, determining target data dimensions included in the target dimension group as component data dimensions of the target dimension group, and determining data belonging to the component data dimensions in the data set to be processed as target data corresponding to the component data dimensions;

If the number of the component data dimensions is equal to 1, carrying out data fusion on summary data corresponding to the target data in the data set to be processed and the target data to obtain a target data set corresponding to the target dimension group;

If the number of the composition data dimensions is greater than 1, data aggregation is carried out on the target data to obtain at least one target data group, the target data in one target data group belong to different composition data dimensions, and a target data set corresponding to the target dimension group is determined based on the at least one target data group.

8. The method of claim 7, wherein the target data set corresponding to the target dimension group includes a target data set corresponding to the target data group, and wherein the determining the target data set corresponding to the target dimension group based on the at least one target data group includes:

for each target data group, determining the target data in the target data group as composition data of the target data group;

and performing data fusion on the composition data and the summary data corresponding to the composition data in the data set to be processed, to obtain a target data set corresponding to the target data group.

9. The method according to claim 1, wherein after dividing data in the data set to be processed into the N target dimension groups based on the target dimension groups to obtain N target data sets corresponding to the N target dimension groups, the method further comprises:

responding to the data query request of the target object, and determining a reference data set meeting the data query request from the N target data sets;

responding to the data query request of the target object based on the reference data set.

10. The method of claim 9, wherein responding to the data query request for the target object based on the reference data set comprises:

if the number of the reference data sets is one, determining reference data meeting the data query request from the reference data sets, and sending the reference data to the target object;

if there are a plurality of reference data sets, determining, for each reference data set, reference data satisfying the data query request from that reference data set;

and carrying out data fusion on the reference data of the plurality of reference data sets to obtain fusion data, and sending the fusion data to the target object.

11. The method of claim 9, wherein the determining a reference data set that satisfies the data query request from the N target data sets comprises:

analyzing the data query request to obtain an analysis result;

If the analysis result indicates that the data query request does not carry the expected data dimension, determining a target data set with the largest request frequency as the reference data set;

And if the analysis result indicates that the data query request carries the expected data dimension, determining a target data set comprising the expected data dimension as a reference data set meeting the data query request.

12. A data processing apparatus, the apparatus comprising:

13. An electronic device, the electronic device comprising:

a memory for storing computer executable instructions or computer programs;

A processor for implementing the data processing method of any one of claims 1 to 11 when executing computer-executable instructions or computer programs stored in the memory.

14. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the data processing method of any one of claims 1 to 11.