CN116303833B - OLAP-based vectorized data hybrid storage method

Info

Publication number: CN116303833B
Application number: CN202310559252.7A
Authority: CN
Inventors: 吴伟华; 李韩; 庞文刚; 胡磊明; 林金怡; 郭友; 唐梅娟; 钟文坤; 魏淼; 伍心怡; 林泓旭; 杨�远; 徐粲; 麦泽庆
Original assignee: Beijing Jianmozi Technology Co ltd; China Unicom WO Music and Culture Co Ltd
Current assignee: Beijing Jianmozi Technology Co ltd; China Unicom WO Music and Culture Co Ltd
Priority date: 2023-05-18
Filing date: 2023-05-18
Publication date: 2023-07-21
Anticipated expiration: 2043-05-18
Also published as: CN116303833A

Abstract

The invention relates to the field of data processing, and discloses an OLAP-based vectorized data hybrid storage method which is used for realizing efficient data storage and improving the data processing efficiency. The method comprises the following steps: inputting the data storage task into a task splitting model to perform storage task splitting processing to obtain a sub storage task set; vector data corresponding to each sub-storage task is obtained, and storage task logic relationship analysis is carried out on the sub-storage task set to obtain a storage logic association relationship; determining a plurality of storage nodes corresponding to the sub storage task sets according to the storage logic association relation, and acquiring a storage load corresponding to each storage node; performing storage space allocation on the sub-storage tasks according to the storage loads corresponding to each storage node to obtain a storage result corresponding to each storage node; and according to the storage logic association relation, carrying out result fusion on the storage results corresponding to each storage node to obtain target storage results corresponding to the data storage task.

Description

OLAP-based vectorized data hybrid storage method

Technical Field

The invention relates to the field of data processing, and in particular to an OLAP-based vectorized data hybrid storage method.

Background

With the advent of the big data era, more and more businesses need to build mass data storage and processing systems and extract valuable information from them. Traditional data storage methods based on relational databases suffer from low efficiency and slow queries when facing large-scale, high-dimensional, and structurally complex data, so a more efficient, faster, and more flexible way of storing and processing data is needed.

At present, existing data storage and processing methods have low storage and query efficiency. The scalability of a relational database degrades as the data volume grows, so it can meet neither the requirements of large-scale data processing nor the storage requirements of unstructured or semi-structured data.

Disclosure of Invention

The invention provides an OLAP-based vectorized data hybrid storage method, which is used for realizing efficient data storage and improving the efficiency of data processing.

The first aspect of the present invention provides an OLAP-based vectorized data hybrid storage method, which includes:

acquiring a data storage task to be processed, inputting the data storage task into a preset task splitting model for storage task decomposition processing to obtain a sub storage task set, wherein the sub storage task set comprises: a plurality of sub-storage tasks;

acquiring vector data corresponding to each sub-storage task, and performing storage task logic relationship analysis on the sub-storage task set to obtain a storage logic association relationship;

determining a plurality of storage nodes corresponding to the sub storage task sets according to the storage logic association relation, and acquiring a storage load corresponding to each storage node;

distributing the plurality of sub-storage tasks to the plurality of storage nodes according to the storage load corresponding to each storage node, executing the corresponding sub-storage tasks in each storage node, and distributing storage space for the sub-storage tasks to obtain a storage result corresponding to each storage node;

and according to the storage logic association relation, carrying out result fusion on the storage results corresponding to each storage node to obtain the target storage results corresponding to the data storage task.

In combination with the first aspect, the acquiring a data storage task to be processed and inputting the data storage task into a preset task splitting model for storage task decomposition processing to obtain a sub storage task set, wherein the sub storage task set comprises a plurality of sub-storage tasks, includes:

receiving a data storage request, and performing task query on the data storage request to obtain a data storage task;

performing storage capacity and storage dimension analysis on the data storage task to obtain target storage capacity and storage dimension information;

according to the target storage amount and the storage dimension information, a preset task splitting model is called to perform storage task decomposition processing on the data storage task to obtain a sub storage task set, wherein the sub storage task set comprises: a plurality of sub-storage tasks.

In combination with the first aspect, the obtaining vector data corresponding to each sub-storage task, and performing storage task logic relationship analysis on the sub-storage task set, to obtain a storage logic association relationship, includes:

vector data corresponding to each sub-storage task is obtained, and storage task division is carried out on a plurality of sub-storage tasks in the sub-storage task set according to the vector data, so that at least one sub-storage task cluster is obtained;

performing storage task logic relationship analysis on each two sub storage tasks in the at least one sub storage task cluster, and determining an initial logic association relationship between each two sub storage tasks;

and constructing a storage logic association relationship corresponding to the sub storage task set according to the initial logic association relationship between every two sub storage tasks.

In combination with the first aspect, the obtaining vector data corresponding to each sub-storage task, and performing storage task division on a plurality of sub-storage tasks in the sub-storage task set according to the vector data, to obtain at least one sub-storage task cluster, includes:

classifying a plurality of sub-storage tasks in the sub-storage task set to obtain at least one task classification result;

and extracting corresponding sub-storage tasks from the sub-storage task set according to the at least one task classification result to generate at least one sub-storage task cluster.

In combination with the first aspect, the constructing a storage logic association relationship corresponding to the sub storage task set according to the initial logic association relationship between every two sub storage tasks includes:

performing association identifier matching on the logical association relationship between every two sub-storage tasks to obtain corresponding association identifiers between every two sub-storage tasks;

and carrying out association relation fusion on the logic association relation between every two sub-storage tasks based on the corresponding association identifications between every two sub-storage tasks to obtain corresponding storage logic association relation.

In combination with the first aspect, the determining the plurality of storage nodes corresponding to the sub storage task set according to the storage logic association relationship, and obtaining the storage load corresponding to each storage node includes:

information labeling is carried out on the storage logic association relation to obtain a plurality of labeling information, and a mapping relation between the labeling information and a storage node is constructed according to the plurality of labeling information;

according to the mapping relation, performing storage node mapping matching on a plurality of sub storage tasks in the sub storage task set respectively to obtain a plurality of storage nodes;

and carrying out node load balancing storage on the plurality of storage nodes according to the target storage amount to obtain storage loads corresponding to each storage node.

In combination with the first aspect, the distributing the plurality of sub-storage tasks to the plurality of storage nodes according to the storage load corresponding to each storage node, executing the corresponding sub-storage task in each storage node, and performing storage space allocation on the sub-storage task to obtain a storage result corresponding to each storage node, includes:

distributing the plurality of sub-storage tasks to the corresponding storage nodes according to the storage load and the mapping relation corresponding to each storage node;

executing corresponding sub-storage tasks through storage programs in each storage node;

and carrying out storage space allocation on the sub-storage tasks through storage programs in each storage node to obtain a storage result corresponding to each storage node.

A second aspect of the present invention provides an OLAP-based vectorized data hybrid storage system, including:

the acquisition module is used for acquiring a data storage task to be processed, inputting the data storage task into a preset task splitting model for storage task decomposition processing to obtain a sub storage task set, wherein the sub storage task set comprises: a plurality of sub-storage tasks;

the analysis module is used for acquiring vector data corresponding to each sub-storage task, and carrying out storage task logic relationship analysis on the sub-storage task set to obtain a storage logic association relationship;

the processing module is used for determining a plurality of storage nodes corresponding to the sub storage task set according to the storage logic association relation and acquiring a storage load corresponding to each storage node;

the distribution module is used for distributing the plurality of sub-storage tasks to the plurality of storage nodes according to the storage load corresponding to each storage node, executing the corresponding sub-storage tasks in each storage node and distributing storage space for the sub-storage tasks to obtain a storage result corresponding to each storage node;

and the fusion module is used for carrying out result fusion on the storage results corresponding to each storage node according to the storage logic association relation to obtain the target storage results corresponding to the data storage task.

With reference to the second aspect, the obtaining module is specifically configured to:

With reference to the second aspect, the analysis module further includes:

the dividing unit is used for obtaining vector data corresponding to each sub-storage task, and dividing storage tasks of a plurality of sub-storage tasks in the sub-storage task set according to the vector data to obtain at least one sub-storage task cluster;

the analysis unit is used for carrying out storage task logic relationship analysis on every two sub storage tasks in the at least one sub storage task cluster and determining an initial logic association relationship between every two sub storage tasks;

and the construction unit is used for constructing the storage logic association relation corresponding to the sub storage task set according to the initial logic association relation between every two sub storage tasks.

With reference to the second aspect, the dividing unit is specifically configured to:

With reference to the second aspect, the analysis unit is specifically configured to:

With reference to the second aspect, the processing module is specifically configured to:

With reference to the second aspect, the allocation module is specifically configured to:

A third aspect of the present invention provides an OLAP based vectorized data hybrid storage device, comprising: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the OLAP based vectorized data hybrid storage device to perform the OLAP based vectorized data hybrid storage method described above.

A fourth aspect of the present invention provides a computer-readable storage medium having instructions stored therein that, when executed on a computer, cause the computer to perform the above-described OLAP-based vectorized data hybrid storage method.

In the technical scheme provided by the invention, a data storage task is input into a task splitting model to carry out storage task splitting treatment to obtain a sub storage task set; vector data corresponding to each sub-storage task is obtained, and storage task logic relationship analysis is carried out on the sub-storage task set to obtain a storage logic association relationship; determining a plurality of storage nodes corresponding to the sub storage task sets according to the storage logic association relation, and acquiring a storage load corresponding to each storage node; performing storage space allocation on the sub-storage tasks according to the storage loads corresponding to each storage node to obtain a storage result corresponding to each storage node; according to the storage logic association relation, result fusion is carried out on the storage results corresponding to each storage node to obtain target storage results corresponding to the data storage task.

Drawings

FIG. 1 is a schematic diagram of an embodiment of an OLAP-based vectorized data hybrid storage method according to an embodiment of the present invention;

FIG. 2 is a flow chart of a storage task logical relationship analysis in an embodiment of the present invention;

FIG. 3 is a flow chart of storage task partitioning in an embodiment of the present invention;

FIG. 4 is a flowchart of determining an initial logical association between every two sub-storage tasks according to an embodiment of the present invention;

FIG. 5 is a diagram of an embodiment of an OLAP-based vectorized data hybrid storage system in accordance with an embodiment of the present invention;

FIG. 6 is a schematic diagram of another embodiment of an OLAP-based vectorized data hybrid storage system in accordance with an embodiment of the present invention;

FIG. 7 is a diagram of an embodiment of an OLAP-based vectorized data hybrid storage device in accordance with an embodiment of the present invention.

Detailed Description

The embodiment of the invention provides an OLAP-based vectorized data hybrid storage method, which is used for realizing efficient data storage and improving the efficiency of data processing. The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.

For ease of understanding, the following describes a specific flow of an embodiment of the present invention, referring to fig. 1, and an embodiment of an OLAP-based vectorized data hybrid storage method in an embodiment of the present invention includes:

S101, acquiring a data storage task to be processed, inputting the data storage task into a preset task splitting model for storage task decomposition processing to obtain a sub storage task set, wherein the sub storage task set comprises: a plurality of sub-storage tasks;

It will be appreciated that the execution subject of the present invention may be an OLAP-based vectorized data hybrid storage system, or may be a terminal or a server, which is not limited herein. The embodiment of the invention is described by taking a server as the execution subject as an example.

Specifically, the server acquires a data storage task to be processed, that is, it obtains from the distributed system the task that needs to be stored. The data storage task is then input into a preset task splitting model for storage task decomposition. Using the task splitting model, the server decomposes a large task into multiple small sub-tasks, which can be assigned to different storage nodes or servers, and combines the decomposed sub-tasks into a sub-storage task set. This set contains multiple sub-tasks, each of which can be stored and processed independently.

S102, vector data corresponding to each sub-storage task is obtained, and storage task logic relation analysis is carried out on a sub-storage task set to obtain a storage logic association relation;

Specifically, the server first performs a logic analysis on the sub-storage task set by analyzing the dependency relationships and interactions between the sub-tasks in the set. From the result of this analysis, the server obtains the storage logic association relationship between the sub-tasks. The server can divide the sub-task set according to the different logical relationships and distribute the sub-tasks to different storage nodes or servers for processing.

S103, determining a plurality of storage nodes corresponding to the sub storage task sets according to the storage logic association relation, and acquiring a storage load corresponding to each storage node;

First, the sub-storage task set is divided into a plurality of sub-task groups according to the storage logic association relationship. Each sub-task group comprises a group of related sub-tasks that can be distributed to the same storage node for processing. Then, the storage node corresponding to each sub-task group is determined according to the availability of storage resources and the principle of storage load balancing. The server makes this determination based on factors such as the storage capacity of the different storage nodes, the availability of storage resources, and network bandwidth, and distributes the sub-task groups across the storage nodes as evenly as possible to achieve better storage performance. Further, the server obtains the storage load corresponding to each storage node, which can be done by monitoring indicators such as the CPU utilization, memory utilization, and I/O utilization of each node. Finally, the server performs sub-task distribution and load balancing according to the mapping relationship between the sub-task groups and the storage nodes and the storage load of each storage node, while obtaining the storage load corresponding to each storage node.
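
As an illustration only, the load measurement and group placement described for S103 might be sketched as follows. The weighting of CPU, memory, and I/O utilization, the per-group cost, and the names `NodeMetrics`, `storage_load`, and `assign_groups` are assumptions made for this sketch; the description does not specify how the indicators are combined.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class NodeMetrics:
    cpu: float     # CPU utilization in the range 0.0-1.0
    memory: float  # memory utilization in the range 0.0-1.0
    io: float      # I/O utilization in the range 0.0-1.0


def storage_load(m: NodeMetrics,
                 weights: Tuple[float, float, float] = (0.4, 0.3, 0.3)) -> float:
    """Combine the monitored indicators into one load score (assumed weighting)."""
    w_cpu, w_mem, w_io = weights
    return w_cpu * m.cpu + w_mem * m.memory + w_io * m.io


def assign_groups(groups: List[str],
                  metrics: Dict[str, NodeMetrics],
                  cost_per_group: float = 0.1) -> Dict[str, str]:
    """Place each sub-task group on the currently least-loaded node.

    cost_per_group is a hypothetical estimate of the load one group adds.
    """
    loads = {node: storage_load(m) for node, m in metrics.items()}
    assignment = {}
    for group in groups:
        # Break ties by node name so the result is deterministic.
        node = min(loads, key=lambda n: (loads[n], n))
        assignment[group] = node
        loads[node] += cost_per_group
    return assignment
```

With two nodes whose loads are 0.5 and 0.1 and a per-group cost of 0.3, the first two groups go to the lighter node and the third goes to the other one, which is the even spreading the paragraph above asks for.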

S104, distributing a plurality of sub-storage tasks to a plurality of storage nodes according to the storage load corresponding to each storage node, executing the corresponding sub-storage tasks in each storage node, and distributing storage space of the sub-storage tasks to obtain a storage result corresponding to each storage node;

Specifically, the server traverses the storage logic association relationship and determines the storage logic relationship corresponding to each sub-storage task in the sub-storage task set. Then, based on the storage load of each storage node, the server distributes the sub-storage tasks to the corresponding storage nodes according to the storage logic relationship of each sub-storage task. The server then executes the corresponding sub-storage tasks through the storage program in each storage node. Finally, the server allocates storage space for the sub-storage tasks to obtain the storage result corresponding to each storage node.

S105, carrying out result fusion on the storage results corresponding to each storage node according to the storage logic association relation to obtain target storage results corresponding to the data storage tasks.

Specifically, the server monitors each storage node and fuses the storage results generated by the different storage nodes according to the storage logic association relationship, performing operations such as merging, sorting, and deduplication where required. The server then processes the fused storage results to obtain the final target storage result. How this processing is done depends on the specific storage task. For example, for a data aggregation task, statistical analysis may be performed on the fused data.
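
The merging, deduplication, and sorting mentioned for S105 can be illustrated with a small sketch. It assumes that each node returns `(key, value)` records and that the storage logic association relationship has already been reduced to an ordering of nodes (`node_order`); both the record shape and the function name `fuse_results` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, int]  # (key, value); an assumed record shape


def fuse_results(node_results: Dict[str, Iterable[Record]],
                 node_order: List[str]) -> List[Record]:
    """Merge per-node results, drop duplicate keys, and sort by key.

    When the same key is reported by several nodes, the node that comes
    first in node_order wins.
    """
    merged: Dict[str, int] = {}
    for node in node_order:
        for key, value in node_results.get(node, ()):
            merged.setdefault(key, value)  # keep the first occurrence only
    return sorted(merged.items())
```

For a data aggregation task, a statistic such as `sum(value for _, value in fused)` could then be computed on the fused list.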

In the embodiment of the invention, a data storage task is input into a task splitting model to carry out storage task splitting processing to obtain a sub storage task set; vector data corresponding to each sub-storage task is obtained, and storage task logic relationship analysis is carried out on the sub-storage task set to obtain a storage logic association relationship; determining a plurality of storage nodes corresponding to the sub storage task sets according to the storage logic association relation, and acquiring a storage load corresponding to each storage node; performing storage space allocation on the sub-storage tasks according to the storage loads corresponding to each storage node to obtain a storage result corresponding to each storage node; according to the storage logic association relation, result fusion is carried out on the storage results corresponding to each storage node to obtain target storage results corresponding to the data storage task.

In a specific embodiment, the process of executing step S101 may specifically include the following steps:

(1) Receiving a data storage request, and performing task query on the data storage request to obtain a data storage task;

(2) Performing storage capacity and storage dimension analysis on the data storage task to obtain target storage capacity and storage dimension information;

(3) According to the target storage amount and the storage dimension information, a preset task splitting model is called to perform storage task splitting processing on the data storage task to obtain a sub storage task set, wherein the sub storage task set comprises: a plurality of sub-storage tasks.

Specifically, the server receives a data storage request and performs a task query on the request to obtain a data storage task. The server then analyzes the data storage task to obtain its target storage amount and storage dimension information, in preparation for the subsequent task decomposition. Finally, according to the target storage amount and the storage dimension information, the server invokes the preset task splitting model to decompose the data storage task into a sub-storage task set comprising a plurality of sub-storage tasks, so that they can be distributed to different storage nodes for parallel storage.
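
The decomposition in steps (1) to (3) might look like the following minimal sketch. The description does not disclose the internals of the preset task splitting model, so a simple rule-based splitter stands in for it here; `StorageTask`, `split_task`, `max_chunk_bytes`, and the even spread of data over dimensions are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StorageTask:
    task_id: str
    size_bytes: int        # target storage amount
    dimensions: List[str]  # storage dimension information


def split_task(task: StorageTask, max_chunk_bytes: int) -> List[StorageTask]:
    """Split a task per dimension, then cap every piece at max_chunk_bytes.

    Assumption: the data volume is spread evenly over the dimensions.
    """
    if max_chunk_bytes <= 0:
        raise ValueError("max_chunk_bytes must be positive")
    dims = task.dimensions or ["all"]
    base, extra = divmod(task.size_bytes, len(dims))
    sub_tasks = []
    for i, dim in enumerate(dims):
        remaining = base + (1 if i < extra else 0)
        part = 0
        while remaining > 0:
            chunk = min(remaining, max_chunk_bytes)
            sub_tasks.append(
                StorageTask(f"{task.task_id}/{dim}/{part}", chunk, [dim]))
            remaining -= chunk
            part += 1
    return sub_tasks
```

Each returned sub-task carries its own size and dimension, so it can be stored independently of the others.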

In a specific embodiment, as shown in fig. 2, the process of executing step S102 may specifically include the following steps:

S201, vector data corresponding to each sub-storage task is obtained, and storage task division is carried out on a plurality of sub-storage tasks in a sub-storage task set according to the vector data, so that at least one sub-storage task cluster is obtained;

S202, carrying out storage task logic relationship analysis on each two sub storage tasks in at least one sub storage task cluster, and determining an initial logic association relationship between each two sub storage tasks;

S203, constructing a storage logic association relationship corresponding to the sub storage task set according to the initial logic association relationship between every two sub storage tasks.

Specifically, the server proceeds as follows.

Acquiring vector data corresponding to the sub-storage tasks: the feature vectors of the sub-storage tasks are extracted from a labeled data set, or the features of the sub-storage tasks are extracted by a deep learning model.

Sub-storage task division: based on the vector data obtained in the previous step, a clustering algorithm can be used to divide the sub-storage tasks into a plurality of sub-storage task clusters. The clustering algorithm should be chosen according to the specific requirements and data characteristics; for example, a k-means algorithm or a hierarchical clustering algorithm may be used.

Storage task logic relationship analysis: for every two sub-storage tasks in each sub-storage task cluster, a storage task logic relationship analysis is performed to determine the initial logic association relationship between them. This may be done by calculating the similarity or distance between them, for example using cosine similarity, Euclidean distance, or Manhattan distance.

Constructing the storage logic association relationship: according to the initial logic association relationship between every two sub-storage tasks, the storage logic association relationship corresponding to the sub-storage task set can be constructed. This may be done by defining the relationship type (e.g., parallel or serial) and weight values between storage tasks. Different relationship definitions can be formulated according to specific requirements and service scenarios. For example, in parallel operation, multiple sub-storage tasks may be executed simultaneously; in serial operation, only one sub-storage task can be executed at a time, and each task must wait for the previous task to complete before it can start.

It should be noted that the division rule may be designed according to factors such as task attributes, storage resources, and the number of tasks. When analyzing the storage task logic relationship of every two sub-storage tasks in the at least one sub-storage task cluster, the server analyzes the divided sub-storage task clusters and determines the logical relationship between every two sub-storage tasks; for example, two sub-storage tasks may need to be executed in sequence, or may need to cooperate with each other to complete a task. When constructing the storage logic association relationship corresponding to the sub-storage task set, the server converts the logical relationships determined within the sub-storage task clusters into the storage logic association relationship; for example, when two sub-storage tasks need to be executed in sequence, a sequential relationship between them is established.
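
The clustering and pairwise analysis of S201 and S202 can be sketched as below, using k-means and cosine similarity, two of the options the description names. This is a toy implementation under stated assumptions: the first `k` vectors seed the cluster centers, the similarity is used as the relationship weight, and the serial or parallel type is read from a caller-supplied set of dependency pairs (`depends_on`), since the description does not say how the type is derived. All function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def kmeans(vectors: Dict[str, Vector], k: int,
           iterations: int = 10) -> List[List[str]]:
    """Tiny k-means; the first k vectors (insertion order) seed the centers."""
    ids = list(vectors)
    centers = [list(vectors[i]) for i in ids[:k]]
    clusters: List[List[str]] = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for task_id in ids:
            nearest = min(range(len(centers)),
                          key=lambda c: math.dist(vectors[task_id], centers[c]))
            clusters[nearest].append(task_id)
        for c, members in enumerate(clusters):
            if members:  # keep the old center if a cluster went empty
                centers[c] = [sum(col) / len(members)
                              for col in zip(*(vectors[m] for m in members))]
    return [c for c in clusters if c]


def initial_relations(
        clusters: List[List[str]],
        vectors: Dict[str, Vector],
        depends_on: Set[Tuple[str, str]],
) -> Dict[Tuple[str, str], Tuple[str, float]]:
    """For every pair inside a cluster, return (relation type, weight)."""
    relations = {}
    for cluster in clusters:
        for a, b in combinations(cluster, 2):
            serial = (a, b) in depends_on or (b, a) in depends_on
            relations[(a, b)] = ("serial" if serial else "parallel",
                                 cosine_similarity(vectors[a], vectors[b]))
    return relations
```

Only pairs within the same cluster are compared, which keeps the number of pairwise analyses well below the all-pairs count when the task set is large.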

In a specific embodiment, as shown in fig. 3, the process of executing step S201 may specifically include the following steps:

S301, classifying a plurality of sub-storage tasks in a sub-storage task set to obtain at least one task classification result;

S302, extracting corresponding sub-storage tasks from the sub-storage task set according to at least one task classification result, and generating at least one sub-storage task cluster.

Specifically, the server classifies the plurality of sub-storage tasks in the sub-storage task set to obtain at least one task classification result. The server classifies the sub-tasks according to a certain attribute or characteristic, for example task type, data source, or storage complexity, which facilitates subsequent task allocation and scheduling. Further, according to the at least one task classification result, the server extracts the corresponding sub-storage tasks from the sub-storage task set and forms them into one or more sub-storage task clusters to facilitate parallel storage.
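
A minimal sketch of S301 and S302, assuming each sub-storage task is a dictionary with an `id` and a classification attribute such as `task_type` (both field names are hypothetical):

```python
from collections import defaultdict
from typing import Dict, List


def build_clusters(sub_tasks: List[dict],
                   attribute: str = "task_type") -> Dict[str, List[str]]:
    """Group sub-task ids by one classification attribute."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for task in sub_tasks:
        clusters[task[attribute]].append(task["id"])
    return dict(clusters)
```

Passing `attribute="source"` would cluster by data source instead of task type, which mirrors the alternative classification criteria listed above.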

In a specific embodiment, as shown in fig. 4, the process of executing step S202 may specifically include the following steps:

S401, performing association identifier matching on the logical association relationship between every two sub-storage tasks to obtain corresponding association identifiers between every two sub-storage tasks;

S402, carrying out association relation fusion on the logic association relation between every two sub-storage tasks based on the corresponding association identifications between every two sub-storage tasks to obtain the corresponding storage logic association relation.

Specifically, the server performs association identifier matching on the logical association relationship between every two sub-storage tasks to obtain the corresponding association identifier between them. That is, the server matches the logical relationship between each pair of sub-tasks against the association identifiers to determine which identifier applies to the pair. It should be noted that the definition, allocation, and use of the association identifiers need to be considered to ensure that the matching result is accurate and reliable. Further, based on the association identifier between every two sub-storage tasks, the server fuses the logical association relationships between the pairs to obtain the corresponding storage logic association relationship. Here the server converts the logical relationships between the sub-tasks into the storage logic association relationship to facilitate efficient parallel storage. In this embodiment, factors such as the dependency relationships between tasks and task priority need to be considered to ensure accurate and efficient storage. The association relationship also needs to be optimized and adjusted so that storage is more balanced and stable.
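
One way to picture S401 and S402 is to map each pairwise relationship to an identifier and then fuse the identified pairs into a dependency graph. The sketch below does this with the standard-library `graphlib.TopologicalSorter` (Python 3.9+). The identifier table `ASSOCIATION_IDS`, the convention that a serial pair `(x, y)` means "x before y", and the function names are assumptions for illustration.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Hypothetical identifier table: relationship type -> association identifier.
ASSOCIATION_IDS = {"serial": "SEQ", "parallel": "PAR"}

Pair = Tuple[str, str]


def match_identifiers(pairwise: Dict[Pair, str]) -> Dict[Pair, str]:
    """S401: attach an association identifier to every pair of sub-tasks."""
    return {pair: ASSOCIATION_IDS[kind] for pair, kind in pairwise.items()}


def fuse_relations(identifiers: Dict[Pair, str],
                   tasks: List[str]) -> Dict[str, Set[str]]:
    """S402: fuse the pairs into one graph mapping task -> predecessors."""
    graph: Dict[str, Set[str]] = {t: set() for t in tasks}
    for (before, after), ident in identifiers.items():
        if ident == "SEQ":  # only serial pairs impose an ordering
            graph[after].add(before)
    return graph


def execution_batches(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Tasks in the same batch have no ordering constraint between them."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

The batches returned by `execution_batches` make the fused relationship directly usable for scheduling: each batch can be stored in parallel, and batches run one after another.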

In a specific embodiment, the process of executing step S103 may specifically include the following steps:

(1) Information labeling is carried out on the storage logic association relation to obtain a plurality of labeling information, and a mapping relation between the labeling information and the storage nodes is constructed according to the plurality of labeling information;

(2) According to the mapping relation, performing storage node mapping matching on a plurality of sub storage tasks in the sub storage task set respectively to obtain a plurality of storage nodes;

(3) And carrying out node load balancing storage on the plurality of storage nodes according to the target storage capacity to obtain the storage load corresponding to each storage node.

Specifically, the server labels the storage logic association relationship to obtain a plurality of pieces of labeling information, and constructs a mapping relationship between the labeling information and the storage nodes. The labeling information includes items such as task type, data source, storage complexity, and priority, and mapping it to storage nodes facilitates subsequent task allocation and scheduling. Further, according to the mapping relationship, the server performs storage node mapping matching on the sub-storage tasks in the sub-storage task set to obtain a plurality of storage nodes; that is, the server distributes each sub-task to its corresponding storage node according to the mapping between labeling information and storage nodes. Factors such as the attributes, resources, and load of the storage nodes need to be considered to ensure storage efficiency and reliability. Finally, the server performs node load balancing storage on the plurality of storage nodes according to the target storage amount to obtain the storage load corresponding to each storage node. Here the server balances the storage load across the nodes according to the target storage amount and the resource situation of each node, so that no single node is overloaded; factors such as the storage performance and network bandwidth of the nodes need to be considered to achieve high storage performance.
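
The label-to-node mapping and the load balancing step could be sketched as follows. A greedy "largest task first, least-loaded candidate node" rule is used here as a stand-in, since the description does not name a balancing algorithm; the labels, `map_nodes`, and `balance` are hypothetical.

```python
from typing import Dict, List, Tuple


def map_nodes(task_labels: Dict[str, str],
              label_to_nodes: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Look up the candidate storage nodes for each task via its label."""
    return {task: label_to_nodes[label] for task, label in task_labels.items()}


def balance(task_sizes: Dict[str, int],
            candidates: Dict[str, List[str]]
            ) -> Tuple[Dict[str, str], Dict[str, int]]:
    """Greedy balancing: biggest task first, onto its least-loaded candidate.

    Returns the task -> node placement and the resulting load per node.
    """
    loads = {node: 0 for nodes in candidates.values() for node in nodes}
    placement = {}
    for task in sorted(task_sizes, key=lambda t: (-task_sizes[t], t)):
        node = min(candidates[task], key=lambda n: (loads[n], n))
        placement[task] = node
        loads[node] += task_sizes[task]
    return placement, loads
```

The second return value plays the role of the storage load corresponding to each storage node, expressed here simply as the number of bytes planned for it.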

In a specific embodiment, the process of executing step S104 may specifically include the following steps:

(1) Distributing a plurality of sub-storage tasks to corresponding storage nodes according to the storage load and the mapping relation corresponding to each storage node;

(2) Executing corresponding sub-storage tasks through storage programs in each storage node;

(3) Performing storage space allocation for the sub-storage tasks through the storage programs in each storage node to obtain a storage result corresponding to each storage node.

Specifically, the server distributes the plurality of sub-storage tasks to the corresponding storage nodes according to the storage load corresponding to each storage node and the mapping relationship, that is, according to the preceding load-balancing result and the mapping relationship. It should be noted that, in this embodiment, factors such as the load condition of the storage nodes and the priority of the tasks need to be considered to achieve efficient parallel storage. Further, the server executes the corresponding sub-storage tasks through the storage programs in each storage node; that is, the server starts the corresponding storage program on each storage node to execute the sub-storage tasks assigned to it. Finally, the server performs storage space allocation for the sub-storage tasks through the storage programs in each storage node: on each storage node, storage space is allocated for the sub-storage tasks assigned to that node, and the corresponding storage result is generated.
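As an illustration of the parallel execution and space allocation described above, the following sketch runs one hypothetical "storage program" per node in a thread pool and allocates contiguous space for each assigned sub-storage task. The function names, the `(offset, size)` result layout, and the contiguous allocation scheme are assumptions for illustration only and are not taken from the embodiment.

```python
from concurrent.futures import ThreadPoolExecutor


def run_storage_program(node_name, tasks, capacity):
    """Allocate contiguous space on one node for each (task_id, size) pair."""
    offset = 0
    allocations = {}
    for task_id, size in tasks:
        if offset + size > capacity:
            raise RuntimeError(f"node {node_name} has no room for {task_id}")
        allocations[task_id] = (offset, size)  # (start offset, length)
        offset += size
    return {"node": node_name, "allocations": allocations, "used": offset}


def execute_on_nodes(assignments, capacities):
    """Run the storage program of every node in parallel and collect the results.

    `assignments` maps node name -> list of (task_id, size);
    `capacities` maps node name -> capacity of that node.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(assignments))) as pool:
        futures = {
            name: pool.submit(run_storage_program, name, tasks, capacities[name])
            for name, tasks in assignments.items()
        }
        # result() re-raises any exception raised inside a storage program.
        return {name: future.result() for name, future in futures.items()}
```

In a real deployment the storage program would run on the node itself; threads are used here only to keep the sketch self-contained.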

The OLAP-based vectorized data hybrid storage method in the embodiment of the present invention has been described above; the OLAP-based vectorized data hybrid storage system in the embodiment of the present invention is described below. Referring to fig. 5, an embodiment of the OLAP-based vectorized data hybrid storage system in the embodiment of the present invention includes:

the obtaining module 501 is configured to obtain a data storage task to be processed, and input the data storage task into a preset task splitting model to perform storage task decomposition processing, so as to obtain a sub storage task set, where the sub storage task set includes: a plurality of sub-storage tasks;

the analysis module 502 is configured to obtain vector data corresponding to each sub-storage task, and perform storage task logic relationship analysis on the sub-storage task set to obtain a storage logic association relationship;

a processing module 503, configured to determine a plurality of storage nodes corresponding to the sub-storage task set according to the storage logical association relationship, and obtain a storage load corresponding to each storage node;

the allocation module 504 is configured to allocate the plurality of sub-storage tasks to the plurality of storage nodes according to a storage load corresponding to each storage node, execute the corresponding sub-storage task in each storage node, and allocate storage space for the sub-storage task, so as to obtain a storage result corresponding to each storage node;

the fusion module 505 is configured to perform result fusion on the storage result corresponding to each storage node according to the storage logical association relationship, so as to obtain a target storage result corresponding to the data storage task.

Through the cooperation of the above components, the data storage task is input into a task splitting model for storage task splitting processing to obtain a sub-storage task set; vector data corresponding to each sub-storage task is obtained, and storage task logic relationship analysis is performed on the sub-storage task set to obtain a storage logic association relationship; a plurality of storage nodes corresponding to the sub-storage task set are determined according to the storage logic association relationship, and a storage load corresponding to each storage node is obtained; storage space allocation is performed for the sub-storage tasks according to the storage load corresponding to each storage node to obtain a storage result corresponding to each storage node; and, according to the storage logic association relationship, result fusion is performed on the storage results corresponding to each storage node to obtain a target storage result corresponding to the data storage task.
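The result fusion step can be sketched as follows, under the assumption that the storage logic association relationship can be expressed as a dependency graph between sub-storage tasks; the text does not fix its representation. The function `fuse_results`, the `depends_on` dictionary, and the per-node result layout are hypothetical. The sketch merges the per-node results into one list ordered so that each sub-storage task appears after the tasks it depends on.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def fuse_results(node_results, depends_on):
    """Merge per-node storage results into one dependency-ordered list.

    `node_results` maps node name -> {"allocations": {task_id: (offset, size)}};
    `depends_on` maps task_id -> iterable of task_ids it logically depends on.
    """
    located = {}
    for node_name, result in node_results.items():
        for task_id, (offset, size) in result["allocations"].items():
            located[task_id] = {
                "task": task_id, "node": node_name,
                "offset": offset, "size": size,
            }
    # Every stored task becomes a graph node, even if it has no dependencies.
    graph = {task_id: depends_on.get(task_id, ()) for task_id in located}
    order = TopologicalSorter(graph).static_order()
    # Skip dependencies that were never stored on any node.
    return [located[task_id] for task_id in order if task_id in located]
```

A cyclic association would make `static_order()` raise `graphlib.CycleError`, so a real fusion module would need some policy for such cases.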

Referring to fig. 6, another embodiment of the OLAP-based vectorized data hybrid storage system according to the present invention includes:

Optionally, the obtaining module 501 is specifically configured to:

Optionally, the analysis module 502 further includes:

the dividing unit 5021 is configured to obtain vector data corresponding to each sub-storage task, and divide storage tasks of a plurality of sub-storage tasks in the sub-storage task set according to the vector data to obtain at least one sub-storage task cluster;

the analysis unit 5022 is configured to perform storage task logic relationship analysis on each two sub storage tasks in the at least one sub storage task cluster, and determine an initial logic association relationship between each two sub storage tasks;

the building unit 5023 is configured to build a storage logical association relationship corresponding to the sub storage task set according to the initial logical association relationship between every two sub storage tasks.
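The three units above (division, pairwise analysis, construction) can be illustrated with a small sketch. It assumes that sub-storage tasks are grouped by cosine similarity of their vector data and that the "initial logic association" between two tasks is simply their similarity score; both are illustrative assumptions, since the text does not define the clustering method or the form of the association. All function names and the `threshold` parameter are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_tasks(vectors, threshold=0.9):
    """Greedy division: a task joins the first cluster whose seed is similar enough."""
    clusters = []  # each cluster is a list of task ids; element 0 is the seed
    for task_id, vec in vectors.items():
        for cluster in clusters:
            if cosine(vectors[cluster[0]], vec) >= threshold:
                cluster.append(task_id)
                break
        else:
            clusters.append([task_id])
    return clusters


def pairwise_relations(clusters, vectors):
    """Analyze every pair of tasks inside each cluster."""
    return {
        (a, b): cosine(vectors[a], vectors[b])
        for cluster in clusters
        for a, b in combinations(cluster, 2)
    }


def build_association(relations):
    """Combine the pairwise relations into one symmetric association graph."""
    graph = {}
    for (a, b), weight in relations.items():
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph
```

Tasks that end up alone in a cluster have no pairwise relations and therefore do not appear in the resulting graph.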

Optionally, the dividing unit 5021 is specifically configured to:

Optionally, the analysis unit 5022 is specifically configured to:

Optionally, the processing module 503 is specifically configured to:

Optionally, the allocation module 504 is specifically configured to:

Fig. 5 and fig. 6 above describe the OLAP-based vectorized data hybrid storage system in the embodiment of the present invention in detail from the point of view of the modularized functional entity, and the OLAP-based vectorized data hybrid storage device in the embodiment of the present invention is described in detail from the point of view of hardware processing below.

Fig. 7 is a schematic structural diagram of an OLAP-based vectorized data hybrid storage device 600 according to an embodiment of the present invention. The OLAP-based vectorized data hybrid storage device 600 may vary considerably depending on configuration or performance, and may include one or more processors (central processing units, CPU) 610, a memory 620, and one or more storage media 630 (e.g., one or more mass storage devices) storing applications 633 or data 632. The memory 620 and the storage medium 630 may be transitory or persistent storage. The program stored on the storage medium 630 may include one or more modules (not shown), each of which may include a series of instruction operations on the OLAP-based vectorized data hybrid storage device 600. Still further, the processor 610 may be configured to communicate with the storage medium 630 and to execute, on the OLAP-based vectorized data hybrid storage device 600, the series of instruction operations in the storage medium 630.

The OLAP-based vectorized data hybrid storage device 600 may also include one or more power supplies 640, one or more wired or wireless network interfaces 650, one or more input/output interfaces 660, and/or one or more operating systems 631, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. Those skilled in the art will appreciate that the device structure shown in fig. 7 does not constitute a limitation on the OLAP-based vectorized data hybrid storage device, which may include more or fewer components than illustrated, may combine certain components, or may use a different arrangement of components.

The invention also provides an OLAP-based vectorized data hybrid storage device, which comprises a memory and a processor, wherein the memory stores machine-readable instructions, and the machine-readable instructions when executed by the processor cause the processor to execute the steps of the OLAP-based vectorized data hybrid storage method in the above embodiments.

The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium. Instructions are stored in the computer-readable storage medium, and the instructions, when run on a computer, cause the computer to perform the steps of the OLAP-based vectorized data hybrid storage method.

It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.

The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.

The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. The OLAP-based vectorized data hybrid storage method is characterized by comprising the following steps of:

vector data corresponding to each sub-storage task is obtained, and storage task logic relationship analysis is carried out on the sub-storage task set to obtain a storage logic association relationship; the obtaining vector data corresponding to each sub-storage task, and performing storage task logic relationship analysis on the sub-storage task set to obtain a storage logic association relationship includes: vector data corresponding to each sub-storage task is obtained, and storage task division is carried out on a plurality of sub-storage tasks in the sub-storage task set according to the vector data, so that at least one sub-storage task cluster is obtained; performing storage task logic relationship analysis on each two sub storage tasks in the at least one sub storage task cluster, and determining an initial logic association relationship between each two sub storage tasks; constructing a storage logic association relationship corresponding to the sub storage task set according to the initial logic association relationship between every two sub storage tasks;

2. The OLAP-based vectorized data hybrid storage method of claim 1, wherein the obtaining the data storage task to be processed, and inputting the data storage task into a preset task splitting model to perform storage task splitting processing, obtain a sub-storage task set, where the sub-storage task set includes: a plurality of sub-storage tasks, comprising:

3. The OLAP-based vectorized data hybrid storage method of claim 1, wherein the obtaining vector data corresponding to each sub-storage task and performing storage task division on a plurality of sub-storage tasks in the set of sub-storage tasks according to the vector data to obtain at least one sub-storage task cluster includes:

vector data corresponding to each sub-storage task is obtained, and a plurality of sub-storage tasks in the sub-storage task set are classified according to the vector data to obtain at least one task classification result;

4. The OLAP-based vectorized data hybrid storage method of claim 1, wherein the constructing the storage logical association corresponding to the set of sub-storage tasks according to the initial logical association between every two sub-storage tasks comprises:

5. The OLAP-based vectorized data hybrid storage method of claim 2, wherein determining a plurality of storage nodes corresponding to the set of sub-storage tasks according to the storage logical association relationship and obtaining a storage load corresponding to each storage node comprises:

6. The OLAP-based vectorized data hybrid storage method of claim 5, wherein the distributing the plurality of sub-storage tasks to the plurality of storage nodes according to the storage load corresponding to each storage node, executing the corresponding sub-storage task in each storage node and performing storage space allocation on the sub-storage task to obtain the storage result corresponding to each storage node comprises:

7. An OLAP-based vectorized data hybrid storage system, comprising:

the analysis module is used for acquiring vector data corresponding to each sub-storage task, and carrying out storage task logic relationship analysis on the sub-storage task set to obtain a storage logic association relationship; the obtaining vector data corresponding to each sub-storage task, and performing storage task logic relationship analysis on the sub-storage task set to obtain a storage logic association relationship includes: vector data corresponding to each sub-storage task is obtained, and storage task division is carried out on a plurality of sub-storage tasks in the sub-storage task set according to the vector data, so that at least one sub-storage task cluster is obtained; performing storage task logic relationship analysis on each two sub storage tasks in the at least one sub storage task cluster, and determining an initial logic association relationship between each two sub storage tasks; constructing a storage logic association relationship corresponding to the sub storage task set according to the initial logic association relationship between every two sub storage tasks;

8. An OLAP based vectorized data hybrid storage device, comprising: a memory and at least one processor, the memory having instructions stored therein;

the at least one processor invoking the instructions in the memory to cause the OLAP based vectorized data hybrid storage device to perform the OLAP based vectorized data hybrid storage method of any of claims 1-6.

9. A storage machine readable storage medium having instructions stored thereon, which when executed by a processor implement the OLAP based vectorized data hybrid storage method of any one of claims 1 to 6.