CN113836141A

CN113836141A - Big data cross indexing method based on distribution model

Info

Publication number: CN113836141A
Application number: CN202111119891.9A
Authority: CN
Inventors: 张才明
Original assignee: China Institute Of Industrial Relations
Current assignee: China Institute Of Industrial Relations
Priority date: 2021-09-24
Filing date: 2021-09-24
Publication date: 2021-12-24
Anticipated expiration: 2041-09-24
Also published as: CN113836141B

Abstract

The invention discloses a big data cross-indexing method based on a distribution model, comprising the following steps: acquiring data to be analyzed and processed together with the data type of that data; performing cluster analysis on the data according to the data types to obtain a plurality of classification sets; and establishing a distribution model based on a cross-indexing technique according to the classification sets. Each dimension is recursively associated with the other dimensions and the relationships between dimensions are organized, so that cross indexes are quickly established among the dimensions. This improves indexing efficiency and speed, enables fast and efficient query and analysis, and substantially reduces resource consumption.

Description

Big data cross indexing method based on distribution model

Technical Field

The invention relates to the technical field of big data processing, in particular to a big data cross indexing method based on a distribution model.

Background

With the advent of the big data era, big data technology has developed rapidly. The most typical change is the diversification of computing modes, which have evolved from the initial batch processing to stream computing, real-time interactive computing and the like. However, each computing framework suits only a limited range of application scenarios. Batch computing can easily process massive data, but its response time is long. Stream computing, unlike batch processing, computes continuously and can respond quickly to user events. Real-time interactive computing processes big data interactively and also responds quickly. As big data application scenarios become more complex, a single traditional computing framework can no longer meet the requirements of data applications. Research has therefore emerged on hybrid systems that fuse multiple computing systems into a unified big data computing platform providing multiple computing services.

The concept of online analytical processing (OLAP) was first proposed in 1993 by E. F. Codd, the father of the relational database. Codd held that online transaction processing could not meet end users' requirements for database query and analysis, and that simple SQL queries against large databases could not meet users' analytical needs. Users' decision analysis requires extensive computation over the relational database to obtain results, and plain query results cannot satisfy decision makers. Codd therefore proposed the concepts of multidimensional databases and multidimensional analysis, i.e., OLAP. The OLAP Council defines online analytical processing as a class of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to information that has been transformed from raw data, is understandable to the user, and reflects the real dimensionality of the enterprise from various angles.

The main characteristic of online analytical processing is that a multidimensional distribution model is built for the user in advance by directly imitating the user's multi-angle way of thinking, where a dimension is one angle from which the user analyzes the data. For example, in the analysis of sales data, the time period is a dimension, and product category, distribution channel, geographical region and customer group are each dimensions as well. Once the multidimensional distribution model is established, the user can quickly retrieve data from any analysis angle, switch dynamically between angles, or perform a combined multi-angle analysis, which gives the method great analytical flexibility. This is the root reason online analytical processing has attracted wide attention, and it differs fundamentally from earlier data processing systems in both design concept and implementation.

With large data volumes, operations such as group by, sum and count must be performed in real time after tables are joined. Traditional big data analysis and computation rely on database indexes and aggregate the required data fields on demand, while the relationships between fields are neither preprocessed nor cached. Depending on the performance bottlenecks of the particular database, response times can be long. Join operations between tables on the Hadoop framework in particular are very inefficient and consume large amounts of computing resources. As a result, such approaches cannot meet real-time big data computation and analysis requirements.

Disclosure of Invention

The present invention aims to solve, at least to some extent, one of the technical problems described above. To this end, the invention provides a big data cross-indexing method based on a distribution model. The method recursively associates each dimension with the other dimensions, organizes the relationships between the dimensions, and quickly establishes cross indexes among them. This improves efficiency and speed, enables fast and efficient query and analysis, and substantially reduces resource consumption.

In order to achieve the above object, an embodiment of the present invention provides a big data cross indexing method based on a distribution model, including:

acquiring data to be analyzed and processed and a data type corresponding to the data to be analyzed and processed;

performing cluster analysis on the data to be analyzed and processed according to the data types to obtain a plurality of classification sets;

and establishing a distribution model based on a cross index technology according to the classification sets.

According to some embodiments of the invention, establishing the distribution model based on the cross-indexing technique according to the plurality of classification sets comprises:

dividing the data to be analyzed and processed in each classification set into a dimension field for analysis, an information field for describing the dimension and a summary field for statistical analysis;

and establishing a distribution model according to the dimension field, the information field and the summary field.

According to some embodiments of the invention, the method for acquiring the dimension field comprises:

acquiring an intra-class association relationship between different data to be analyzed and processed within each classification set;

acquiring an inter-class association relationship between data to be analyzed and processed in different classification sets;

and determining the dimension field based on the cross-indexing technique according to the intra-class association relationship and the inter-class association relationship.

According to some embodiments of the invention, establishing the distribution model according to the dimension field, the information field and the summary field comprises:

applying calculation functions to modify the summary field before modeling;

establishing a description script based on the information field, and running the operations designed in the description script to perform modeling;

and, during modeling, applying the cross-indexing technique to the analysis dimensions contained in the dimension field to accelerate data access, and finally establishing the distribution model.

According to some embodiments of the invention, the distribution model contains the specific data and is stored as a model file, and the model file further contains data analysis statistics.

According to some embodiments of the invention, the method further comprises managing the established distribution model, which comprises:

obtaining the number of dimensions of the distribution model; when the number of dimensions is greater than a preset number of dimensions, treating the distribution model as a first-class model, classifying the dimensions in the first-class model, and converting the information fields in the first-class model into dynamic dimensions;

or

performing scheduled extraction and modeling of incremental data, taking the distribution model built from the incremental data as a second-class model, acquiring a previously created (historical) distribution model, and merging the second-class model vertically with the historical distribution model;

or

obtaining the number of existing distribution models; when the number is greater than a preset number, obtaining the dimension fields of the existing distribution models, establishing association relationships between the dimension fields of the existing distribution models, and matching the existing distribution models against one another with an association query function according to the association relationships, thereby merging the existing distribution models horizontally.

According to some embodiments of the present invention, after the established distribution model is managed, the management operations are recorded in a management file of the index path. The management file stores only the logical relationships of the management operations and does not store specific data.
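A management file that stores only the logical relationships of management operations can be pictured as an append-only log of operations. The following Python sketch is illustrative only. The entry layout (`op`, `inputs`, `output`, `params`) and the functions `record_operation`, `lineage` and `save_management_file` are hypothetical, and the log is assumed to contain no cycles. The index path of a model is recovered by walking the log back from the model's identifier, and no data rows are ever written.

```python
import json
from datetime import datetime, timezone


def record_operation(log: list[dict], op: str, inputs: list[str], output: str, **params) -> dict:
    """Append one management operation; only model identifiers and parameters are kept, never rows."""
    entry = {
        "op": op,
        "inputs": list(inputs),
        "output": output,
        "params": params,
        "at": datetime.now(timezone.utc).isoformat(),
    }
    log.append(entry)
    return entry


def lineage(log: list[dict], model_id: str) -> list[dict]:
    """Return, in execution order, the operations that produced model_id (its index path).

    Assumes the log is acyclic: no model is derived, directly or indirectly, from itself.
    """
    producer = next((e for e in reversed(log) if e["output"] == model_id), None)
    if producer is None:
        return []  # a model built directly from source data has no recorded parents
    steps: list[dict] = []
    for parent in producer["inputs"]:
        steps.extend(lineage(log, parent))
    return steps + [producer]


def save_management_file(log: list[dict], path: str) -> None:
    """Persist the logical relationships as JSON; the file holds no specific data."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(log, fh, ensure_ascii=False, indent=2)
```

Replaying `lineage(log, some_model_id)` reconstructs how a merged model was derived. The model itself therefore never has to be copied into the management file.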

According to some embodiments of the invention, the distribution model employs a star architecture and a binary-based data management schema.

According to some embodiments of the invention, the method further comprises:

after the distribution model is established, determining the distribution model as a target distribution model, and acquiring first model information of the target distribution model;

acquiring a plurality of historical distribution models and second model information of the plurality of historical distribution models;

determining the association degree between the target distribution model and each of the plurality of historical distribution models according to the first model information and the second model information, and sorting the historical distribution models in descending order of association degree to form a queue of historical distribution models;

performing service processing according to the target distribution model; during the service processing, determining from the model identifier of the target distribution model how often it has participated in service processing within a preset time period, comparing this participation frequency with a preset participation frequency, and determining from the comparison result whether the service processing performed by the target distribution model is legitimate;

when the service processing performed by the target distribution model is determined to be legitimate, determining the target data, and determining the authority level of the target distribution model when it requests data according to the target data;

determining a target association degree according to the authority level; locating the corresponding historical distribution model in the queue according to the target association degree; taking that historical distribution model and all historical distribution models after it in the queue as data supplier distribution models, with that historical distribution model as the first data supplier distribution model;

generating data request information, and sending the data request information to the first data supplier distribution model;

obtaining the feedback information returned by the first data supplier distribution model, updating the data request information according to the feedback information, and sending the updated data request information to the second data supplier distribution model;

repeating the above steps until the target data are generated from the plurality of pieces of feedback information obtained;

and performing service processing according to the target data and the target distribution model.

According to some embodiments of the invention, the method further comprises:

establishing a data matrix of the distribution model, and determining an evaluation index of the distribution model according to the data matrix;

establishing association relationships between the evaluation indexes to generate an evaluation index system;

setting the weights and calculation parameters of the evaluation indexes in the evaluation index system;

comprehensively evaluating the distribution model according to the evaluation index system to calculate an evaluation value, and judging whether the evaluation value is smaller than a preset evaluation value; and when the evaluation value is smaller than the preset evaluation value, reconstructing the distribution model.

The invention has the following advantages:

1. The array-based multidimensional cross-indexing technology can convert large amounts of data from different sources into a highly optimized model, which makes it well suited to reports, data analysis and data visualization. Because all possible query paths of the data are indexed, the distribution model continues to respond quickly and consistently to analytic queries, reports, dashboards and visualizations. This holds regardless of how many dimensions the model has or how complex the operations are.

2. The flexible distribution model lets users "drill" from one dimension to any other dimension without a full understanding of the underlying data structure. The multidimensional distribution model uses an improved star architecture and a binary-based data management schema. This data management structure removes the data hierarchy and other data navigation limitations found in the industry and in conventional data processing software. Maintaining a dedicated data warehouse or a traditional relational or multidimensional online analytical processing (OLAP) system typically incurs specialized costs, and the star-like structure can reduce the need for such costs. With the distribution model and the consolidated information component, the overhead of creating a data warehouse can be minimized or even eliminated.

3. Through the distribution model and its application with in-memory technology, the technique continuously delivers a fast-responding user experience that is not affected by the size of the underlying data volume. The implementation is characterized by low overall acquisition cost, efficient value generation and rapid deployment.

4. Thanks to its flexible integrated platform functionality, the technique can be implemented particularly quickly compared with conventional data processing techniques. It occupies fewer resources for three reasons. First, it avoids the additional database costs incurred by traditional OLAP tools. Second, it reuses existing data storage and operating systems. Third, it balances the volume of data held in memory to maximize performance. The technique requires neither the purchase or licensing of any primary database nor the expensive construction of a data warehouse before any value is generated. Conventional OLAP-based BI techniques typically require a primary database or data warehouse in order to build and distribute a data cube or specialized data warehouse, all before any data analysis or report generation can take place. This hidden cost is a significant barrier for businesses that want reporting and data analysis functions directly while avoiding the overhead of traditional BI techniques or professional analytics applications.

5. Adding new data sources quickly does not increase the cost of the solution itself. New users are easy to add, and the number of users can grow simply and quickly as deployment expands, so the benefits of BI reach every part of the enterprise. Within a relatively short time, the technique can deliver benefits such as increased revenue, significant cost savings, less working time spent on reports and smoother customer contact. It also supports more successful data-driven decisions on investment targets and integrated sales programs.

6. The technique achieves an optimal balance of data in memory, which maximizes system performance and keeps responses to the user consistently fast. Rapid deployment and maximized performance let users truly experience the value of big data and add value to the enterprise faster.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

FIG. 1 is a flow chart of a big data cross-indexing method based on a distribution model according to a first embodiment of the present invention;

FIG. 2 is a flow chart of a big data cross-indexing method based on a distribution model according to a second embodiment of the present invention;

FIG. 3 is a flowchart of a big data cross-indexing method based on a distribution model according to a third embodiment of the present invention;

FIG. 4 is a schematic diagram of the logical structure in a model file according to a first embodiment of the invention;

FIG. 5 is a schematic diagram of the logical structure in a model file according to a second embodiment of the present invention;

FIG. 6 is a schematic diagram of the logical structure in a model file according to a third embodiment of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.

As shown in FIG. 1, an embodiment of the present invention provides a big data cross-indexing method based on a distribution model, including steps S1-S3:

S1, acquiring data to be analyzed and processed and the data type corresponding to the data to be analyzed and processed;

S2, performing cluster analysis on the data to be analyzed and processed according to the data types to obtain a plurality of classification sets;

and S3, establishing a distribution model based on a cross-indexing technique according to the classification sets.

The working principle of the technical scheme is as follows. The data to be analyzed and processed and the corresponding data types are acquired. Cluster analysis is performed on the data according to the data types to obtain a plurality of classification sets. A distribution model is then established based on the cross-indexing technique according to the classification sets. Once the distribution model is built, the data are freely cross-indexed, which allows association and drill-through between any dimensions so that the most needed information can be obtained; this is multidimensional cross analysis. Data from disparate data sources can be freely cross-analyzed, and the user obtains efficiently processed information through modeling. The distribution model is also referred to as the data model.
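Steps S1-S3 can be illustrated with a minimal Python sketch under several assumptions. Each record is assumed to arrive as a dict tagged with a `data_type` key. "Cluster analysis according to the data types" is simplified to grouping records by that tag, because the patent does not name a clustering algorithm. S3 is reduced to indexing every field value to the row numbers that hold it. The functions `acquire`, `cluster_by_type` and `build_models` are hypothetical names.

```python
from collections import defaultdict
from typing import Any

Record = dict[str, Any]


def acquire(raw: list[Record]) -> list[tuple[Record, str]]:
    """S1: pair each record with its data type (assumed to be stored under 'data_type')."""
    return [(rec, rec.get("data_type", "unknown")) for rec in raw]


def cluster_by_type(typed: list[tuple[Record, str]]) -> dict[str, list[Record]]:
    """S2 (simplified): group records into classification sets keyed by data type."""
    sets: dict[str, list[Record]] = defaultdict(list)
    for rec, dtype in typed:
        sets[dtype].append(rec)
    return dict(sets)


def build_models(sets: dict[str, list[Record]]) -> dict[str, dict[str, dict[Any, list[int]]]]:
    """S3 (simplified): per classification set, map every field value to the row numbers holding it."""
    models = {}
    for dtype, rows in sets.items():
        index: dict[str, dict[Any, list[int]]] = defaultdict(lambda: defaultdict(list))
        for i, row in enumerate(rows):
            for field_name, value in row.items():
                if field_name != "data_type":  # the type tag itself is not indexed
                    index[field_name][value].append(i)
        models[dtype] = {f: dict(v) for f, v in index.items()}
    return models
```

Usage: `build_models(cluster_by_type(acquire(raw)))` returns one value-to-rows index per classification set. This per-set index is the raw material the later cross-indexing steps refine.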

The beneficial effects of the above technical scheme are as follows: each dimension is recursively associated with the other dimensions, the relationships between dimensions are organized, and cross indexes are quickly established among the dimensions. This improves efficiency and speed, enables fast and efficient query and analysis, and substantially reduces resource consumption.

As shown in FIG. 2, according to some embodiments of the present invention, establishing the distribution model based on the cross-indexing technique according to the plurality of classification sets includes:

S31, dividing the data to be analyzed and processed in each classification set into a dimension field for analysis, an information field for describing the dimension and a summary field for statistical analysis;

and S32, establishing a distribution model according to the dimension field, the information field and the summary field.

The working principle of the technical scheme is as follows. When a model is built for the data, all information is divided into three components: dimension fields for analysis, information fields that describe the dimensions, and summary fields for statistical analysis. The distribution model is established from these three kinds of field. Fields defined as dimensions during modeling are cross-indexed, so the user can quickly drill from any dimension to any other dimension to obtain the most needed information. Fields used for mathematical statistics are defined as summaries. Information fields contain additional information related to the dimensions.
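The three-way split of S31 can be sketched as a small Python helper. It assumes, for illustration only, that the caller names the dimension fields and the information fields describing each dimension. Every remaining column whose values are all numeric is treated as a summary field, and other columns (for example free text) are left out of the model. `FieldRoles` and `split_fields` are hypothetical names.

```python
from dataclasses import dataclass
from numbers import Number


@dataclass
class FieldRoles:
    dimensions: list[str]              # analysis angles; these get cross-indexed
    information: dict[str, list[str]]  # dimension -> fields that describe it
    summaries: list[str]               # numeric fields used for statistics


def split_fields(rows: list[dict], dimensions: list[str], info_of: dict[str, list[str]]) -> FieldRoles:
    """Assign columns to the three roles of S31 (dimension, information, summary)."""
    columns = list(dict.fromkeys(k for row in rows for k in row))  # first-seen column order
    info_cols = {c for cols in info_of.values() for c in cols}

    def numeric(value: object) -> bool:
        # bool is a Number subclass in Python, but flags are not summary values
        return isinstance(value, Number) and not isinstance(value, bool)

    summaries = [
        c for c in columns
        if c not in dimensions and c not in info_cols
        and all(numeric(row[c]) for row in rows if c in row)
    ]
    information = {d: list(info_of.get(d, [])) for d in dimensions}
    return FieldRoles(list(dimensions), information, summaries)
```

Keeping the roles explicit lets the later modeling steps cross-index only the dimension fields. The summary fields are simply aggregated.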

The beneficial effects of the above technical scheme are as follows: data from different types of data sources can be freely cross-analyzed, and the accuracy of the established distribution model is ensured.

According to some embodiments of the invention, the dimension field is acquired as follows. The working principle of the technical scheme is as follows. An intra-class association relationship between different data within each classification set is acquired. An inter-class association relationship between data in different classification sets is acquired. The dimension field is then determined based on the cross-indexing technique according to the intra-class and inter-class association relationships.
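The patent does not define how the two association relationships are measured, so the following Python sketch substitutes two simple heuristics and labels them as assumptions. A field has an intra-class link when its values repeat within one classification set (distinct values divided by occurrences at or below a hypothetical `max_ratio`), so it ties records together. A field has an inter-class link when it appears in two or more classification sets with at least one shared value, so it behaves like a join key. The candidate dimension fields are the union of both.

```python
def _values_by_field(rows: list[dict]) -> dict[str, list]:
    out: dict[str, list] = {}
    for row in rows:
        for k, v in row.items():
            out.setdefault(k, []).append(v)
    return out


def intra_class_links(rows: list[dict], max_ratio: float = 0.5) -> set[str]:
    """Fields whose values repeat inside one classification set (assumed heuristic)."""
    return {
        k for k, vals in _values_by_field(rows).items()
        if len(set(vals)) / len(vals) <= max_ratio
    }


def inter_class_links(sets: dict[str, list[dict]]) -> set[str]:
    """Fields present in at least two classification sets with at least one shared value."""
    seen: dict[str, list[set]] = {}
    for rows in sets.values():
        for k, vals in _values_by_field(rows).items():
            seen.setdefault(k, []).append(set(vals))
    return {
        k for k, groups in seen.items()
        if any(groups[i] & groups[j] for i in range(len(groups)) for j in range(i + 1, len(groups)))
    }


def dimension_fields(sets: dict[str, list[dict]], max_ratio: float = 0.5) -> set[str]:
    """Candidate dimension fields: union of intra-class and inter-class linking fields."""
    intra = set().union(*(intra_class_links(rows, max_ratio) for rows in sets.values()))
    return intra | inter_class_links(sets)
```

A production system would likely use richer measures, such as functional dependencies or value-overlap ratios. The structure stays the same: score fields within each set, score them across sets, and combine the two results.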

The beneficial effects of the above technical scheme are as follows: using both the intra-class and the inter-class association relationships ensures that the dimension field is determined accurately.

As shown in FIG. 3, according to some embodiments of the present invention, establishing the distribution model according to the dimension field, the information field and the summary field includes:

S321, applying calculation functions to modify the summary field before modeling;

S322, establishing a description script based on the information field, and running the operations designed in the description script to perform modeling;

S323, during modeling, applying the cross-indexing technique to the analysis dimensions contained in the dimension field to accelerate data access, and finally establishing the distribution model.

The working principle of the technical scheme is as follows. A number of calculation functions are provided for computing or modifying certain summary fields before modeling. A description script is established based on the information field. After the user designs the operations in the description script, the program runs the script to perform modeling. During modeling, the cross-indexing technique is applied to the analysis dimensions in the dimension field to accelerate data access, and the distribution model is finally established.
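One way to realize S321 and S323, consistent with the "binary-based data management schema" mentioned for the distribution model, is a bitmap index. The Python sketch below is an assumption, not the patented implementation. It keeps one bitmap per (dimension, value), using Python integers as bit sets. Cross-index queries AND the bitmaps of any combination of dimensions, and summary columns are aggregated over the matching rows. The class `DistributionModel` and its `calc` argument (calculation functions applied to summary fields before modeling) are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Optional


class DistributionModel:
    """Bitmap cross-index over dimension fields plus aligned summary columns (illustrative)."""

    def __init__(self, rows: list[dict], dimensions: list[str], summaries: list[str],
                 calc: Optional[dict[str, Callable[[dict], float]]] = None):
        calc = calc or {}
        # S321: calculation functions derive or modify summary fields before modeling.
        rows = [{**row, **{name: fn(row) for name, fn in calc.items()}} for row in rows]
        self.dimensions = list(dimensions)
        self.summaries = list(summaries) + [n for n in calc if n not in summaries]
        self.n_rows = len(rows)
        # S323: one bitmap per (dimension, value); bit i is set when row i has that value.
        self.index: dict[str, dict[object, int]] = {d: defaultdict(int) for d in self.dimensions}
        for i, row in enumerate(rows):
            for d in self.dimensions:
                self.index[d][row[d]] |= 1 << i
        # Summary columns stored as lists aligned with row numbers.
        self.columns = {s: [row[s] for row in rows] for s in self.summaries}

    def rows_for(self, **filters) -> int:
        """Cross-index lookup: AND together the bitmaps of the requested dimension values."""
        mask = (1 << self.n_rows) - 1
        for dim, value in filters.items():
            mask &= self.index[dim].get(value, 0)
        return mask

    def total(self, summary: str, **filters) -> float:
        """Sum a summary field over the rows selected by any combination of dimensions."""
        mask = self.rows_for(**filters)
        col = self.columns[summary]
        return sum(col[i] for i in range(self.n_rows) if mask >> i & 1)
```

For example, `DistributionModel(rows, ["region", "product"], ["qty"], calc={"revenue": lambda r: r["qty"] * r["price"]})` answers `total("revenue", region="N")` or `total("qty", region="S", product="P2")` without any table join. Every dimension combination reduces to a bitwise AND.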

The beneficial effects of the above technical scheme are that: the accuracy of the established distribution model is improved.

The working principle and beneficial effects of the technical scheme are as follows. The distribution model contains the specific data and is stored as a model file. The user's data analysis statistics are based on this data model file rather than on a database-style data warehouse, which saves the cost of database-based data processing in the prior art.

In the data model file, the program has already analyzed all designed dimensions and organized the dimensions and the relationships between them. Each dimension is recursively associated and cross-indexed with the other dimensions. As a result, the model file can be larger than the original data file; its size depends on the dimensions the user has designed and on the data volume.
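Why the model file can outgrow the raw data is easy to see in a simplified layout. The following Python sketch assumes a model file containing the rows themselves, per-dimension statistics, and a cross index for every ordered pair of dimensions. This is a pairwise, non-recursive stand-in for the recursive association described above, and the layout and the name `build_model_file` are hypothetical. With n dimensions the file holds n(n-1) pairwise indexes in addition to the data.

```python
import json
from itertools import permutations


def build_model_file(rows: list[dict], dimensions: list[str], summary: str) -> dict:
    """Hypothetical model-file layout: data, statistics, and all pairwise cross indexes."""
    stats: dict[str, dict] = {}
    for d in dimensions:
        per_value: dict[str, dict] = {}
        for row in rows:
            s = per_value.setdefault(str(row[d]), {"count": 0, "sum": 0})
            s["count"] += 1
            s["sum"] += row[summary]
        stats[d] = per_value
    cross: dict[str, dict] = {}
    for a, b in permutations(dimensions, 2):
        pair = cross.setdefault(f"{a}->{b}", {})
        for i, row in enumerate(rows):
            # value of a -> value of b -> row numbers where both occur together
            pair.setdefault(str(row[a]), {}).setdefault(str(row[b]), []).append(i)
    return {"rows": rows, "statistics": stats, "cross_index": cross}
```

Serializing the result with `json.dumps` and comparing it with the serialized rows shows the size overhead directly. In exchange, any dimension can be reached from any other without recomputation.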

FIGS. 4-6 show the logical structure of the data model file. In FIG. 4, the data in the "specification" table are queried first. An entry in the specification table, for example "listen", is then connected to the other tables for clients, products and invoices. At the next layer, "Dan", based on the "listen-client-abstract table", is connected to the "listen-Dan-product abstract table". Each dimensional part is cross-indexed in turn in this way until all data are associated with one another. FIGS. 5-6 illustrate the same principle.

The model files may be stored on a plurality of separate devices in a decentralized manner.

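According to some embodiments of the invention, the established distribution model is managed in at least one of the following manners.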

The working principle and beneficial effects of the technical scheme are as follows.

In a first manner, the number of dimensions of the distribution model is obtained. When it is greater than a preset number of dimensions, the distribution model is treated as a first-class model, its dimensions are classified, and its information fields are converted into dynamic dimensions. In other words, for a complex distribution model with many dimensions, the dimensions can be grouped to make drill-down analysis easier for the user. The information fields can also be turned into "dynamic dimensions" on which the user can drill down.

In a second manner, incremental data are extracted and modeled on a schedule, and the distribution model built from the incremental data is taken as a second-class model. A previously created distribution model is acquired, and the second-class model is merged vertically with it. No new model needs to be built from scratch, which saves time and cost and addresses the user's need to handle incremental data.

In a third manner, the number of existing distribution models is obtained. When it is greater than a preset number, the dimension fields of the existing distribution models are obtained and association relationships between those dimension fields are established. The existing distribution models are then matched against one another with an association query function according to the association relationships, thereby merging them horizontally. This gives the user more analysis points for data mining.
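The three management manners can be sketched in Python on a deliberately simple model representation. The representation, a dict with `dimensions`, `information` and `rows`, is an assumption, and so are the helpers below.

- Dimension classification uses a hypothetical `category_of` rule that defaults to the name prefix before `_`.
- Vertical merging appends incremental rows to a historical model with the same dimension fields.
- Horizontal merging treats shared dimension fields as the association relationship and hash-joins the rows on them.

`classify_first_class`, `merge_vertical` and `merge_horizontal` are hypothetical names.

```python
def classify_first_class(model: dict, preset_dims: int, category_of=lambda d: d.split("_")[0]) -> dict:
    """First manner: group dimensions and promote information fields to dynamic dimensions."""
    dims = model["dimensions"]
    if len(dims) <= preset_dims:
        return model  # not a first-class model; left unchanged
    groups: dict[str, list[str]] = {}
    for d in dims:
        groups.setdefault(category_of(d), []).append(d)
    dynamic = [f for fields in model["information"].values() for f in fields if f not in dims]
    return {**model, "dimension_groups": groups, "dimensions": dims + dynamic}


def merge_vertical(history: dict, increment: dict) -> dict:
    """Second manner: append an incremental (second-class) model to a historical model."""
    if history["dimensions"] != increment["dimensions"]:
        raise ValueError("incremental model must share the historical model's dimension fields")
    return {**history, "rows": history["rows"] + increment["rows"]}


def merge_horizontal(left: dict, right: dict) -> dict:
    """Third manner: join two models on their shared (associated) dimension fields."""
    shared = [d for d in left["dimensions"] if d in right["dimensions"]]
    if not shared:
        raise ValueError("the models have no associated dimension fields")
    lookup: dict[tuple, list[dict]] = {}
    for right_row in right["rows"]:
        lookup.setdefault(tuple(right_row[d] for d in shared), []).append(right_row)
    rows = [
        {**left_row, **right_row}
        for left_row in left["rows"]
        for right_row in lookup.get(tuple(left_row[d] for d in shared), [])
    ]
    dims = left["dimensions"] + [d for d in right["dimensions"] if d not in shared]
    return {"dimensions": dims, "information": {**left["information"], **right["information"]}, "rows": rows}
```

The hash join in `merge_horizontal` is one possible "association query function". The source does not say which join strategy is used.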

The beneficial effects of the above technical scheme are as follows: management of the management file becomes more effective, memory is saved, and the management efficiency of the distribution models is improved.

According to some embodiments of the invention, the method further comprises:

The working principle of the technical scheme is as follows. After the distribution model is established, it is taken as the target distribution model, and first model information of the target distribution model is acquired. The first model information includes the type of the data used to build the target distribution model, its dimension fields and the like. A plurality of historical distribution models and their second model information are acquired. The association degree between the target distribution model and each historical distribution model is determined from the first and second model information, and the historical distribution models are sorted in descending order of association degree to form a queue. Service processing is then performed with the target distribution model. During this processing, the participation frequency of the target distribution model within a preset time period is determined from its model identifier and compared with a preset participation frequency. The comparison result determines whether the service processing performed by the target distribution model is legitimate: when the participation frequency is less than or equal to the preset frequency, the service processing is legitimate. When the service processing is determined to be legitimate, the target data are determined, namely the data that the target distribution model lacks for the service processing. The authority level of the target distribution model when requesting the target data is then determined.
A target association degree is determined according to the authority level, and the corresponding historical distribution model is located in the queue according to the target association degree. That historical distribution model and all historical distribution models after it in the queue are taken as data supplier distribution models, with that historical distribution model as the first data supplier distribution model. Data request information is generated and sent to the first data supplier distribution model. The feedback information returned by the first data supplier distribution model is obtained, the data request information is updated according to the feedback information, and the updated data request information is sent to the second data supplier distribution model. These steps are repeated until the target data are generated from the plurality of pieces of feedback information obtained. Service processing is then performed according to the target data and the target distribution model.
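The request chain above can be illustrated with a Python sketch, but most of its quantities are unspecified in the source, so each is replaced with a labeled assumption.

- The association degree is taken to be the Jaccard similarity of data types and dimension fields.
- The legitimacy check counts the model's appearances in a caller-supplied list of calls within the period.
- A hypothetical table `AUTHORITY_TO_DEGREE` maps the authority level to the highest association degree the model may start from.
- Each supplier's contents are modeled as a plain `data` dict.

`ModelInfo`, `build_queue`, `is_legitimate` and `gather_target_data` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ModelInfo:
    model_id: str
    data_types: set[str]
    dimension_fields: set[str]
    data: dict = field(default_factory=dict)  # field -> value the model can supply (stand-in)


# Hypothetical mapping: a higher authority level may start from more closely associated models.
AUTHORITY_TO_DEGREE = {1: 0.3, 2: 0.6, 3: 1.0}


def association_degree(a: ModelInfo, b: ModelInfo) -> float:
    """Jaccard similarity over data types and dimension fields (assumed metric)."""
    left = {f"type:{t}" for t in a.data_types} | {f"dim:{d}" for d in a.dimension_fields}
    right = {f"type:{t}" for t in b.data_types} | {f"dim:{d}" for d in b.dimension_fields}
    union = left | right
    return len(left & right) / len(union) if union else 0.0


def build_queue(target: ModelInfo, history: list[ModelInfo]) -> list[tuple[float, ModelInfo]]:
    """Historical models sorted from highest to lowest association degree."""
    scored = [(association_degree(target, h), h) for h in history]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def is_legitimate(calls_in_period: list[str], model_id: str, preset_frequency: int) -> bool:
    """Legitimate when the model took part no more than preset_frequency times in the period."""
    return calls_in_period.count(model_id) <= preset_frequency


def gather_target_data(target: ModelInfo, history: list[ModelInfo], missing: set[str], authority: int) -> dict:
    limit = AUTHORITY_TO_DEGREE[authority]
    queue = build_queue(target, history)
    # The first model whose degree does not exceed the permitted degree is the first supplier;
    # it and every model behind it in the queue form the supplier chain.
    start = next((i for i, (deg, _) in enumerate(queue) if deg <= limit), len(queue))
    request = set(missing)  # current data request information
    collected: dict = {}
    for _, supplier in queue[start:]:
        if not request:
            break  # target data complete
        feedback = {f: supplier.data[f] for f in request if f in supplier.data}
        collected.update(feedback)
        request.difference_update(feedback)  # the updated request asks only for what is still missing
    return collected
```

Because each supplier only sees the fields still missing, later suppliers receive narrower requests. This mirrors the stated aim of making each updated request more targeted.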

The beneficial effects of the above technical scheme are as follows. When service processing is performed with the target distribution model, the queue of historical distribution models is determined first, so that relevant data can be retrieved from the historical distribution models during service processing. Checking whether the target distribution model is legitimate improves the security of service processing based on it, protects the data, and prevents data theft. The authority level of the target distribution model is determined when it requests the target data, and the target association degree is derived from that level. This keeps higher-level data out of reach, which protects the data and provides graded levels of data access. Updating the data request information continuously according to the feedback information makes each request more targeted, so that each data supplier distribution model can determine its feedback quickly and accurately from the updated request. This shortens the time needed to obtain the target data and allows service processing to proceed quickly with the target data and the target distribution model. Both the efficiency and the accuracy of service processing are improved.

According to some embodiments of the invention, the method further comprises:

The working principle of the technical scheme is as follows: a data matrix of the distribution model is established, and evaluation indexes of the distribution model are determined from the data matrix; association relationships between the evaluation indexes are established to generate an evaluation index system; weights and calculation parameters are set for the evaluation indexes in the evaluation index system; the distribution model is comprehensively evaluated against the evaluation index system to calculate an evaluation value, which is compared with a preset evaluation value; and when the evaluation value is smaller than the preset evaluation value, the distribution model is reconstructed. The data matrix is an abstraction of the data to be analyzed and processed that the distribution model contains, so it presents the valid data in the model accurately and completely and prevents valid data from being omitted during the comprehensive evaluation. The evaluation indexes include data quality indexes, integrity indexes, redundancy indexes, and the like.

The beneficial effects of the above technical scheme are as follows: the soundness of the established distribution model can be verified conveniently and accurately, and a distribution model found to be unsound is reconstructed, which ensures the accuracy of data processing.
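A minimal Python sketch of the comprehensive evaluation follows. The index definitions (integrity as the share of non-missing cells, non-redundancy as the share of distinct rows), the weights in the hypothetical `WEIGHTS` table, the weighted-sum aggregation and the preset value 0.8 are all assumptions chosen for illustration; the text does not specify them. A data quality index would be added to the returned dictionary in the same way.

```python
def evaluation_indexes(matrix):
    """Derive illustrative indexes, each in [0, 1], from a data matrix.

    The matrix is a non-empty list of rows; None marks a missing value.
    Both index definitions are assumptions, not taken from the text.
    """
    cells = [v for row in matrix for v in row]
    integrity = sum(v is not None for v in cells) / len(cells)
    distinct_rows = len({tuple(row) for row in matrix})
    non_redundancy = distinct_rows / len(matrix)
    return {"integrity": integrity, "non_redundancy": non_redundancy}


# Hypothetical weights of the evaluation index system (they sum to 1).
WEIGHTS = {"integrity": 0.6, "non_redundancy": 0.4}


def evaluate(matrix, weights=WEIGHTS):
    """Evaluation value as a weighted sum of the indexes (assumed aggregation)."""
    indexes = evaluation_indexes(matrix)
    return sum(weights[name] * indexes[name] for name in weights)


def needs_reconstruction(matrix, preset=0.8):
    """True when the evaluation value falls below the (hypothetical) preset value."""
    return evaluate(matrix) < preset
```

With the matrix `[[1, 2], [1, 2], [3, None]]`, integrity is 5/6 and non-redundancy is 2/3, giving an evaluation value of about 0.767, which is below 0.8, so this sketch would flag the model for reconstruction.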

In an embodiment, matching the existing distribution models against one another with an association query function according to the association relationship, so as to merge the existing distribution models horizontally, includes:

performing image fusion processing on a first image included in a first existing distribution model and a second image included in a second existing distribution model to obtain a fused image;

and evaluating the fused image to calculate an evaluation value, querying a preset evaluation value-fusion quality grade table with the evaluation value to obtain the fusion quality grade of the first existing distribution model and the second existing distribution model, and re-fusing the first existing distribution model and the second existing distribution model when the fusion quality grade is lower than a preset fusion quality grade.

Evaluating the fused image to calculate the evaluation value S comprises:

where M is the length of the fused image; N is the width of the fused image; f(i, j) is the pixel value at (i, j) on the fused image; f(i, j-1) is the pixel value at (i, j-1) on the fused image; and f(i-1, j) is the pixel value at (i-1, j) on the fused image.
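The formula for S itself is not reproduced in this text. As an illustration only, the average gradient is one widely used no-reference quality measure for fused images that is built from exactly these quantities (M, N, f(i, j), f(i, j-1) and f(i-1, j)); it is a swapped-in example of this class of metric and may differ from the formula in the original specification. The sketch assumes a grayscale image given as M rows of N numeric values with M, N of at least 2.

```python
import math


def average_gradient(f):
    """Average gradient of a grayscale image f given as M rows of N values.

    AG = 1 / ((M - 1)(N - 1)) * sum over i = 1..M-1, j = 1..N-1 of
         sqrt(((f(i, j) - f(i-1, j))**2 + (f(i, j) - f(i, j-1))**2) / 2)
    A larger value indicates sharper detail in the fused image.
    """
    M, N = len(f), len(f[0])
    total = 0.0
    for i in range(1, M):
        for j in range(1, N):
            dv = f[i][j] - f[i - 1][j]  # difference to the pixel above
            dh = f[i][j] - f[i][j - 1]  # difference to the pixel on the left
            total += math.sqrt((dv * dv + dh * dh) / 2)
    return total / ((M - 1) * (N - 1))
```

A perfectly flat image scores 0, while the 2 x 2 image `[[0, 0], [0, 2]]` scores 2.0.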

The working principle and beneficial effects of the technical scheme are as follows: the first image included in the first existing distribution model and the second image included in the second existing distribution model are fused into a fused image; the fused image is evaluated to calculate an evaluation value; the preset evaluation value-fusion quality grade table is queried with the evaluation value to obtain the fusion quality grade of the two existing distribution models; and the two existing distribution models are re-fused when the fusion quality grade is lower than the preset fusion quality grade. This safeguards the fusion quality of the existing distribution models, improves the utilization and accuracy of the data, and avoids data loss during fusion. Calculating the evaluation value of the fused image accurately with the formula in turn ensures that the fusion quality grade obtained from the table query is accurate.
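The grade lookup and re-fusion loop described above could look like the sketch below. The threshold table `GRADE_THRESHOLDS`, the pixel-wise weighted-average `fuse` function and the strategy of re-fusing with a different weight `alpha` on each attempt are hypothetical; the text leaves the fusion method, the table contents and the way re-fusion changes the result unspecified. The evaluation function is passed in as a parameter, so any measure of the evaluation value S can be plugged in.

```python
import bisect

# Hypothetical evaluation value -> fusion quality grade table:
# S < 2 -> grade 1, 2 <= S < 5 -> grade 2, 5 <= S < 8 -> grade 3, S >= 8 -> grade 4.
GRADE_THRESHOLDS = [2.0, 5.0, 8.0]


def fusion_grade(score):
    """Look up the fusion quality grade for an evaluation value."""
    return bisect.bisect_right(GRADE_THRESHOLDS, score) + 1


def fuse(img_a, img_b, alpha):
    """Pixel-wise weighted average; a stand-in for the unspecified fusion method."""
    return [[alpha * a + (1 - alpha) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]


def fuse_until_acceptable(img_a, img_b, evaluate, min_grade,
                          alphas=(0.5, 0.7, 0.3, 0.9, 0.1)):
    """Re-fuse with a different (hypothetical) weight until the grade is acceptable.

    Returns (fused_image, grade), or (None, None) if no attempt reaches min_grade.
    """
    for alpha in alphas:
        fused = fuse(img_a, img_b, alpha)
        grade = fusion_grade(evaluate(fused))
        if grade >= min_grade:
            return fused, grade
    return None, None
```

Using `bisect_right` makes each threshold the lower bound of the next grade, so an evaluation value of exactly 5.0 falls in grade 3. Returning `(None, None)` leaves the caller to decide what happens when every re-fusion attempt stays below the preset grade.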

Beneficial effects:

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A big data cross-indexing method based on a distribution model, characterized by comprising the following steps:

2. The big data cross-indexing method based on the distribution model according to claim 1, wherein building the distribution model based on the cross-indexing technology according to the plurality of classification sets comprises:

3. The big data cross-indexing method based on the distribution model according to claim 2, wherein obtaining the dimension field comprises:

4. The big data cross-indexing method based on the distribution model according to claim 2, wherein building the distribution model according to the dimension field, the information field and the summary field comprises:

5. The big data cross-indexing method based on the distribution model according to claim 1, wherein the distribution model includes specific data and is stored in a model file, and the model file further includes data analysis statistical information.

6. The big data cross-indexing method based on the distribution model according to claim 1, further comprising managing the established distribution model, including:

or

7. The big data cross-indexing method based on the distribution model according to claim 6, wherein after the established distribution model is managed, a management file for the index path is generated from the management operation, the management file storing the logical relationship of the management operation rather than specific data.

8. The big data cross-indexing method based on the distribution model of claim 1, wherein the distribution model adopts a star-type framework and a binary-based data management mode.

9. The big data cross-indexing method based on the distribution model according to claim 1, further comprising:

10. The big data cross-indexing method based on the distribution model according to claim 1, further comprising: