CN113836141A - Big data cross indexing method based on distribution model - Google Patents

Publication number
CN113836141A
Authority
CN
China
Prior art keywords
data
distribution model
model
distribution
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111119891.9A
Other languages
Chinese (zh)
Other versions
CN113836141B (en
Inventor
张才明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Institute Of Industrial Relations
Original Assignee
China Institute Of Industrial Relations
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Institute Of Industrial Relations filed Critical China Institute Of Industrial Relations
Priority to CN202111119891.9A priority Critical patent/CN113836141B/en
Publication of CN113836141A publication Critical patent/CN113836141A/en
Application granted granted Critical
Publication of CN113836141B publication Critical patent/CN113836141B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Educational Administration (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Tourism & Hospitality (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a big data cross indexing method based on a distribution model, which comprises the following steps: acquiring data to be analyzed and processed and a data type corresponding to the data; performing cluster analysis on the data according to the data types to obtain a plurality of classification sets; and establishing a distribution model based on a cross-indexing technique according to the classification sets. Every dimension is recursively associated with the other dimensions, the relationships between dimensions are organized, and cross indexes between dimensions are established quickly, which greatly improves efficiency and speed, enables fast and efficient query and analysis, and greatly reduces resource consumption.

Description

Big data cross indexing method based on distribution model
Technical Field
The invention relates to the technical field of big data processing, in particular to a big data cross indexing method based on a distribution model.
Background
With the advent of the big data era, big data technology has developed rapidly. The most typical change is the growing variety of computing modes, which have evolved from batch computing to stream computing, real-time interactive computing and others. However, each computing framework suits only a limited range of application scenarios. Batch computing can easily process massive data, but its response time is long. Stream computing, unlike batch processing, is a continuous computing mode that can respond quickly to user events. Real-time interactive computing processes big data interactively and also responds quickly. As big data application scenarios become more complex, a single traditional computing framework can no longer meet the requirements of data applications. Research is now emerging on hybrid systems that fuse multiple computing systems into a unified big data computing platform providing multiple computing services.
The concept of online analytical processing (OLAP) was first proposed in 1993 by E. F. Codd, the father of relational databases. Codd argued that online transaction processing (OLTP) could not meet end users' needs for database query and analysis, and that simple SQL queries against a large database could not meet users' analytical needs. A user's decision analysis requires a large amount of computation over the relational database, and plain query results do not satisfy decision makers. Codd therefore proposed the concepts of the multidimensional database and multidimensional analysis, namely OLAP. The OLAP Council defines online analytical processing as a class of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent and interactive access to a variety of views of information that has been transformed from raw data, is truly understandable to users, and truly reflects the dimensional characteristics of the enterprise.
The main feature of online analytical processing is that a multidimensional distribution model is built for the user in advance, directly imitating the user's multi-angle way of thinking; a dimension is an angle from which the user analyzes the data. For example, in an analysis of sales data, the time period is one dimension, and product category, distribution channel, geographical distribution and customer group are each further dimensions. Once the multidimensional distribution model is established, the user can quickly obtain data from each analysis angle, switch dynamically between angles, or perform a comprehensive multi-angle analysis, which gives great analytical flexibility. This is the root cause of the wide attention that online analytical processing has received; it differs fundamentally from older data processing systems in both design concept and actual implementation.
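As a purely illustrative sketch of the multi-angle analysis described above (not taken from the patent), the following Python fragment sums a measure over any chosen combination of dimensions. The records, the field names (`period`, `category`, `channel`, `amount`) and the function `aggregate` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sales records; field names are illustrative only.
SALES = [
    {"period": "2021-Q1", "category": "laptop", "channel": "online", "amount": 1200.0},
    {"period": "2021-Q1", "category": "phone", "channel": "retail", "amount": 800.0},
    {"period": "2021-Q2", "category": "laptop", "channel": "online", "amount": 1500.0},
    {"period": "2021-Q2", "category": "laptop", "channel": "retail", "amount": 700.0},
]


def aggregate(records, dimensions, measure="amount"):
    """Sum `measure` for every combination of values of the given dimensions."""
    totals = defaultdict(float)
    for record in records:
        key = tuple(record[d] for d in dimensions)
        totals[key] += record[measure]
    return dict(totals)


# The same data viewed from two different analysis angles.
by_period = aggregate(SALES, ["period"])
by_category_channel = aggregate(SALES, ["category", "channel"])
```

Switching the analysis angle only changes the list of dimensions passed in; the data itself is untouched.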
With large data volumes, operations such as real-time aggregation, GROUP BY, SUM and COUNT are performed after tables have been joined. Traditional big data analysis and computation rely on database indexes: the required data fields are summarized, while the relationships between fields are neither preprocessed nor cached. Because of the performance bottlenecks of the various databases, response times can be long. Join operations between tables on the Hadoop framework in particular are very inefficient and consume large amounts of computing resources, so the real-time requirements of big data computation and analysis cannot be met.
Disclosure of Invention
The present invention aims to solve, at least to some extent, one of the technical problems in the art described above. To this end, the invention provides a big data cross indexing method based on a distribution model, which recursively associates every dimension with the other dimensions, organizes the relationships between dimensions, and quickly establishes cross indexes between dimensions, thereby greatly improving efficiency and speed, enabling fast and efficient query and analysis, and greatly reducing resource consumption.
In order to achieve the above object, an embodiment of the present invention provides a big data cross indexing method based on a distribution model, including:
acquiring data to be analyzed and processed and a data type corresponding to the data to be analyzed and processed;
performing cluster analysis on the data to be analyzed and processed according to the data types to obtain a plurality of classification sets;
and establishing a distribution model based on a cross index technology according to the classification sets.
According to some embodiments of the invention, the building a distribution model based on a cross-indexing technique according to the plurality of classification sets comprises:
dividing the data to be analyzed and processed in each classification set into a dimension field for analysis, an information field for describing the dimension and a summary field for statistical analysis;
and establishing a distribution model according to the dimension field, the information field and the summary field.
According to some embodiments of the invention, the method for acquiring the dimension field comprises:
acquiring an intra-class association relation between different data to be analyzed and processed in each classification set;
acquiring inter-class association relation among data to be analyzed and processed among different classification sets;
and determining the dimension field based on a cross-indexing technology according to the intra-class association relationship and the inter-class association relationship.
According to some embodiments of the invention, the building a distribution model according to the dimension field, the information field, and the summary field comprises:
performing calculation modifications on the summary field before modeling, based on a calculation function;
establishing a description script based on the information field, and running the operations designed in the description script to perform modeling;
during modeling, applying a cross-indexing technique to the analysis dimensions included in the dimension field to accelerate data access, and finally establishing the distribution model.
According to some embodiments of the invention, the distribution model comprises specific data and is stored in the form of a model file, the model file further comprising data analysis statistics.
According to some embodiments of the invention, further comprising managing the established distribution model, comprising:
obtaining the number of dimensions of the distribution model, determining the distribution model to be a first-class model when the number of dimensions exceeds a preset number, classifying the dimensions in the first-class model, and converting an information field in the first-class model into a dynamic dimension;
or
performing scheduled extraction and modeling of incremental data, taking the distribution model built from the incremental data as a second-class model, obtaining a historically created distribution model, and longitudinally combining the second-class model with the historically created distribution model;
or
obtaining the number of existing distribution models; when the number exceeds a preset number, obtaining the dimension fields of the existing distribution models, establishing association relationships between the dimension fields of the existing distribution models, and matching the existing distribution models according to the association relationships using an association query function, thereby transversely combining the existing distribution models.
According to some embodiments of the present invention, after the established distribution model has been managed, a management file for the index path is generated from the management operations; the management file stores only the logical relationships of the management operations and does not store specific data.
According to some embodiments of the invention, the distribution model employs a star architecture and a binary-based data management schema.
According to some embodiments of the invention, further comprising:
after the distribution model is established, determining the distribution model as a target distribution model, and acquiring first model information of the target distribution model;
acquiring a plurality of historical distribution models and second model information of the plurality of historical distribution models;
determining the association degree between the target distribution model and each of the plurality of historical distribution models according to the first model information and the second model information, and sorting the historical distribution models by association degree from high to low to form a queue;
performing service processing according to the target distribution model; during the service processing, determining, from the model identifier of the target distribution model, the frequency with which it participates in service processing within a preset time period; comparing this participation frequency with a preset participation frequency; and determining from the comparison result whether the service processing performed by the target distribution model is legal;
when the service processing performed by the target distribution model is determined to be legal, determining target data, and determining, according to the target data, the authority level of the target distribution model for making a data request;
determining a target association degree according to the authority level, determining the corresponding historical distribution model in the queue according to the target association degree, taking the corresponding historical distribution model and the historical distribution models behind it in the queue as data supplier distribution models, and taking the corresponding historical distribution model as the first data supplier distribution model;
generating data request information, and sending the data request information to the first data supplier distribution model;
obtaining feedback information returned by the first data supplier distribution model, updating the data request information according to the feedback information, and sending the updated data request information to a second data supplier distribution model;
repeating the above steps until the target data is generated from the plurality of pieces of feedback information obtained;
and performing service processing according to the target data and the target distribution model.
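The supplier selection and chained data requests described above can be sketched as follows. The patent does not say how the association degree is computed, how the participation frequency is compared, or what a data request contains, so everything concrete here is an assumption: Jaccard similarity of dimension-field sets stands in for the association degree, "legal" is taken to mean that the participation count does not exceed the preset count, and each supplier model is represented by a hypothetical dictionary of the data it holds.

```python
def association_degree(target_info, other_info):
    """Assumed measure: Jaccard similarity of two models' dimension-field sets."""
    a, b = set(target_info), set(other_info)
    return len(a & b) / len(a | b) if a | b else 0.0


def build_queue(target_info, historical):
    """Sort historical model names by association degree, highest first."""
    return sorted(
        historical,
        key=lambda name: association_degree(target_info, historical[name]),
        reverse=True,
    )


def is_legal(participation_count, preset_count):
    """Assumed rule: legal while participation stays within the preset frequency."""
    return participation_count <= preset_count


def collect(queue, degrees, target_degree, wanted, holdings):
    """Query suppliers in queue order, starting at the first model whose
    association degree does not exceed the target association degree."""
    start = next(
        (i for i, name in enumerate(queue) if degrees[name] <= target_degree),
        len(queue),
    )
    remaining, gathered = set(wanted), {}
    for name in queue[start:]:
        if not remaining:
            break
        feedback = {k: v for k, v in holdings[name].items() if k in remaining}
        gathered.update(feedback)
        remaining.difference_update(feedback)  # updated request: what is still missing
    return gathered, remaining


# Hypothetical model information (dimension fields) and held data.
TARGET = ["client", "product", "period"]
HISTORICAL = {
    "h1": ["client", "product", "period"],
    "h2": ["client", "product"],
    "h3": ["client", "region"],
}
HOLDINGS = {"h1": {"x": 1}, "h2": {"a": 1}, "h3": {"a": 9, "b": 2}}

queue = build_queue(TARGET, HISTORICAL)
degrees = {name: association_degree(TARGET, info) for name, info in HISTORICAL.items()}
gathered, remaining = collect(queue, degrees, 0.7, {"a", "b"}, HOLDINGS)
```

In this sketch the request shrinks after each round of feedback, which is one possible reading of "updating the data request information according to the feedback information".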
According to some embodiments of the invention, further comprising:
establishing a data matrix of the distribution model, and determining an evaluation index of the distribution model according to the data matrix;
establishing an incidence relation between evaluation indexes to generate an evaluation index system;
setting the weight and calculation parameters of the evaluation indexes in the evaluation index system;
comprehensively evaluating the distribution model according to the evaluation index system, calculating to obtain an evaluation value, and judging whether the evaluation value is smaller than a preset evaluation value; and when the evaluation value is determined to be smaller than a preset evaluation value, reconstructing the distribution model.
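A minimal sketch of the threshold check in the evaluation step, assuming the comprehensive evaluation is a weighted average of indicator scores normalized to the range 0 to 1. The patent does not specify the indicators, the weighting formula or the associations between indicators; the indicator names, weights and thresholds below are hypothetical.

```python
def evaluate(indicators, weights):
    """Assumed evaluation: weighted average of indicator scores in [0, 1]."""
    total_weight = sum(weights.values())
    return sum(indicators[name] * w for name, w in weights.items()) / total_weight


def needs_rebuild(indicators, weights, preset_value):
    """The model is reconstructed when its evaluation value is below the preset value."""
    return evaluate(indicators, weights) < preset_value


# Hypothetical indicator scores and weights.
INDICATORS = {"query_speed": 0.9, "coverage": 0.6, "storage": 0.3}
WEIGHTS = {"query_speed": 3, "coverage": 2, "storage": 1}

score = evaluate(INDICATORS, WEIGHTS)
rebuild = needs_rebuild(INDICATORS, WEIGHTS, preset_value=0.75)
```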
Advantageous effects:
1. The array-based multidimensional cross-indexing technique can convert large amounts of data from different sources into a highly optimized model, making it well suited to reports, data analysis and data visualization. Because all possible query paths through the data are indexed, the distribution model continues to respond quickly to analytical queries, reports, dashboards and visualizations, no matter how many dimensions the model has or how complex the operation is.
2. The powerful and flexible distribution model lets users "dive" from one dimension into any other dimension without a full understanding of the underlying data structure. The multidimensional distribution model uses an improved star architecture and a binary-based data management schema. This data management structure removes the data hierarchy and other data navigation limitations familiar from the industry and from conventional data processing software. Maintaining a dedicated data warehouse or a traditional relational or multidimensional online analytical processing (OLAP) system usually incurs dedicated costs, and the star-like structure can reduce the need for such costs. With the distribution model and its consolidated information components, the overhead of creating a data warehouse can be minimized or even eliminated.
3. The distribution model and its use of in-memory technology provide a consistently fast-responding user experience regardless of the size of the underlying data volume. The technique is widely recognized for its low overall acquisition cost, efficient value generation and rapid deployment.
4. Thanks to its flexible integrated platform functionality, the technique is particularly fast to implement compared with conventional data processing techniques. It consumes few resources for three reasons: first, it avoids the additional database costs of traditional OLAP tools; second, it uses the existing data storage and operating systems; third, it balances the data volume held in memory to maximize performance. The technique does not require purchasing or licensing any primary database, nor does it incur the cost of building an expensive data warehouse before any value is generated. Conventional OLAP-based BI techniques typically need a primary database or data warehouse in order to build and distribute a data cube or specialized data warehouse, all before any data analysis or report generation can take place. This hidden cost is a significant barrier for businesses that want reporting and data analysis functions directly, without the overhead of traditional BI techniques or specialized analytics applications.
5. Adding new data sources quickly does not increase the cost of the solution itself. Adding new users is simple, so the number of users can grow quickly as the deployment expands, spreading the benefits of BI to every corner of the enterprise. In a relatively short time the technique can deliver many benefits, such as increased revenue, substantial cost savings, less working time spent on reports and smoother customer contact; in addition, more successful data-based decisions can be made about investment targets and integrated sales programs.
6. The technique achieves an optimal balance of data in memory, which maximizes system performance and gives the user a continuously fast response. Rapid deployment and maximized performance let users genuinely experience the value of big data and add value to the enterprise more quickly.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of a big data cross-indexing method based on a distribution model according to a first embodiment of the present invention;
FIG. 2 is a flow chart of a big data cross-indexing method based on a distribution model according to a second embodiment of the present invention;
FIG. 3 is a flowchart of a big data cross-indexing method based on a distribution model according to a third embodiment of the present invention;
FIG. 4 is a schematic diagram of the logical structure in a model file according to a first embodiment of the invention;
FIG. 5 is a schematic diagram of the logical structure in a model file according to a second embodiment of the present invention;
FIG. 6 is a schematic diagram of the logical structure in a model file according to a third embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
As shown in FIG. 1, an embodiment of the present invention provides a big data cross-indexing method based on a distribution model, including steps S1-S3:
S1, acquiring data to be analyzed and processed and a data type corresponding to the data to be analyzed and processed;
S2, performing cluster analysis on the data to be analyzed and processed according to the data types to obtain a plurality of classification sets;
S3, establishing a distribution model based on a cross-indexing technique according to the classification sets.
The working principle of the technical scheme is as follows: data to be analyzed and processed, and the data type corresponding to that data, are acquired; cluster analysis is performed on the data according to the data types to obtain a plurality of classification sets; and a distribution model is established from the classification sets based on a cross-indexing technique. A distribution model is built for the data to be analyzed, and free cross-index processing of the data allows any dimension to be associated with, and dived into from, any other dimension to obtain the most needed information; this is multidimensional cross analysis. Data from disparate data sources can be crossed freely, and through modeling the user obtains information that can be processed efficiently. The distribution model is also referred to as the data model.
The beneficial effects of the above technical scheme are as follows: every dimension is recursively associated with the other dimensions, the relationships between dimensions are organized, and cross indexes between dimensions are established quickly, which greatly improves efficiency and speed, enables fast and efficient query and analysis, and greatly reduces resource consumption.
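Steps S1-S3 can be sketched in Python as below. This is a simplified illustration under stated assumptions rather than the patented implementation: the cluster analysis of S2 is reduced to grouping records by their data-type label, and the cross index of S3 is modeled as one inverted index per dimension whose row-position sets are intersected at query time. All function names and sample records are hypothetical.

```python
from collections import defaultdict


def classify_by_type(records):
    """S2 stand-in: group records into classification sets keyed by data type."""
    sets = defaultdict(list)
    for data_type, payload in records:
        sets[data_type].append(payload)
    return dict(sets)


def build_cross_index(rows, dimensions):
    """S3 stand-in: for each dimension, map each value to the row positions holding it."""
    index = {dim: defaultdict(set) for dim in dimensions}
    for pos, row in enumerate(rows):
        for dim in dimensions:
            index[dim][row[dim]].add(pos)
    return {dim: dict(values) for dim, values in index.items()}


def query(index, **conditions):
    """Intersect the row-position sets of several dimension values."""
    position_sets = [index[dim].get(value, set()) for dim, value in conditions.items()]
    return set.intersection(*position_sets) if position_sets else set()


# S1: hypothetical (data_type, record) pairs.
RAW = [
    ("invoice", {"client": "A", "product": "x"}),
    ("invoice", {"client": "A", "product": "y"}),
    ("invoice", {"client": "B", "product": "x"}),
    ("client", {"client": "A", "region": "north"}),
]

sets = classify_by_type(RAW)
invoice_index = build_cross_index(sets["invoice"], ["client", "product"])
hits = query(invoice_index, client="A", product="x")
```

Because the index is built once, a query over several dimensions reduces to set intersections instead of a scan or a join.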
As shown in FIG. 2, according to some embodiments of the present invention, establishing a distribution model based on a cross-indexing technique according to the plurality of classification sets includes:
S31, dividing the data to be analyzed and processed in each classification set into a dimension field for analysis, an information field for describing the dimension and a summary field for statistical analysis;
S32, establishing a distribution model according to the dimension field, the information field and the summary field.
The working principle of the technical scheme is as follows: when a model is built for the data, all information is divided into three components: a dimension field for analysis, an information field for describing the dimension, and a summary field for statistical analysis. The distribution model is established from these three kinds of field. Fields defined as dimensions during modeling are cross-indexed, so the user can dive quickly from any dimension into any other to obtain the most needed information. Fields used for mathematical statistics are defined as summary fields. An information field contains additional information related to a dimension.
The beneficial effects of the above technical scheme are that: the data of the data sources of different types can be freely crossed, and the accuracy of the established distribution model is ensured.
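The three field roles can be illustrated with a small schema declaration. The sketch below is an assumption about how such a division might be represented; the class names `Role` and `FieldSpec` and the sample field names are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    DIMENSION = "dimension"      # analysis angle; cross-indexed during modeling
    INFORMATION = "information"  # extra information describing a dimension
    SUMMARY = "summary"          # numeric field used for statistics


@dataclass(frozen=True)
class FieldSpec:
    name: str
    role: Role
    describes: Optional[str] = None  # for information fields: the dimension described


# Hypothetical schema for an invoice classification set.
SCHEMA = [
    FieldSpec("client_id", Role.DIMENSION),
    FieldSpec("product", Role.DIMENSION),
    FieldSpec("client_address", Role.INFORMATION, describes="client_id"),
    FieldSpec("amount", Role.SUMMARY),
]


def split_by_role(schema):
    """Group field names by their role in the distribution model."""
    groups = {role: [] for role in Role}
    for spec in schema:
        groups[spec.role].append(spec.name)
    return groups


groups = split_by_role(SCHEMA)
```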
According to some embodiments of the invention, the method for acquiring the dimension field comprises:
acquiring an intra-class association relation between different data to be analyzed and processed in each classification set;
acquiring inter-class association relation among data to be analyzed and processed among different classification sets;
and determining the dimension field based on a cross-indexing technology according to the intra-class association relationship and the inter-class association relationship.
The working principle of the technical scheme is as follows: acquiring an intra-class association relation between different data to be analyzed and processed in each classification set; acquiring inter-class association relation among data to be analyzed and processed among different classification sets; and determining the dimension field based on a cross-indexing technology according to the intra-class association relationship and the inter-class association relationship.
The beneficial effects of the above technical scheme are as follows: determining the dimension field from both the intra-class and the inter-class association relationships ensures that it is determined accurately.
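The patent does not define how the association relationships are measured. The following sketch therefore uses two assumed heuristics: a field shows an intra-class association when different records in the same classification set share a value for it, and an inter-class association when it occurs in more than one classification set. Fields satisfying either heuristic are proposed as dimension fields. All names and data are hypothetical.

```python
from collections import Counter


def intra_class_fields(rows):
    """Assumed heuristic: fields whose values repeat inside one classification set."""
    fields = set()
    for name in {key for row in rows for key in row}:
        counts = Counter(row[name] for row in rows if name in row)
        if any(n > 1 for n in counts.values()):
            fields.add(name)
    return fields


def inter_class_fields(classification_sets):
    """Assumed heuristic: fields that occur in more than one classification set."""
    seen = Counter()
    for rows in classification_sets.values():
        for name in {key for row in rows for key in row}:
            seen[name] += 1
    return {name for name, n in seen.items() if n > 1}


def dimension_fields(classification_sets):
    """Propose dimension fields from both kinds of association."""
    intra = set()
    for rows in classification_sets.values():
        intra |= intra_class_fields(rows)
    return intra | inter_class_fields(classification_sets)


# Hypothetical classification sets.
SETS = {
    "invoice": [
        {"client": "A", "product": "x", "amount": 10.0},
        {"client": "A", "product": "y", "amount": 25.0},
        {"client": "B", "product": "x", "amount": 40.0},
    ],
    "client": [
        {"client": "A", "region": "north"},
        {"client": "B", "region": "south"},
    ],
}

dims = dimension_fields(SETS)
```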
As shown in FIG. 3, according to some embodiments of the present invention, establishing a distribution model according to the dimension field, the information field and the summary field includes:
S321, performing calculation modifications on the summary field before modeling, based on a calculation function;
S322, establishing a description script based on the information field, and running the operations designed in the description script to perform modeling;
S323, during modeling, applying a cross-indexing technique to the analysis dimensions included in the dimension field to accelerate data access, and finally establishing the distribution model.
The working principle of the technical scheme is as follows: a number of calculation functions are provided for making calculation modifications to some summary fields before modeling. A description script is established based on the information field; once the user has designed the operations in the description script, the program runs the script to perform modeling. During modeling, a cross-indexing technique is applied to the analysis dimensions included in the dimension field to accelerate data access, and the distribution model is finally established.
The beneficial effects of the above technical scheme are that: the accuracy of the established distribution model is improved.
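Steps S321-S323 can be sketched as a small script runner. The format of the description script is not disclosed in the patent; the dictionary layout, the calculation function names (`cents_to_units`, `round2`) and the function `run_script` below are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical calculation functions that can be applied to summary fields (S321).
CALC_FUNCTIONS = {
    "round2": lambda v: round(v, 2),
    "cents_to_units": lambda v: v / 100,
}

# Hypothetical description script (S322): a declarative list of modeling steps.
SCRIPT = {
    "dimensions": ["client", "product"],
    "summary": {"amount": ["cents_to_units", "round2"]},
}


def run_script(script, rows):
    """Apply the summary calculations, then cross-index the dimensions (S323)."""
    prepared = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is left untouched
        for field, function_names in script["summary"].items():
            for fn in function_names:
                row[field] = CALC_FUNCTIONS[fn](row[field])
        prepared.append(row)
    index = {dim: defaultdict(list) for dim in script["dimensions"]}
    for pos, row in enumerate(prepared):
        for dim in script["dimensions"]:
            index[dim][row[dim]].append(pos)
    return {"rows": prepared, "index": {d: dict(v) for d, v in index.items()}}


model = run_script(SCRIPT, [
    {"client": "A", "product": "x", "amount": 1999},
    {"client": "B", "product": "x", "amount": 250},
])
```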
According to some embodiments of the invention, the distribution model comprises specific data and is stored in the form of a model file, the model file further comprising data analysis statistics.
The working principle and beneficial effects of the technical scheme are as follows: the distribution model contains the specific data and is stored as a model file. The user's data analysis and statistics are based on this data model file rather than on a data warehouse in database form, which saves the cost of the database-based data processing of the prior art.
In the data model file, the program has analyzed all designed dimensions and organized the dimensions and the relationships between them. Each dimension is recursively associated with the other dimensions and cross-indexed, so the model file is larger than the original data file; how much larger depends on the dimensions the user has designed and on the amount of data.
FIGS. 4-6 show the logical structure inside the data model file. In FIG. 4, the data inside the "specification" table is queried first; an entry in that table, for example "listen", is then connected to the other tables (clients, products and invoices); on the next layer, "Dan", taken from the "listen-client-abstract table", is connected to the "listen-Dan-product abstract table". Cross-indexing continues in this way for each dimension until all data are associated with each other. FIGS. 5-6 illustrate the same principle.
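The recursive association of each dimension with the others can be sketched as a nested structure in which every dimension value holds a subtotal and a further index over the remaining dimensions. This is one assumed reading of the layered structure in the figures, not the actual model file layout; the dimension names, values and the functions `recursive_index` and `count_nodes` are hypothetical.

```python
def recursive_index(rows, dimensions, measure):
    """For every dimension value, store a subtotal and recurse into the other dimensions."""
    node = {"total": sum(row[measure] for row in rows), "by": {}}
    for dim in dimensions:
        remaining = [d for d in dimensions if d != dim]
        groups = {}
        for row in rows:
            groups.setdefault(row[dim], []).append(row)
        node["by"][dim] = {
            value: recursive_index(subset, remaining, measure)
            for value, subset in groups.items()
        }
    return node


def count_nodes(node):
    """Count index nodes to show how the structure grows relative to the raw rows."""
    return 1 + sum(
        count_nodes(child)
        for values in node["by"].values()
        for child in values.values()
    )


# Hypothetical rows with three dimensions and one summary field.
ROWS = [
    {"salesperson": "s1", "client": "c1", "product": "p1", "amount": 10},
    {"salesperson": "s1", "client": "c2", "product": "p1", "amount": 20},
    {"salesperson": "s2", "client": "c1", "product": "p2", "amount": 5},
]

model = recursive_index(ROWS, ["salesperson", "client", "product"], "amount")
s1_c2_p1 = model["by"]["salesperson"]["s1"]["by"]["client"]["c2"]["by"]["product"]["p1"]
```

Every ordering of the dimensions is materialized, so any dive path is a sequence of dictionary lookups; the price is that the structure holds many more nodes than there are source rows, consistent with the statement that the model file is larger than the original data.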
The model files may be stored on a plurality of separate devices in a decentralized manner.
According to some embodiments of the invention, further comprising managing the established distribution model, comprising:
obtaining the dimensionality of the distribution model, determining the distribution model as a first class model when the dimensionality is determined to be larger than a preset dimensionality, classifying the dimensionality in the first class model, and converting an information field in the first class model into a dynamic dimensionality;
or
Performing timing extraction and modeling on the incremental data, taking a distribution model determined by the incremental data as a second-class model, acquiring a historically-created distribution model, and longitudinally combining the second-class model and the historically-created distribution model;
or
The method comprises the steps of obtaining the number of the existing distribution models, obtaining dimension fields of the existing distribution models when the number is determined to be larger than the preset number, establishing an incidence relation of the dimension fields among the existing distribution models, matching the existing distribution models based on an incidence query function according to the incidence relation, and achieving transverse combination of the existing distribution models.
The working principle and beneficial effects of the above technical scheme are as follows. For a complex distribution model whose number of dimensions exceeds the preset number (a first-class model), the dimensions can be classified to make it easier for the user to dive into the analysis, and the information fields can be converted into "dynamic dimensions", on which the user can also perform dive-in analysis. Incremental data are extracted and modeled periodically, and the resulting second-class model is vertically merged with the historically created distribution model; no new model has to be built from scratch, which saves time and cost and addresses the data increments the user is concerned with. When the number of existing distribution models exceeds the preset number, the dimension fields of the existing distribution models are obtained, an association relationship between those dimension fields is established, and the existing distribution models are matched with an association query function according to that relationship. The existing distribution models are thereby horizontally merged, which gives the user more analysis points for data mining.
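The two merge operations can be sketched as follows. This is only an illustration under stated assumptions: a model is represented as a dictionary with a `dimensions` list and a `rows` list, and the function names and sample data are hypothetical. The vertical merge appends the rows of an incremental model to a historical model with the same dimensions, and the horizontal merge joins two models on the dimension fields they share. Conversion of information fields into dynamic dimensions is not shown.

```python
def vertical_merge(historical, incremental):
    """Append an incremental (second-class) model to a historical model with the same dimensions."""
    if historical["dimensions"] != incremental["dimensions"]:
        raise ValueError("vertical merge requires identical dimension fields")
    return {
        "dimensions": list(historical["dimensions"]),
        "rows": historical["rows"] + incremental["rows"],
    }

def shared_dimensions(model_a, model_b):
    """Association relationship, taken here to be the dimension fields both models contain."""
    return [dim for dim in model_a["dimensions"] if dim in model_b["dimensions"]]

def horizontal_merge(model_a, model_b):
    """Join two models row by row on their shared dimension fields."""
    keys = shared_dimensions(model_a, model_b)
    if not keys:
        raise ValueError("no shared dimension field to associate the models")
    rows = [
        {**row_a, **row_b}
        for row_a in model_a["rows"]
        for row_b in model_b["rows"]
        if all(row_a[key] == row_b[key] for key in keys)
    ]
    extra = [dim for dim in model_b["dimensions"] if dim not in keys]
    return {"dimensions": model_a["dimensions"] + extra, "rows": rows}

# Hypothetical models.
sales_history = {
    "dimensions": ["client", "product"],
    "rows": [{"client": "Dan", "product": "P1", "amount": 120.0}],
}
sales_increment = {
    "dimensions": ["client", "product"],
    "rows": [{"client": "Eve", "product": "P2", "amount": 30.0}],
}
clients = {
    "dimensions": ["client", "region"],
    "rows": [
        {"client": "Dan", "region": "North"},
        {"client": "Eve", "region": "South"},
    ],
}

sales_all = vertical_merge(sales_history, sales_increment)
sales_by_region = horizontal_merge(sales_all, clients)
```

The nested-loop join keeps the sketch short; a practical implementation would more likely index one side by the shared keys first.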
According to some embodiments of the present invention, after the established distribution model is managed, the management operations are recorded in a management file of the index path. The management file stores only the logical relationships of the management operations and does not store specific data.
The beneficial effects of the above technical scheme are as follows: the management operations are managed effectively through the management file, memory is saved, and the management efficiency of the distribution model is improved.
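A management file of this kind might look like the sketch below. The JSON layout, the key names (`index_path`, `op`, `inputs`, `output`, `keys`) and the model identifiers are assumptions; the description does not specify a file format. The point illustrated is that each entry records which operation linked which models, and no data rows are stored.

```python
import json

def record_operation(log, op, inputs, output, keys=None):
    """Record the logical relationship of one management operation (no data rows)."""
    log.append({
        "op": op,                  # e.g. "vertical_merge" or "horizontal_merge"
        "inputs": list(inputs),    # identifiers of the models that were combined
        "output": output,          # identifier of the resulting model
        "keys": list(keys or []),  # dimension fields used for association, if any
    })
    return log

log = []
record_operation(log, "vertical_merge", ["sales_history", "sales_increment"], "sales_all")
record_operation(log, "horizontal_merge", ["sales_all", "clients"], "sales_by_region", keys=["client"])

# Serialized form of the hypothetical management file.
management_file = json.dumps({"index_path": log}, indent=2)
```

Replaying the entries in order against the stored models would reproduce the managed model, which is why the file itself can stay small.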
According to some embodiments of the invention, the distribution model employs a star architecture and a binary-based data management schema.
According to some embodiments of the invention, the method further comprises:
after the distribution model is established, determining the distribution model to be a target distribution model, and acquiring first model information of the target distribution model;
acquiring a plurality of historical distribution models and second model information of the plurality of historical distribution models;
determining the degree of association between the target distribution model and each of the plurality of historical distribution models according to the first model information and the second model information, and sorting the historical distribution models by degree of association from high to low to determine a queue of the historical distribution models;
performing service processing according to the target distribution model, determining, according to the model identifier of the target distribution model, the frequency with which it participates in service processing within a preset time period, comparing the participation frequency with a preset participation frequency, and determining from the comparison result whether the service processing performed by the target distribution model is legitimate;
when the service processing performed by the target distribution model is determined to be legitimate, determining target data, and determining, according to the target data, the authority level of the target distribution model when it makes a data request;
determining a target degree of association according to the authority level, determining the corresponding historical distribution model in the queue according to the target degree of association, taking the corresponding historical distribution model and the historical distribution models behind it in the queue as data supplier distribution models, and taking the corresponding historical distribution model as the first data supplier distribution model;
generating data request information, and sending the data request information to the first data supplier distribution model;
obtaining feedback information returned by the first data supplier distribution model, updating the data request information according to the feedback information, and sending the updated data request information to a second data supplier distribution model;
repeating the above steps until the target data are generated from the plurality of pieces of feedback information obtained;
and performing service processing according to the target data and the target distribution model.
The working principle of the above technical scheme is as follows. After the distribution model is established, it is determined to be the target distribution model, and its first model information is acquired; the first model information comprises the type, the dimension fields and the like of the data to be analyzed and processed from which the target distribution model was built. A plurality of historical distribution models and their second model information are acquired. The degree of association between the target distribution model and each historical distribution model is determined according to the first model information and the second model information, and the historical distribution models are sorted by degree of association from high to low to form a queue. Service processing is performed according to the target distribution model. During service processing, the frequency with which the target distribution model participates in service processing within a preset time period is determined from its model identifier and compared with the preset participation frequency; when the participation frequency is less than or equal to the preset participation frequency, the service processing performed by the target distribution model is legitimate. When the service processing is determined to be legitimate, the target data are determined, and the authority level of the target distribution model when it makes a data request is determined according to the target data. The target data are the data that the target distribution model lacks for the service processing.
A target degree of association is determined according to the authority level, and the corresponding historical distribution model in the queue is determined according to the target degree of association. The corresponding historical distribution model and the historical distribution models behind it in the queue are taken as data supplier distribution models, and the corresponding historical distribution model is the first data supplier distribution model. Data request information is generated and sent to the first data supplier distribution model. The feedback information returned by the first data supplier distribution model is obtained, the data request information is updated according to the feedback information, and the updated data request information is sent to the second data supplier distribution model. These steps are repeated until the target data are generated from the plurality of pieces of feedback information obtained, and service processing is then performed according to the target data and the target distribution model.
The beneficial effects of the above technical scheme are as follows. When service processing is performed on the basis of the target distribution model, the queue of historical distribution models is determined first, so that relevant data can be retrieved from the historical distribution models during service processing. Checking whether the service processing performed by the target distribution model is legitimate improves the security of that processing, protects the data, and prevents the data from being stolen. The authority level of the target distribution model when it makes a data request is determined from the target data, and the target degree of association is determined from the authority level, so that data of a higher level are not involved; this protects the data and provides different levels of data access. The data request information is updated continuously according to the feedback information, which makes each request more targeted and lets each data supplier distribution model determine its feedback quickly and accurately from the updated request. The time needed to obtain the target data is shortened, service processing is carried out quickly according to the target data and the target distribution model, and the efficiency and accuracy of service processing are improved.
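The request chain above can be sketched in a few functions. Everything concrete here is an assumption made for illustration: the degree of association is taken to be the Jaccard similarity of the dimension fields, the mapping from authority level to target degree of association (`DEGREE_BY_AUTHORITY`) is hypothetical, and the feedback of a supplier is modeled as the subset of requested fields it can provide.

```python
def association_degree(fields_a, fields_b):
    """Assumed measure: Jaccard similarity of two sets of dimension fields."""
    a, b = set(fields_a), set(fields_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_queue(target, history):
    """Sort historical models by degree of association, high to low."""
    scored = [(association_degree(target["dimensions"], h["dimensions"]), h) for h in history]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

def is_legitimate(participation_count, preset_count):
    """Processing is legitimate when the participation frequency does not exceed the preset."""
    return participation_count <= preset_count

def select_suppliers(queue, target_degree):
    """First model at or below the target degree, plus every model behind it in the queue."""
    for position, (degree, _) in enumerate(queue):
        if degree <= target_degree:
            return [model for _, model in queue[position:]]
    return []

def collect_target_data(needed_fields, suppliers):
    """Send the request down the supplier chain, shrinking it after each feedback."""
    request = set(needed_fields)
    collected = {}
    for supplier in suppliers:
        if not request:
            break
        feedback = {f: supplier["data"][f] for f in request if f in supplier["data"]}
        collected.update(feedback)
        request -= set(feedback)  # updated request: only what is still missing
    return collected, request

DEGREE_BY_AUTHORITY = {1: 0.4, 2: 0.7, 3: 1.0}  # hypothetical mapping

target = {"id": "T", "dimensions": ["client", "product", "date"]}
history = [
    {"id": "H3", "dimensions": ["client"], "data": {"region": "South", "segment": "retail"}},
    {"id": "H1", "dimensions": ["client", "product", "date"], "data": {"price": 9.5}},
    {"id": "H2", "dimensions": ["client", "product"], "data": {"region": "North"}},
]

queue = build_queue(target, history)
queue_ids = [model["id"] for _, model in queue]
suppliers = select_suppliers(queue, DEGREE_BY_AUTHORITY[2]) if is_legitimate(3, 5) else []
supplier_ids = [model["id"] for model in suppliers]
collected, missing = collect_target_data(["price", "region", "segment"], suppliers)
```

With authority level 2 the chain starts at H2, so the field held only by the most closely associated model H1 is never requested; this mirrors the statement that data of a higher level are not involved.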
According to some embodiments of the invention, the method further comprises:
establishing a data matrix of the distribution model, and determining evaluation indexes of the distribution model according to the data matrix;
establishing an association relationship between the evaluation indexes to generate an evaluation index system;
setting the weights and calculation parameters of the evaluation indexes in the evaluation index system;
comprehensively evaluating the distribution model according to the evaluation index system, calculating an evaluation value, and judging whether the evaluation value is smaller than a preset evaluation value; and reconstructing the distribution model when the evaluation value is determined to be smaller than the preset evaluation value.
The working principle of the above technical scheme is as follows. A data matrix of the distribution model is established, and the evaluation indexes of the distribution model are determined according to the data matrix. An association relationship between the evaluation indexes is established to generate an evaluation index system, and the weights and calculation parameters of the evaluation indexes in that system are set. The distribution model is comprehensively evaluated according to the evaluation index system to obtain an evaluation value, and when the evaluation value is smaller than a preset evaluation value, the distribution model is reconstructed. The data matrix is obtained by abstracting the data to be analyzed and processed that the distribution model contains, so that the effective data in the distribution model are presented accurately and completely and no effective data are missed during the comprehensive evaluation. The evaluation indexes include a data quality index, an integrity index, a redundancy index and the like.
The beneficial effects of the above technical scheme are as follows: the soundness of the established distribution model can be verified conveniently and accurately, and the distribution model is reconstructed when it is found to be unsound, which ensures the accuracy of data processing.
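A weighted evaluation of this kind can be sketched as follows. The two indexes computed here (a completeness ratio standing in for the integrity index, and a uniqueness ratio standing in for the redundancy index), the weights, the preset value and the sample matrix are all assumptions; the description names the index types but does not define how they are calculated.

```python
def completeness(matrix):
    """Share of cells that are not missing (stand-in for the integrity index)."""
    cells = [cell for row in matrix for cell in row]
    return sum(cell is not None for cell in cells) / len(cells)

def uniqueness(matrix):
    """Share of rows that are distinct (stand-in for the redundancy index)."""
    return len({tuple(row) for row in matrix}) / len(matrix)

def evaluate(indexes, weights):
    """Weighted average of the evaluation indexes."""
    total_weight = sum(weights.values())
    return sum(indexes[name] * weight for name, weight in weights.items()) / total_weight

def needs_rebuild(score, preset_score):
    """The model is reconstructed when its score is below the preset evaluation value."""
    return score < preset_score

# Hypothetical data matrix: one duplicated row and one missing cell.
matrix = [
    ["Dan", "P1", 120.0],
    ["Dan", "P1", 120.0],
    ["Eve", None, 50.0],
    ["Bob", "P2", 10.0],
]
indexes = {"completeness": completeness(matrix), "uniqueness": uniqueness(matrix)}
weights = {"completeness": 0.6, "uniqueness": 0.4}  # hypothetical weights
score = evaluate(indexes, weights)
rebuild = needs_rebuild(score, preset_score=0.9)
```

Dividing by the sum of the weights keeps the score in the same range as the indexes even when the weights do not add up to one.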
In an embodiment, matching the existing distribution models with an association query function according to the association relationship, thereby horizontally merging the existing distribution models, includes:
performing image fusion on a first image included in a first existing distribution model and a second image included in a second existing distribution model to obtain a fused image;
and evaluating the fused image to calculate an evaluation value, looking up the evaluation value in a preset table of evaluation values and fusion quality grades to obtain the fusion quality grade of the first existing distribution model and the second existing distribution model, and re-fusing the first existing distribution model and the second existing distribution model when the fusion quality grade is determined to be lower than a preset fusion quality grade.
Evaluating the fused image and calculating the evaluation value S comprises:
[Formula for S: the formula image (BDA0003276682730000181) is not reproduced in this text.]
wherein M is the length of the fused image; n is the width of the fused image; f (i, j) is the pixel value at (i, j) on the fused image; f (i, j-1) is the pixel value at (i, j-1) on the fused image; f (i-1, j) is the pixel value at (i-1, j) on the fused image.
The working principle and beneficial effects of the above technical scheme are as follows. Image fusion is performed on a first image included in a first existing distribution model and a second image included in a second existing distribution model to obtain a fused image. The fused image is evaluated to calculate an evaluation value, and the evaluation value is looked up in a preset table of evaluation values and fusion quality grades to obtain the fusion quality grade of the two existing distribution models; when the fusion quality grade is lower than the preset fusion quality grade, the two models are fused again. This helps to guarantee the fusion quality of the existing distribution models, improves the utilization and accuracy of the data, and avoids data loss during fusion. Calculating the evaluation value of the fused image accurately from the formula ensures that the fusion quality grade obtained from the table is accurate.
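The formula for S is not reproduced in this text, so the sketch below does not claim to implement it. It uses the spatial frequency of an image, a commonly used fusion quality measure built from the same neighbouring-pixel terms F(i, j), F(i, j-1) and F(i-1, j), purely as an assumed stand-in. Treating M as the number of rows and N as the number of columns, the thresholds in `GRADE_TABLE` and the preset grade are likewise assumptions.

```python
import math

def spatial_frequency(image):
    """sqrt(RF^2 + CF^2), from squared differences between horizontal and vertical neighbours."""
    m, n = len(image), len(image[0])
    row_sq = sum((image[i][j] - image[i][j - 1]) ** 2 for i in range(m) for j in range(1, n))
    col_sq = sum((image[i][j] - image[i - 1][j]) ** 2 for i in range(1, m) for j in range(n))
    return math.sqrt((row_sq + col_sq) / (m * n))

# Hypothetical table of evaluation values and fusion quality grades: (minimum value, grade).
GRADE_TABLE = [(0.0, 1), (0.5, 2), (1.0, 3)]

def fusion_grade(value, table=GRADE_TABLE):
    """Grade of the highest threshold that the evaluation value reaches."""
    grade = table[0][1]
    for minimum, candidate in table:
        if value >= minimum:
            grade = candidate
    return grade

def needs_refusion(value, preset_grade=2):
    """Re-fuse when the looked-up grade is lower than the preset grade."""
    return fusion_grade(value) < preset_grade

flat = [[5, 5], [5, 5]]        # no detail at all
checker = [[0, 1], [1, 0]]     # maximal alternation for a 2x2 binary image
s_flat = spatial_frequency(flat)
s_checker = spatial_frequency(checker)
```

A fused image with little retained detail yields a low value and therefore a low grade, which triggers re-fusion in this sketch.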
Advantages:
1. The array-based multi-dimensional cross-indexing technology can convert a large amount of data from different sources into a highly optimized model, which makes it well suited to reports, data analysis and data visualization. Because the data index covers all possible query paths, the distribution model continues to respond quickly to analytical queries, reports, dashboards and visualizations, no matter how many dimensions the model has or how complex the operations are.
2. The powerful and flexible distribution model enables users to "dive" from one dimension into any other dimension without fully understanding the underlying data structure. The multidimensional distribution model uses an improved star architecture and a binary-based data management schema. This data management structure removes the data-hierarchy and other data-navigation limitations found in conventional data processing software. Maintaining a dedicated data warehouse, or a traditional relational or multidimensional online analytical processing (OLAP) system, typically incurs specialized costs; the star-like structure may reduce the need for such costs. The overhead of creating a data warehouse can be minimized, or even eliminated, by means of the distribution model and the consolidated information component.
3. The distribution model, together with its use of in-memory technology, gives users a consistently fast response that is not affected by the size of the underlying data volume. Implementations of the present technology are recognized for their low overall acquisition cost, efficient value generation and rapid deployment.
4. By virtue of its flexible integrated platform functionality, the present technique is particularly fast to implement compared with conventional data processing techniques. The technology occupies fewer resources for three reasons: first, it avoids the additional database costs of traditional OLAP tools; second, it uses existing data storage and operating systems; third, it balances the data volume held in memory to maximize performance. The present technique does not require the purchase or licensing of any primary database, nor does it incur the cost of building an expensive data warehouse before any value is generated. Conventional OLAP-based BI techniques typically require a primary database or data warehouse in order to build and distribute a data cube or specialized data warehouse, all before any data analysis or report generation can take place. This hidden cost is a significant barrier for businesses that wish to avoid the overhead of traditional BI techniques or professional analytics applications and go directly to reporting and data analysis functions.
5. Adding new data sources quickly does not add to the cost of the solution itself. Adding new users is simple, the number of users can be expanded quickly as the deployment grows, and the benefits of BI reach every part of the enterprise. Within a relatively short time the technology can bring benefits such as increased income, substantial cost savings, less working time spent on reports and smoother customer contact; in addition, more successful data-based decisions can be made about investment targets and integrated sales programs.
6. The technique balances the data held in memory so as to maximize system performance and give the user a consistently fast response. Rapid deployment and maximized performance let the user experience the value of big data directly and add value to the enterprise more quickly.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A big data cross indexing method based on a distribution model is characterized by comprising the following steps:
acquiring data to be analyzed and processed and a data type corresponding to the data to be analyzed and processed;
performing cluster analysis on the data to be analyzed and processed according to the data types to obtain a plurality of classification sets;
and establishing a distribution model based on a cross index technology according to the classification sets.
2. The big data cross-indexing method based on the distribution model according to claim 1, wherein the building the distribution model based on the cross-indexing technology according to the plurality of classification sets comprises:
dividing the data to be analyzed and processed in each classification set into a dimension field for analysis, an information field for describing the dimension and a summary field for statistical analysis;
and establishing a distribution model according to the dimension field, the information field and the summary field.
3. The big data cross-indexing method based on the distribution model according to claim 2, wherein the dimension field obtaining method comprises:
acquiring an intra-class association relation between different data to be analyzed and processed in each classification set;
acquiring inter-class association relation among data to be analyzed and processed among different classification sets;
and determining the dimension field based on a cross-indexing technology according to the intra-class association relationship and the inter-class association relationship.
4. The big data cross-indexing method based on the distribution model according to claim 2, wherein the building the distribution model according to the dimension field, the information field and the summary field comprises:
performing pre-modeling calculation and modification on the summary field based on a calculation function;
establishing a description script based on the information field, and running a designed operation program in the description script to perform modeling;
in the modeling process, applying a cross-indexing technology to the analysis dimensions included in the dimension field to accelerate access to the data, and finally establishing the distribution model.
5. The big data cross-indexing method based on the distribution model according to claim 1, wherein the distribution model includes specific data and is stored in a model file, and the model file further includes data analysis statistical information.
6. The big data cross-indexing method based on the distribution model according to claim 1, further comprising managing the established distribution model, including:
obtaining the number of dimensions of the distribution model, determining the distribution model to be a first-class model when the number of dimensions is greater than a preset number of dimensions, classifying the dimensions in the first-class model, and converting an information field in the first-class model into a dynamic dimension;
or
periodically extracting and modeling incremental data, taking the distribution model built from the incremental data as a second-class model, acquiring a historically created distribution model, and vertically merging the second-class model with the historically created distribution model;
or
obtaining the number of existing distribution models, obtaining the dimension fields of the existing distribution models when the number is greater than a preset number, establishing an association relationship between the dimension fields of the existing distribution models, and matching the existing distribution models with an association query function according to the association relationship, thereby horizontally merging the existing distribution models.
7. The big data cross-indexing method based on the distribution model according to claim 6, wherein after the established distribution model is managed, the management operations are recorded in a management file of the index path, and the management file stores the logical relationships of the management operations without storing specific data.
8. The big data cross-indexing method based on the distribution model of claim 1, wherein the distribution model adopts a star-type framework and a binary-based data management mode.
9. The big data cross-indexing method based on the distribution model according to claim 1, further comprising:
after the distribution model is established, determining the distribution model to be a target distribution model, and acquiring first model information of the target distribution model;
acquiring a plurality of historical distribution models and second model information of the plurality of historical distribution models;
determining the degree of association between the target distribution model and each of the plurality of historical distribution models according to the first model information and the second model information, and sorting the historical distribution models by degree of association from high to low to determine a queue of the historical distribution models;
performing service processing according to the target distribution model, determining, according to the model identifier of the target distribution model, the frequency with which it participates in service processing within a preset time period, comparing the participation frequency with a preset participation frequency, and determining from the comparison result whether the service processing performed by the target distribution model is legitimate;
when the service processing performed by the target distribution model is determined to be legitimate, determining target data, and determining, according to the target data, the authority level of the target distribution model when it makes a data request;
determining a target degree of association according to the authority level, determining the corresponding historical distribution model in the queue according to the target degree of association, taking the corresponding historical distribution model and the historical distribution models behind it in the queue as data supplier distribution models, and taking the corresponding historical distribution model as the first data supplier distribution model;
generating data request information, and sending the data request information to the first data supplier distribution model;
obtaining feedback information returned by the first data supplier distribution model, updating the data request information according to the feedback information, and sending the updated data request information to a second data supplier distribution model;
repeating the above steps until the target data are generated from the plurality of pieces of feedback information obtained;
and performing service processing according to the target data and the target distribution model.
10. The big data cross-indexing method based on the distribution model according to claim 1, further comprising:
establishing a data matrix of the distribution model, and determining an evaluation index of the distribution model according to the data matrix;
establishing an association relationship between the evaluation indexes to generate an evaluation index system;
setting the weight and calculation parameters of the evaluation indexes in the evaluation index system;
comprehensively evaluating the distribution model according to the evaluation index system, calculating to obtain an evaluation value, and judging whether the evaluation value is smaller than a preset evaluation value; and when the evaluation value is determined to be smaller than a preset evaluation value, reconstructing the distribution model.
CN202111119891.9A 2021-09-24 2021-09-24 Big data cross indexing method based on distribution model Active CN113836141B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111119891.9A CN113836141B (en) 2021-09-24 2021-09-24 Big data cross indexing method based on distribution model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111119891.9A CN113836141B (en) 2021-09-24 2021-09-24 Big data cross indexing method based on distribution model

Publications (2)

Publication Number Publication Date
CN113836141A true CN113836141A (en) 2021-12-24
CN113836141B CN113836141B (en) 2022-04-19

Family

ID=78969669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111119891.9A Active CN113836141B (en) 2021-09-24 2021-09-24 Big data cross indexing method based on distribution model

Country Status (1)

Country Link
CN (1) CN113836141B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040177319A1 (en) * 2002-07-16 2004-09-09 Horn Bruce L. Computer system for automatic organization, indexing and viewing of information from multiple sources
US20100250497A1 (en) * 2007-01-05 2010-09-30 Redlich Ron M Electromagnetic pulse (EMP) hardened information infrastructure with extractor, cloud dispersal, secure storage, content analysis and classification and method therefor
US20130124465A1 (en) * 2011-11-11 2013-05-16 Rockwell Automation Technologies, Inc. Integrated and scalable architecture for accessing and delivering data
US20130311864A1 (en) * 2012-05-16 2013-11-21 N. Nagaraj Intelligent and robust context based XML data parsing from spreadsheets
US20160299961A1 (en) * 2014-02-04 2016-10-13 David Allen Olsen System and method for grouping segments of data sequences into clusters
CN109522331A (en) * 2018-10-16 2019-03-26 易保互联医疗信息科技(北京)有限公司 Compartmentalization various dimensions health data processing method and medium centered on individual
US20190384863A1 (en) * 2018-06-13 2019-12-19 Stardog Union System and method for providing prediction-model-based generation of a graph data model
US20200073876A1 (en) * 2018-08-30 2020-03-05 Qliktech International Ab Scalable indexing architecture

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
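Yang Yong et al.: "Design and Implementation of a Business Intelligence System for Digital Hospitals", China Digital Medicine (《中国数字医学》) *
Qiu Yun et al.: "Research and Application of a Call Center Operation Analysis System Based on Cross-Indexing Technology", Telecommunications Science (《电信科学》) *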

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116775621A (en) * 2023-08-23 2023-09-19 北京遥感设备研究所 Database intelligent index optimization method based on index selectivity
CN116775621B (en) * 2023-08-23 2024-01-02 北京遥感设备研究所 Database intelligent index optimization method based on index selectivity

Also Published As

Publication number Publication date
CN113836141B (en) 2022-04-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant