CN111008235A - Spark-based small file merging method and system - Google Patents

Spark-based small file merging method and system

Info

Publication number
CN111008235A
Authority
CN
China
Prior art keywords
data
merging
file
spark
files
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911216907.0A
Other languages
Chinese (zh)
Inventor
查文宇
张艳清
王纯斌
赵神州
费滔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Sefon Software Co Ltd
Original Assignee
Chengdu Sefon Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Sefon Software Co Ltd filed Critical Chengdu Sefon Software Co Ltd
Priority to CN201911216907.0A
Publication of CN111008235A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Abstract

The invention discloses a Spark-based small file merging method and system. A scheduled small-file merging task merges the many files in each partition into 1 file according to the configured task rules, which reduces the number of scattered small files and, when data in the Hive database is queried, reduces disk-read load, network transmission overhead and the work of merging data from many files, thereby improving data query efficiency. This solves the problem in existing schemes that, when data in a source database is extracted into the Hive database, multiple Spark tasks simultaneously read the source data and write it into different partitions, so that disk reads multiply and data query performance is degraded.

Description

Spark-based small file merging method and system
Technical Field
The invention relates to the field of business intelligence analysis platforms, and in particular to a Spark-based small file merging method and system.
Background
Business Intelligence (BI) refers to realizing business value from data by using modern data warehouse technology, online analytical processing, data mining and data presentation technology.
Business intelligence is generally understood as a set of tools that turn the data existing in an enterprise into knowledge and help the enterprise make informed business decisions. The data referred to here includes data from the enterprise's business systems, such as orders, inventory, transaction accounts, customers and suppliers, data about the enterprise's industry and competitors, and various other data from the enterprise's external environment. The business decisions that business intelligence can support range from the operational level to the tactical and strategic levels. Converting data into knowledge requires techniques such as data warehousing, online analytical processing (OLAP) tools and data mining. From a technical standpoint, therefore, business intelligence is not a new technology but a combined application of technologies such as data warehousing, OLAP and data mining.
Business intelligence can be regarded as the process of collecting, managing and analyzing business information, the aim being that decision makers at all levels of an enterprise gain knowledge or insight and can make decisions that are more beneficial to the enterprise. Business intelligence generally consists of data warehousing, online analytical processing, data mining, data backup and recovery, and similar components. It is realized through software, hardware, consulting services and applications, and its basic architecture comprises three parts: the data warehouse, online analytical processing and data mining.
It is therefore appropriate to regard business intelligence as a solution. Its key steps are to extract useful data from the many different enterprise operating systems, clean the data to ensure its correctness, and then merge it into an enterprise-level data warehouse through Extraction, Transformation and Loading (the ETL process), thereby obtaining a global view of the enterprise's data. On this basis the data is analyzed and processed with suitable query and analysis tools, data mining tools, OLAP tools and the like (at this point the information becomes knowledge that supports decision making), and finally the knowledge is presented to managers to support their decision-making process.
In the existing scheme, data in a source database is extracted into the Hive database: Spark runs multiple tasks simultaneously to read the source data and writes it into different partitions, so multiple files are generated in each partition when the data lands in the hadoop file system. The number of files multiplies each time the user performs a further incremental extraction, and once the number of files has grown, querying the data in the file system requires many more disk reads and data query performance is degraded.
Disclosure of Invention
The invention aims to provide a Spark-based small file merging method and system that solve the problem in the existing scheme that, when data in a source database is extracted into the Hive database, multiple Spark tasks simultaneously read the source data and write it into different partitions, so that disk reads multiply and data query performance is degraded.
The technical scheme adopted by the invention is as follows:
A small file merging method based on Spark is based on a source database, a business intelligence analysis platform with a Spark engine, and a Hive database loaded with the hadoop file system, and further comprises the following steps:
S1, the user operates on the source database and configures the data extraction function through the business intelligence analysis platform;
S2, the business intelligence analysis platform reads the data in the source database according to the N extraction partitions configured by the user, and writes the extracted data into M partitions in the Hive database, where the number of files in each partition is N, and M and N are positive integers;
S3, the hadoop file system merges the files in the M partitions according to the time period and the task rules pre-entered by the user.
In the existing scheme, data in the source database is extracted into the Hive database: Spark runs N tasks simultaneously to read the source data and writes it into M partitions, so N files per partition are generated when the data lands in the hadoop file system. Each time the user performs an incremental extraction, another M × N files are added, and once the number of files has multiplied, querying the data in the file system requires many more disk reads and data query performance is degraded.
The invention is mainly realized by the following technical scheme:
firstly, a user operates a data source and configures a data extraction function in a data set through a data set processing node through a platform. Then the system extracts the data source data of the partitions according to N configured by a user and writes the data source data into M partitions in the Hive library, wherein the number of data files in each partition is N; and finally, the system combines the small file tasks at regular time, and combines the M files in the N partitions into 1 file according to the task rule, so that the scattering quantity of the small files is reduced, and the data query efficiency is improved in the processes of disk reading load, network transmission consumption, data combination and the like when data in the Hive library is queried.
Further, the business intelligence analysis platform includes a data set matching the source database, and in step S1 the user operates on the source database and configures the data extraction function through the data set processing node of that data set in the business intelligence analysis platform.
Further, the method for the business intelligence analysis platform to read the data in the source database according to the N extraction partitions configured by the user in step S2 includes: the Spark engine executes N tasks simultaneously to read the source database data and writes the data into M partitions.
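By way of illustration only (this is not code from the patent), the behaviour described in step S2 can be sketched with the public Spark Java API. The JDBC URL, table names, partition columns and the value of N below are hypothetical:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class SourceExtractJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("source-db-extract")
                .enableHiveSupport()
                .getOrCreate();

        int n = 8; // the N extraction partitions configured by the user (hypothetical value)

        // Read the source table with N parallel JDBC tasks, split on a numeric key column.
        Dataset<Row> source = spark.read().format("jdbc")
                .option("url", "jdbc:mysql://source-db:3306/ods")  // hypothetical source database
                .option("dbtable", "orders")                       // hypothetical source table
                .option("user", "etl")
                .option("password", "***")
                .option("partitionColumn", "id")                   // hypothetical numeric key
                .option("lowerBound", "1")
                .option("upperBound", "10000000")
                .option("numPartitions", String.valueOf(n))
                .load();

        // Write into a Hive table partitioned by date (the M partitions). Each of the N read
        // tasks writes its own output file into every Hive partition it touches, so each
        // partition ends up with up to N small files, exactly the situation step S3 cleans up.
        source.write()
                .mode(SaveMode.Append)
                .partitionBy("dt")               // hypothetical partition column
                .saveAsTable("bi.orders_ods");   // hypothetical target Hive table

        spark.stop();
    }
}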
Further, the method for merging the files in the M partitions by the hadoop file system according to the time period and the task rule pre-entered by the user in the step S3 includes the following steps:
S301, the user configures the hadoop file system, sets the period at which the hadoop file system performs file merging, and configures the task rules for file merging;
S302, timing starts when the hadoop file system starts; once the elapsed time reaches the period preset in step S301, the hadoop file system merges the files in the M partitions according to the task rules configured in step S301;
S303, after the hadoop file system completes the file merge, the timer is reset and the process returns to step S302.
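A minimal sketch of the periodic trigger in steps S301 to S303 follows; it is an assumption about how such a timer could be built on a standard Java scheduled executor rather than the patent's own implementation, and the period value and method name are hypothetical:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class MergeScheduler {
    public static void main(String[] args) {
        // Merge period configured by the user in step S301 (hypothetical value: every 6 hours).
        long periodHours = 6;

        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

        // Timing starts when the service starts (S302); after each run the schedule
        // waits for the next full period, which plays the role of resetting the timer (S303).
        timer.scheduleAtFixedRate(() -> {
            try {
                mergeSmallFiles(); // apply the configured task rule to every partition
            } catch (Exception e) {
                e.printStackTrace(); // keep the schedule alive even if one run fails
            }
        }, periodHours, periodHours, TimeUnit.HOURS);
    }

    private static void mergeSmallFiles() {
        // Placeholder for the per-partition merge logic (see Example 5 and the sketch below).
    }
}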
Further, the task rules in step S302 include: merging after sorting by file name, merging after sorting by file creation time, merging after sorting by file modification time, and merging after sorting by file size.
Further, the merged file produced in step S303 has the following structure: the file header contains the names of all the files before merging, and the file content contains the data of all the files before merging.
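The merged-file layout described above can be sketched with the Hadoop FileSystem API as follows. This is an illustrative assumption rather than the patent's code: it presumes plain-text data files inside an HDFS partition directory, approximates the creation-time rule with modification time (the FileStatus API exposes modification rather than creation time), and the directory path, rule names and merged-file naming are hypothetical:

import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class PartitionMerger {

    // Merge all files in one partition directory into a single file whose header
    // lists the original file names, after sorting them by the chosen task rule.
    public static void mergePartition(String partitionDir, String rule) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus[] files = fs.listStatus(new Path(partitionDir));

        // Task rule: choose the sort order before merging (name / time / size).
        Comparator<FileStatus> order;
        switch (rule) {
            case "mtime": order = Comparator.comparingLong(FileStatus::getModificationTime); break;
            case "size":  order = Comparator.comparingLong(FileStatus::getLen); break;
            default:      order = Comparator.comparing(f -> f.getPath().getName()); break;
        }
        Arrays.sort(files, order);

        Path merged = new Path(partitionDir, "merged_" + System.currentTimeMillis());
        try (FSDataOutputStream out = fs.create(merged)) {
            // File header: names of all files before merging.
            StringBuilder header = new StringBuilder("# merged from:");
            for (FileStatus f : files) {
                header.append(' ').append(f.getPath().getName());
            }
            header.append('\n');
            out.write(header.toString().getBytes(StandardCharsets.UTF_8));

            // File content: data of all files before merging, in sorted order.
            for (FileStatus f : files) {
                try (FSDataInputStream in = fs.open(f.getPath())) {
                    IOUtils.copyBytes(in, out, 4096, false);
                }
            }
        }
        // Remove the original small files once the merged file is written.
        for (FileStatus f : files) {
            fs.delete(f.getPath(), false);
        }
    }
}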
A small file merging system based on Spark comprises a source database, a business intelligence analysis platform with a Spark engine, and a Hive database based on the hadoop file system;
the Hive database comprises:
a memory for storing executable instructions and files;
and the processor is used for executing the executable instructions stored in the memory to realize the above small file merging method based on Spark.
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
1. The Spark-based small file merging method and system disclosed by the invention solve the problem in the prior art that, when data in a source database is extracted into the Hive database, multiple Spark tasks simultaneously read the source data and write it into different partitions, so that disk reads multiply and data query performance is degraded;
2. The Spark-based small file merging method and system reduce the number of scattered files on disk, the disk read I/O load during data queries, network transmission overhead, and the memory consumed by merging data from many files at query time, thereby improving query efficiency and the user experience.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts, wherein:
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a diagram illustrating the number of files before small files are merged;
FIG. 3 is a diagram illustrating the number of files after small files are merged according to the present invention;
FIG. 4 is a screenshot of the number of files before merging of small files in accordance with the present invention;
FIG. 5 is a screenshot of the number of files after merging of small files according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to fig. 1 to 5, the described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
Before describing the embodiments of the present invention in further detail, the terms and expressions used in the embodiments are explained; they are to be understood as follows.
Spark: Apache Spark is a fast, general-purpose computing engine designed for large-scale data processing; Spark provides in-memory distributed datasets and, in addition to supporting interactive queries, can optimize iterative workloads;
Hive: Hive is a data warehouse tool built on Hadoop that can map structured data files onto database tables and provides SQL-like query functions;
Data source: a general term for sources of data such as files and databases;
Data processing node: a subdivided node of the business intelligence analysis platform's data processing function, covering functions such as table association, field derivation, data filtering, field calculation, grouping statistics and data types; it mainly performs cleaning, filtering, splitting and other processing on a data source;
Data set: the collective name for a data source after data processing nodes have been configured; a data set may include one or more data sources and one or more data processing nodes, and the generated data set can be regarded as a virtual data source;
Analysis: the module that takes the dimensions and measures produced after a data set is processed, performs configured queries and associates the data with chart components; a dimension or measure generated by one data set can be bound to several different display components in this module;
Report: the module used to assemble analysis components; after layout configuration, the report is made available for users to view;
Presto: an open-source distributed SQL query engine used to run interactive analytical queries against data sources of all sizes;
Container component for submitting Spark tasks: developed as part of the business intelligence analysis platform, it submits Spark tasks dynamically.
example 1
A small file merging method based on Spark is based on a source database, a business intelligence analysis platform with a Spark engine, and a Hive database loaded with the hadoop file system, and further comprises the following steps:
S1, the user operates on the source database and configures the data extraction function through the business intelligence analysis platform;
S2, the business intelligence analysis platform reads the data in the source database according to the N extraction partitions configured by the user, and writes the extracted data into M partitions in the Hive database, where the number of files in each partition is N, and M and N are positive integers;
S3, the hadoop file system merges the files in the M partitions according to the time period and the task rules pre-entered by the user.
Example 2
In this embodiment, based on embodiment 1, the business intelligence analysis platform includes a data set matching the source database, and in step S1 the user operates on the source database and configures the data extraction function through the data set processing node of that data set in the business intelligence analysis platform.
Further, the method for the business intelligence analysis platform to read the data in the source database according to the N extraction partitions configured by the user in step S2 includes: the Spark engine executes N tasks simultaneously to read the source database data and writes the data into M partitions.
Example 3
In this embodiment, based on embodiment 1, the method for merging files in M partitions by the hadoop file system according to the time period and the task rule pre-entered by the user in step S3 includes the following steps:
S301, the user configures the hadoop file system, sets the period at which the hadoop file system performs file merging, and configures the task rules for file merging;
S302, timing starts when the hadoop file system starts; once the elapsed time reaches the period preset in step S301, the hadoop file system merges the files in the M partitions according to the task rules configured in step S301;
S303, after the hadoop file system completes the file merge, the timer is reset and the process returns to step S302.
Further, the task rules in step S302 include: merging after sorting by file name, merging after sorting by file creation time, merging after sorting by file modification time, and merging after sorting by file size.
Further, the merged file produced in step S303 has the following structure: the file header contains the names of all the files before merging, and the file content contains the data of all the files before merging.
Example 4
A small file merging system based on Spark comprises a source database, a business intelligence analysis platform with a Spark engine, and a Hive database based on the hadoop file system;
the Hive database comprises:
a memory for storing executable instructions and files;
and the processor is used for executing the executable instructions stored in the memory to realize the above small file merging method based on Spark.
Example 5
This embodiment shows part of the function code of the scheme; the fragment below runs inside a loop over the Hive tables to be merged, where stat is a JDBC Statement and table is the current table name:
ResultSet resultSet = stat.executeQuery("show partitions " + table);
if (resultSet == null) continue;
while (resultSet.next()) {
    // Merge the small files of this partition with Hive's concatenate statement
    stat.execute("alter table " + table + " partition(" + resultSet.getString(1) + ") concatenate");
}
resultSet.close();
// Update the merged-partition field status of ec_dataset_info
datasetManageMapper.updateMergeFileStatus(datasetId, 0);
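For context: the alter table ... partition(...) concatenate statement issued above is Hive's own DDL command for merging a partition's small files into fewer, larger files (it applies to tables stored as RCFile or ORC), which is why the fragment walks through the output of show partitions and issues one statement per partition. The call to datasetManageMapper.updateMergeFileStatus appears to be the platform's own persistence-layer call that records the merge as completed for the data set; it is not part of Hive or Hadoop.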
Example 6
As shown in fig. 4, this embodiment uses the file list before merging: each partition contains 8 files of about 410KB each, and the partition size is 256MB. When the system queries the data in the file system it must read 8 small files; since the hard disk must seek before reading each file, the 8 small files require 8 seeks at about 8ms each, plus a data transfer time of about 5ms per file, for a total of about 104ms.
Example 7
As shown in fig. 5, this embodiment uses the file list after merging: each partition contains 1 large file of about 3.2MB, and the partition size is 256MB. When the system queries the data in the file system it only needs to read 1 large file: 1 seek of about 8ms plus a data transfer time of about 40ms for the large file, for a total of about 48ms. This solves the problem in the existing scheme that, when data in the source database is extracted into the Hive database, multiple Spark tasks simultaneously read the source data and write it into different partitions, so that disk reads multiply and data query performance is degraded.
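Putting examples 6 and 7 side by side with the figures stated above: before merging, reading a partition costs 8 seeks × 8ms + 8 transfers × 5ms = 64ms + 40ms ≈ 104ms, while after merging it costs 1 seek × 8ms + 1 transfer × 40ms ≈ 48ms, roughly halving the read latency for the same volume of data.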
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (7)

1. A small file merging method based on Spark, based on a source database, a business intelligence analysis platform with a Spark engine and a Hive database loaded with the hadoop file system, characterized in that it further comprises the following steps:
S1, the user operates on the source database and configures the data extraction function through the business intelligence analysis platform;
S2, the business intelligence analysis platform reads the data in the source database according to the N extraction partitions configured by the user, and writes the extracted data into M partitions in the Hive database, where the number of files in each partition is N, and M and N are positive integers;
S3, the hadoop file system merges the files in the M partitions according to the time period and the task rules pre-entered by the user.
2. The method for merging small files based on Spark according to claim 1, wherein: the business intelligence analysis platform includes a data set matching the source database, and in step S1, the user operates the source database and configures a data extraction function through the data set processing node of the data set in the business intelligence analysis platform.
3. The method for merging small files based on Spark according to claim 1, wherein: the method for the business intelligence analysis platform to read the data in the source database according to the N extraction partitions configured by the user in step S2 includes: the Spark engine executes N tasks simultaneously to read the source database data and writes the data into M partitions.
4. The method for merging small files based on Spark according to claim 1, wherein: the method for merging the files in the M partitions by the hadoop file system in the step S3 according to the time period and the task rule pre-entered by the user comprises the following steps:
S301, the user configures the hadoop file system, sets the period at which the hadoop file system performs file merging, and configures the task rules for file merging;
S302, timing starts when the hadoop file system starts; once the elapsed time reaches the period preset in step S301, the hadoop file system merges the files in the M partitions according to the task rules configured in step S301;
S303, after the hadoop file system completes the file merge, the timer is reset and the process returns to step S302.
5. The Spark-based small file merging method according to claim 4, characterized in that the task rules in step S302 include: merging after sorting by file name, merging after sorting by file creation time, merging after sorting by file modification time, and merging after sorting by file size.
6. The Spark-based small file merging method according to claim 4, characterized in that the merged file produced in step S303 has the following structure: the file header contains the names of all the files before merging, and the file content contains the data of all the files before merging.
7. A small file merging system based on Spark, characterized in that it comprises a source database, a business intelligence analysis platform with a Spark engine, and a Hive database based on the hadoop file system;
the Hive database comprises:
a memory for storing executable instructions and files;
a processor, configured to execute the executable instructions stored in the memory to implement the Spark-based small file merging method as claimed in claim 1.
CN201911216907.0A 2019-12-03 2019-12-03 Spark-based small file merging method and system Pending CN111008235A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911216907.0A CN111008235A (en) 2019-12-03 2019-12-03 Spark-based small file merging method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911216907.0A CN111008235A (en) 2019-12-03 2019-12-03 Spark-based small file merging method and system

Publications (1)

Publication Number Publication Date
CN111008235A true CN111008235A (en) 2020-04-14

Family

ID=70112653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911216907.0A Pending CN111008235A (en) 2019-12-03 2019-12-03 Spark-based small file merging method and system

Country Status (1)

Country Link
CN (1) CN111008235A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101868029B1 (en) * 2017-03-10 2018-06-18 현대카드 주식회사 Method and system for sharing file based on blockchain
CN109857803A (en) * 2018-12-13 2019-06-07 杭州数梦工场科技有限公司 Method of data synchronization, device, equipment, system and computer readable storage medium
CN109726177A (en) * 2018-12-29 2019-05-07 北京赛思信安技术股份有限公司 A kind of mass file subregion indexing means based on HBase

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111897772A (en) * 2020-08-05 2020-11-06 光大兴陇信托有限责任公司 Big file data importing method
CN111897772B (en) * 2020-08-05 2024-02-20 光大兴陇信托有限责任公司 Large file data importing method
CN112799820A (en) * 2021-02-05 2021-05-14 拉卡拉支付股份有限公司 Data processing method, data processing apparatus, electronic device, storage medium, and program product
CN116069738A (en) * 2023-03-06 2023-05-05 鹏城实验室 Root zone file generation method, terminal equipment and computer readable storage medium
CN116069738B (en) * 2023-03-06 2023-08-25 鹏城实验室 Root zone file generation method, terminal equipment and computer readable storage medium


Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200414)