CN107798111B - Method for exporting data in large batch in distributed environment - Google Patents

Method for exporting data in large batch in distributed environment Download PDF

Info

Publication number
CN107798111B
CN107798111B CN201711059530.3A CN201711059530A CN107798111B CN 107798111 B CN107798111 B CN 107798111B CN 201711059530 A CN201711059530 A CN 201711059530A CN 107798111 B CN107798111 B CN 107798111B
Authority
CN
China
Prior art keywords
data
thread
query
file
excel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711059530.3A
Other languages
Chinese (zh)
Other versions
CN107798111A (en
Inventor
李波
岳永胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Changhong Electric Co Ltd
Original Assignee
Sichuan Changhong Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Changhong Electric Co Ltd filed Critical Sichuan Changhong Electric Co Ltd
Priority to CN201711059530.3A priority Critical patent/CN107798111B/en
Publication of CN107798111A publication Critical patent/CN107798111A/en
Application granted granted Critical
Publication of CN107798111B publication Critical patent/CN107798111B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471Distributed queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/258Data format conversion from or to a database
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5011Pool
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5018Thread allocation

Abstract

The invention discloses a method for exporting data in a large scale in a distributed environment, which improves the defects of the traditional exported data, and adopts the steps of firstly using a multithreading single table for inquiring, and then using multithreading to carry out table inquiry again on the inquired data, thereby achieving the effects of accelerating the data exporting speed and shortening the waiting time of a user. The invention adopts the technology of using multiple threads for many times, shortens the waiting time of the user and reduces the load of the database server and the application server.

Description

Method for exporting data in large batch in distributed environment
Technical Field
The invention relates to the technical field of data processing, in particular to a method for exporting data in a large batch in a distributed environment.
Background
With the continuous upgrade of software, the demand gradually increases, and the traditional design mode can not meet the business demand. Therefore, a distributed development mode is necessarily adopted to split the service. The database layer is designed by adopting a plurality of databases and a plurality of tables. The cloud platform operator or the enterprise manager needs to export and analyze the operation data at any time. Although software developers can develop functionality that meets the requirements based on the needs of the data analyst. But experience a series of tedious processes such as requirement analysis, summary design, code writing, testing, bug modification, release and the like from the requirement proposition to the online use. However, the market is often changeable in wind and cloud, and opportunities are vanished as time goes on, so that timely analysis of operation data becomes a necessary course for each website operator.
In daily work, operators are more used to export Excel tables to perform various visualization operations on data. Exporting large volumes of data is therefore a problem that developers need to face. The traditional architecture adopts a single database and single server mode. Direct queries can be exported to Excel. But in a distributed environment large amounts of data cannot be exported using conventional approaches. The problem mainly faced by the distributed environment is that the database is designed in a multi-library manner. The traditional mode can carry out joint query on the database, and after the database is separated, the joint query cannot be carried out.
It can be seen from the above that in the distributed environment, the conventional way of derivation is adopted, the user must be confronted with a long wait, and the system will also be confronted with the query problem of M × N (M is the number of data pieces to be queried, and N is the related table). For example, 3 tables are involved in deriving ten thousand pieces of data. The database will be queried 3 thousand times. In the case of a relatively high amount of concurrency, the database will necessarily crash. Therefore, the conventional way of exporting tables is not favorable for increasing the data export speed and reducing the database query time.
Disclosure of Invention
The invention aims to overcome the defects in the background technology and provide a method for exporting data in a large scale in a distributed environment.
In order to achieve the technical effects, the invention adopts the following technical scheme:
a method for exporting data in bulk in a distributed environment, comprising the steps of:
a, a web front end sends a condition of data needing to be exported, the number of data meeting the user requirement is inquired through aggregation, and the number of data inquires completed by each thread is calculated;
B. creating an interface thread for realizing java.util.current.callable, wherein the interface thread is used for paging to acquire data of an order table;
C. creating a thread pool according to the configuration condition of a server, adding a query task into the thread pool, presetting an optimal thread number for each thread pool, automatically calling the query task according to a scheduling strategy of the thread pool, performing data query of an order table, and storing data of the order table acquired by query into a first file;
D. performing combined query on data, establishing query tasks corresponding to the order tables one by one, performing memory paging on the data queried by the order tables, storing the queried data in a memory in a Map form, and acquiring a key value and a value, wherein the key value is an id value of the order data, and the value is a data value required to be acquired by the association table;
E. d, assembling the data in the first file and the data in the Map form acquired in the step D to form a List set meeting the export requirement, and storing the List set in a second file;
F. analyzing the data in the second file, acquiring the data in an IO stream mode, and writing the data into an Excel file;
G. presetting an Excel number upper limit value for the Excel file, and writing data in the form of Map stored in the memory into the hard disk when the data is written into the Excel file in the step F and the written Excel number reaches the Excel number upper limit value;
H. and writing file data into a response output stream of the http request at the front end of the web by using an http protocol, and finishing mass data export.
Further, when the data of the order table is obtained in step B, specifically, the method for packaging query data in the Service layer, calling jpa, using sql statement to perform query, and obtaining data queried in pages by transmitting the page parameters and query conditions to the Service layer in the controller layer.
Further, the following calculation method is adopted when the optimal thread number is preset for the thread pool in the step C: the optimal thread count is (thread latency/thread CPU time +1) CPU count.
Further, the scheduling policy of the thread pool in the step C is: and firstly, running the threads added into the thread pool, and queuing redundant threads in the waiting queue to be run when the number of the threads exceeds the preset number of the threads.
Further, the derivation requirement in step E is: the data required to be exported at least includes order information, creator information and company information.
Further, when data is written into the Excel file in the step G, SXSSFWorkbook of POI is mainly used.
Compared with the prior art, the invention has the following beneficial effects:
in the method for exporting data in a large batch in a distributed environment, the processing logic is to use multithreading for multiple times, so that the waiting time of io is shortened to the shortest; the method and the device have the advantages that multithreading single-table query is used, and then the queried data is subjected to table query again by multithreading, so that the data export speed can be effectively increased, the user waiting time can be shortened, and meanwhile, the load of a database server and the load of an application server can be reduced.
Drawings
FIG. 1 is a process flow diagram of one embodiment of a method of exporting data in large volumes in a distributed environment of the present invention.
Detailed Description
The invention will be further elucidated and described with reference to the embodiments of the invention described hereinafter.
Example (b):
as shown in fig. 1, a method for exporting data in a large batch in a distributed environment specifically includes the following steps:
s101: the condition that the web front end sends data needing to be exported;
s102: the web front end sends a data acquisition request with a derivation condition to a controller method corresponding to the background interface, and then queries the total number of data to be derived according to the derivation condition;
s103: and calculating the number of threads to be divided for single table export according to the total number of inquires, wherein during calculation, the number of threads is the total number/the total number exported by each thread, the total number which can be completed by each thread can be inquired by how many threads can be inquired through the maximum load of program test, and a plurality of established threads, namely inquiry tasks, are put into a thread pool for data inquiry of an order table.
S104: and storing the large data volume of the order table obtained by inquiry into a file system, and calculating the number of threads required to finish data assembly again.
For example, in the requirement of order export, it is desirable to see order information, creator information and company information in Excel, where the three information are stored in three tables of three databases respectively, first, after the order data is queried, the order data is stored in the file system a, then, the order ID is taken out and stored in the List, because two tables need to be queried, two thread pools need to be created, the creator information and the company information are obtained by using an order obtaining mode, and the obtained creator information and the obtained company information are stored by using two maps respectively.
And one thread completes the assembly of one batch of data, and starts the threads corresponding to the data table associated with the data while starting the threads together. And each associated thread queries batch query data by using In, and assembles the query data into a format expected by Excel, for example, the Excel has order information, creator information and company information, and then assembles the existing three parts of data into a row.
S105: reading data in a file in a streaming mode, writing the read data into an Excel table, setting a storage strip number threshold value in the Excel table, automatically writing the data into a hard disk when the storage strip number threshold value is reached, circularly exporting batch data into the Excel, writing file data into a response output stream of an http request at the front end of a web by using an http protocol, and exporting the batch data.
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims (4)

1. A method for exporting data in bulk in a distributed environment, comprising the steps of:
a, a web front end sends a condition of data needing to be exported, the number of data meeting the user requirement is inquired through aggregation, and the number of data inquires completed by each thread is calculated;
B. creating an interface thread for realizing java.util.current.callable, wherein the interface thread is used for paging to acquire data of an order table; when the data of the order table is obtained in the step B, specifically, a method for packaging query data in a Service layer, calling jpa, using an sql statement to perform query, and obtaining paged query data by transmitting paging parameters and query conditions to the Service layer in a controller layer;
C. creating a thread pool according to the configuration condition of a server, adding threads, namely query tasks, into the thread pool, presetting an optimal thread number for each thread pool, automatically calling the query tasks according to a scheduling strategy of the thread pool, performing data query on an order table, and storing data of the order table acquired by query into a first file;
D. performing combined query on data, establishing query tasks corresponding to the order tables one by one, performing memory paging on the data queried by the order tables, storing the queried data in a memory in a Map form, and acquiring a key value and a value, wherein the key value is an id value of the order data, and the value is a data value required to be acquired by the association table;
E. d, assembling the data in the first file and the data in the Map form acquired in the step D to form a List set meeting the export requirement, and storing the List set in a second file;
F. analyzing the data in the second file, acquiring the data in an IO stream mode, and writing the data into an Excel file;
G. presetting an Excel number upper limit value for the Excel file, and writing data in the form of Map stored in the memory into the hard disk when the data is written into the Excel file in the step F and the written Excel number reaches the Excel number upper limit value; when the data are written into the Excel file in the step G, SXSSSFWorkbook of POI is mainly used;
H. and writing file data into a response output stream of the http request at the front end of the web by using an http protocol, and finishing mass data export.
2. The method of claim 1, wherein the following calculation is adopted when the optimal number of threads is preset for the thread pool in step C: the optimal thread count is (thread latency/thread CPU time +1) CPU count.
3. The method of claim 1, wherein the scheduling policy of the thread pool in step C is: and firstly, running the threads added into the thread pool, and queuing redundant threads in the waiting queue to be run when the number of the threads exceeds the preset number of the threads.
4. The method for exporting data in bulk in a distributed environment according to claim 1, wherein the exporting requirement in step E is: the data required to be exported at least includes order information, creator information and company information.
CN201711059530.3A 2017-11-01 2017-11-01 Method for exporting data in large batch in distributed environment Active CN107798111B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711059530.3A CN107798111B (en) 2017-11-01 2017-11-01 Method for exporting data in large batch in distributed environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711059530.3A CN107798111B (en) 2017-11-01 2017-11-01 Method for exporting data in large batch in distributed environment

Publications (2)

Publication Number Publication Date
CN107798111A CN107798111A (en) 2018-03-13
CN107798111B true CN107798111B (en) 2021-04-06

Family

ID=61548636

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711059530.3A Active CN107798111B (en) 2017-11-01 2017-11-01 Method for exporting data in large batch in distributed environment

Country Status (1)

Country Link
CN (1) CN107798111B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109299157B (en) * 2018-08-27 2021-11-23 杭州安恒信息技术股份有限公司 Data export method and device for distributed big single table
CN110532311B (en) * 2019-08-14 2023-11-28 泰安协同软件有限公司 Distributed data export method and system based on queues
CN111914151A (en) * 2020-08-11 2020-11-10 上海毅博电子商务有限责任公司 Association table object query optimization method
CN113177826A (en) * 2021-05-20 2021-07-27 青岛海信智慧生活科技股份有限公司 Method and device for configuring commodities and cells in batch

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103092993A (en) * 2013-02-18 2013-05-08 五八同城信息技术有限公司 Data exporting method and data exporting device
CN103793519A (en) * 2014-02-14 2014-05-14 浪潮通信信息系统有限公司 Automatic tool supporting exportation of mass data
US8782101B1 (en) * 2012-01-20 2014-07-15 Google Inc. Transferring data across different database platforms
EP2777009A4 (en) * 2011-11-10 2015-06-17 Microsoft Technology Licensing Llc Export of content items from multiple, disparate content sources
CN105740293A (en) * 2014-12-12 2016-07-06 金蝶软件(中国)有限公司 Data export method and device
CN106407231A (en) * 2015-08-03 2017-02-15 天脉聚源(北京)科技有限公司 A data multi-thread export method and system

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6542920B1 (en) * 1999-09-24 2003-04-01 Sun Microsystems, Inc. Mechanism for implementing multiple thread pools in a computer system to optimize system performance
CN101996067B (en) * 2009-08-19 2014-07-16 阿里巴巴集团控股有限公司 Data export method and device
CN102360310B (en) * 2011-09-28 2014-03-26 中国电子科技集团公司第二十八研究所 Multitask process monitoring method in distributed system environment
CN103034735B (en) * 2012-12-26 2017-02-08 北京讯鸟软件有限公司 Big data distributed file export method
CN103995807B (en) * 2013-02-16 2018-01-16 长沙中兴软创软件有限公司 Magnanimity data query and the method for after-treatment under a kind of framework based on Web
CN103412961B (en) * 2013-09-04 2017-03-29 广东全通教育股份有限公司 A kind of real-time processing method and system for deriving Magnanimous Data Report form
KR102103543B1 (en) * 2013-11-28 2020-05-29 삼성전자 주식회사 All-in-one data storage device having internal hardware filter, method thereof, and system having the data storage device
CN106095775A (en) * 2016-05-24 2016-11-09 中国银行股份有限公司 A kind of method and system realizing data query or derivation
CN106776829A (en) * 2016-11-28 2017-05-31 成都广达新网科技股份有限公司 A kind of data guiding system and its method of work

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2777009A4 (en) * 2011-11-10 2015-06-17 Microsoft Technology Licensing Llc Export of content items from multiple, disparate content sources
US8782101B1 (en) * 2012-01-20 2014-07-15 Google Inc. Transferring data across different database platforms
CN103092993A (en) * 2013-02-18 2013-05-08 五八同城信息技术有限公司 Data exporting method and data exporting device
CN103793519A (en) * 2014-02-14 2014-05-14 浪潮通信信息系统有限公司 Automatic tool supporting exportation of mass data
CN105740293A (en) * 2014-12-12 2016-07-06 金蝶软件(中国)有限公司 Data export method and device
CN106407231A (en) * 2015-08-03 2017-02-15 天脉聚源(北京)科技有限公司 A data multi-thread export method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Java Excel SXSSFWorkbook大量数据导出";ZX星辰;《https://blog.csdn.net/zxxingchen/article/details/70159473》;20170413;第1页 *

Also Published As

Publication number Publication date
CN107798111A (en) 2018-03-13

Similar Documents

Publication Publication Date Title
CN107798111B (en) Method for exporting data in large batch in distributed environment
CN107391653B (en) Distributed NewSQL database system and picture data storage method
US9135647B2 (en) Methods and systems for flexible and scalable databases
CN105824957A (en) Query engine system and query method of distributive memory column-oriented database
US11132383B2 (en) Techniques for processing database tables using indexes
CN110806933B (en) Batch task processing method, device, equipment and storage medium
CN107301214B (en) Data migration method and device in HIVE and terminal equipment
US9930113B2 (en) Data retrieval via a telecommunication network
CN111400288A (en) Data quality inspection method and system
CN108459913B (en) Data parallel processing method and device and server
WO2021012861A1 (en) Method and apparatus for evaluating data query time consumption, and computer device and storage medium
CN102750384A (en) Device and method for acquiring data from multidatabase engine
CN112052082A (en) Task attribute optimization method, device, server and storage medium
CN112000703B (en) Data warehousing processing method and device, computer equipment and storage medium
CN103186384A (en) Business-component-oriented software designing and analyzing system and using method thereof
CN112035230A (en) Method and device for generating task scheduling file and storage medium
CN113407343A (en) Service processing method, device and equipment based on resource allocation
CN115984022B (en) Unified account checking method and device for distributed payment system
CN110222046B (en) List data processing method, device, server and storage medium
AU2019241002B2 (en) Transaction processing method and system, and server
CN111190704A (en) Task classification processing method based on big data processing framework
CN106502842A (en) Data reconstruction method and system
CN107562943B (en) Data calculation method and system
CN105630997A (en) Data parallel processing method, device and equipment
CN115617480A (en) Task scheduling method, device and system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant