CN107798111B - Method for exporting data in large batch in distributed environment

Info

Publication number: CN107798111B
Application number: CN201711059530.3A
Authority: CN
Inventors: 李波; 岳永胜
Original assignee: Sichuan Changhong Electric Co Ltd
Current assignee: Sichuan Changhong Electric Co Ltd
Priority date: 2017-11-01
Filing date: 2017-11-01
Publication date: 2021-04-06
Anticipated expiration: 2037-11-01
Also published as: CN107798111A

Abstract

The invention discloses a method for exporting data in large batches in a distributed environment that addresses the shortcomings of traditional data export. Multiple threads first query a single table, and multiple threads then query the associated tables for the data already retrieved, which speeds up data export and shortens the user's waiting time. By applying multithreading at several stages, the invention shortens the user's waiting time and reduces the load on the database server and the application server.

Description

Method for exporting data in large batch in distributed environment

Technical Field

The invention relates to the technical field of data processing, in particular to a method for exporting data in a large batch in a distributed environment.

Background

As software is continuously upgraded, requirements grow and the traditional design mode can no longer meet business needs. A distributed development mode therefore has to be adopted to split up the services, and the database layer is designed with multiple databases and multiple tables. Cloud platform operators or enterprise managers need to export and analyze operation data at any time. Software developers can build functionality to meet a data analyst's needs, but going from the initial requirement to production use involves a tedious series of steps: requirement analysis, outline design, coding, testing, bug fixing, release and so on. The market, however, changes quickly and opportunities fade with time, so timely analysis of operation data has become essential for every website operator.

In daily work, operators are accustomed to exporting Excel tables and performing various visualization operations on the data. Exporting large volumes of data is therefore a problem that developers have to face. The traditional architecture uses a single database and a single server, so query results can be exported directly to Excel. In a distributed environment, however, large amounts of data cannot be exported in the conventional way. The main problem is that the database is split across multiple databases: the traditional mode can run join queries within one database, but once the data is split across databases, join queries are no longer possible.

It follows that if the conventional export method is used in a distributed environment, the user faces a long wait and the system faces M × N queries, where M is the number of records to be queried and N is the number of tables involved. For example, exporting ten thousand records that involve 3 tables requires 30,000 database queries. Under relatively high concurrency, the database is bound to crash. The conventional way of exporting tables therefore does not help increase the data export speed or reduce the database query time.
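The M × N arithmetic can be checked with a short sketch. The class name, the method names and the page size of 1,000 are illustrative assumptions rather than values from the patent; the second method only shows how the count shrinks if each table is queried once per page instead of once per record.

```java
final class QueryCountEstimate {
    /** Queries issued when every exported record triggers one lookup per table (M x N). */
    static long perRecordQueries(long records, int tables) {
        return records * tables;
    }

    /** Queries issued when records are handled in pages and each table is queried once per page. */
    static long batchedQueries(long records, int tables, int pageSize) {
        long pages = (records + pageSize - 1) / pageSize; // ceiling division
        return pages * tables;
    }

    public static void main(String[] args) {
        System.out.println(perRecordQueries(10_000, 3));      // prints 30000
        System.out.println(batchedQueries(10_000, 3, 1_000)); // prints 30
    }
}
```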

Disclosure of Invention

The invention aims to overcome the defects described in the background and to provide a method for exporting data in large batches in a distributed environment.

To achieve the above technical effects, the invention adopts the following technical solution:

A method for exporting data in bulk in a distributed environment, comprising the following steps:

A. a web front end sends the conditions of the data to be exported; the number of records meeting the user's requirements is obtained through an aggregate query, and the number of records each thread is to query is calculated;

B. creating a thread that implements the java.util.concurrent.Callable interface, the thread being used to fetch the data of an order table page by page;

C. creating a thread pool according to the server configuration, adding the query tasks to the thread pool, presetting an optimal thread count for each thread pool, invoking the query tasks automatically according to the scheduling policy of the thread pool to query the order table, and storing the order table data obtained by the query in a first file;

D. performing the associated queries on the data: establishing query tasks that correspond one-to-one with the order table, paging in memory the data queried from the order table, and storing the queried data in memory in the form of a Map of keys and values, wherein the key is the id of the order record and the value is the data to be obtained from the associated table;

E. assembling the data in the first file with the Map data obtained in step D to form a List collection that meets the export requirement, and storing the List collection in a second file;

F. parsing the data in the second file, reading the data as an IO stream, and writing the data into an Excel file;

G. presetting an upper limit on the number of Excel rows for the Excel file, and, when data is being written into the Excel file in step F and the number of rows written reaches the upper limit, writing the data held in memory in Map form to the hard disk;

H. writing the file data into the response output stream of the web front end's http request using the http protocol, completing the bulk data export.
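Steps B and C can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the order table is a hypothetical in-memory list instead of a database, the class and method names are invented for the example, and the result is returned as a list where step C would write it to the first file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PagedOrderExport {
    /** Hypothetical stand-in for the order table; a real system would query the database. */
    static List<String> fakeOrderTable(int rows) {
        List<String> table = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            table.add("order-" + i);
        }
        return table;
    }

    /** One query task (step B): a Callable that fetches a single page of the order table. */
    static Callable<List<String>> pageTask(List<String> table, int offset, int limit) {
        return () -> new ArrayList<>(table.subList(offset, Math.min(offset + limit, table.size())));
    }

    /** Step C: one task per page on a fixed pool; pages are collected in submission order. */
    static List<String> exportOrders(List<String> table, int pageSize, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int offset = 0; offset < table.size(); offset += pageSize) {
                futures.add(pool.submit(pageTask(table, offset, pageSize)));
            }
            List<String> rows = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                rows.addAll(future.get()); // blocks until that page has been fetched
            }
            return rows;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("order page query failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> rows = exportOrders(fakeOrderTable(25), 10, 3);
        System.out.println(rows.size()); // prints 25
    }
}
```

Collecting the futures in submission order keeps the exported rows in page order even though the pages are fetched concurrently.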

Further, the order table data in step B is obtained as follows: a query method is encapsulated in the Service layer, which calls JPA and runs the query with an SQL statement; the controller layer passes the paging parameters and query conditions to the Service layer to obtain the paged query data.

Further, the optimal thread count preset for the thread pool in step C is calculated as follows: optimal thread count = (thread wait time / thread CPU time + 1) × number of CPUs.
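The formula can be written as a small helper. This is a sketch under assumptions: the method name, the millisecond units and the choice to round up are not specified by the patent.

```java
final class OptimalThreadCount {
    /**
     * optimal thread count = (wait time / CPU time + 1) x number of CPUs.
     * Rounding up is an assumption; the patent does not say how fractions are handled.
     */
    static int optimalThreads(double waitTimeMs, double cpuTimeMs, int cpuCount) {
        if (waitTimeMs < 0 || cpuTimeMs <= 0 || cpuCount <= 0) {
            throw new IllegalArgumentException("wait >= 0, cpu time > 0 and cpu count > 0 are required");
        }
        return (int) Math.ceil((waitTimeMs / cpuTimeMs + 1.0) * cpuCount);
    }

    public static void main(String[] args) {
        // A task that waits 90 ms on IO for every 10 ms of CPU work, on a 4-core machine.
        System.out.println(optimalThreads(90, 10, 4)); // prints 40
    }
}
```

In practice the CPU count could be taken from `Runtime.getRuntime().availableProcessors()`, while the wait and CPU times have to be measured for the actual query tasks.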

Further, the scheduling policy of the thread pool in step C is: threads added to the thread pool are run first, and when the number of threads exceeds the preset thread count, the excess threads wait in a queue until they can be run.
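One way to obtain this policy with the standard library is a fixed-size `ThreadPoolExecutor` backed by an unbounded `LinkedBlockingQueue`. The patent does not name the executor class, so that choice is an assumption, as are the class name, the method name and the 20 ms sleep that stands in for a query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedPoolDemo {
    /** Runs taskCount tasks on poolSize threads and returns the peak number running at once. */
    static int peakConcurrency(int poolSize, int taskCount) {
        // Core size == max size: at most poolSize tasks run; the rest wait in the queue.
        ExecutorService pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<Integer>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < taskCount; i++) {
                final int id = i;
                futures.add(pool.submit(() -> {
                    peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                    Thread.sleep(20); // stand-in for a database query
                    running.decrementAndGet();
                    return id;
                }));
            }
            for (Future<Integer> future : futures) {
                future.get(); // wait for every queued task to finish
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("query task failed", e);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // Six tasks on two threads: the peak never exceeds the pool size.
        System.out.println(peakConcurrency(2, 6) <= 2); // prints true
    }
}
```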

Further, the export requirement in step E is: the data required to be exported at least includes order information, creator information and company information.

Further, when data is written into the Excel file in step G, the SXSSFWorkbook class of POI is mainly used.

Compared with the prior art, the invention has the following beneficial effects:

In the method of the invention for exporting data in large batches in a distributed environment, the processing logic applies multithreading at several stages so that IO waiting time is reduced to a minimum. Multithreaded single-table queries are run first, and the associated tables are then queried with multiple threads for the data already retrieved, which effectively speeds up data export, shortens the user's waiting time, and at the same time reduces the load on the database server and the application server.

Drawings

FIG. 1 is a process flow diagram of one embodiment of a method of exporting data in large volumes in a distributed environment of the present invention.

Detailed Description

The invention will be further elucidated and described with reference to the embodiments of the invention described hereinafter.

Example:

As shown in FIG. 1, a method for exporting data in large batches in a distributed environment specifically includes the following steps:

S101: the web front end sends the conditions of the data to be exported;

S102: the web front end sends a data request carrying the export conditions to the corresponding controller method of the back-end interface, which then queries the total number of records to be exported according to the export conditions;

S103: the number of threads into which the single-table export is to be divided is calculated from the total number of records: number of threads = total number of records / number of records exported by each thread. The number of records each thread can handle is determined by load-testing the program to find the maximum it can query. The threads created in this way, i.e. the query tasks, are placed in a thread pool to query the order table.
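The division in S103 can be sketched as below. Rounding up so that a final partial batch still gets its own task is an assumption, since the text gives only the plain quotient; the class and method names and the offset/limit representation are likewise illustrative.

```java
import java.util.ArrayList;
import java.util.List;

final class ExportTaskPlanner {
    /** Number of query tasks = total records / records per task, rounded up. */
    static int taskCount(long total, int recordsPerTask) {
        if (total < 0 || recordsPerTask <= 0) {
            throw new IllegalArgumentException("total >= 0 and recordsPerTask > 0 are required");
        }
        return (int) ((total + recordsPerTask - 1) / recordsPerTask);
    }

    /** Returns one {offset, limit} pair per task; the last range may be shorter. */
    static List<long[]> ranges(long total, int recordsPerTask) {
        List<long[]> out = new ArrayList<>();
        int tasks = taskCount(total, recordsPerTask);
        for (int i = 0; i < tasks; i++) {
            long offset = (long) i * recordsPerTask;
            long limit = Math.min(recordsPerTask, total - offset);
            out.add(new long[] {offset, limit});
        }
        return out;
    }

    public static void main(String[] args) {
        // 10,000 records at 3,000 per task: three full tasks plus one of 1,000.
        System.out.println(taskCount(10_000, 3_000)); // prints 4
        for (long[] range : ranges(10_000, 3_000)) {
            System.out.println("offset=" + range[0] + " limit=" + range[1]);
        }
    }
}
```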

S104: the large volume of order table data obtained by the query is stored in a file system, and the number of threads needed to complete the data assembly is calculated again.

For example, in an order export requirement, the user wants to see order information, creator information and company information in Excel, and these three kinds of information are stored in three tables in three separate databases. First, once the order data has been queried, it is stored in file system A. The order IDs are then extracted and stored in a List. Because two tables have to be queried, two thread pools are created; the creator information and company information are obtained in the same way as the order data, and the results are stored in two separate Maps.

One thread assembles one batch of data, and when these threads are started, the threads for the associated data tables are started at the same time. Each associated thread queries its data in batches using IN and assembles the results into the format expected by Excel. For example, if the Excel file contains order information, creator information and company information, the three parts of the data are assembled into one row.
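The assembly described above can be sketched with in-memory Maps keyed by order id. Everything named here is hypothetical: the tables are plain Maps instead of databases, `queryIn` stands in for a single `IN` query against an associated table, and rows are joined into comma-separated strings instead of Excel cells.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OrderRowAssembler {
    /** Stand-in for one "... WHERE order_id IN (...)" query against an associated table. */
    static Map<Long, String> queryIn(Map<Long, String> associatedTable, List<Long> orderIds) {
        Map<Long, String> found = new HashMap<>();
        for (Long id : orderIds) {
            String value = associatedTable.get(id);
            if (value != null) {
                found.put(id, value);
            }
        }
        return found;
    }

    /** Joins each order with the two lookup Maps by order id; missing values become "". */
    static List<String> assemble(Map<Long, String> orders,
                                 Map<Long, String> creators,
                                 Map<Long, String> companies) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<Long, String> order : orders.entrySet()) {
            Long id = order.getKey();
            rows.add(order.getValue() + ","
                    + creators.getOrDefault(id, "") + ","
                    + companies.getOrDefault(id, ""));
        }
        return rows;
    }

    /** Sample data: two orders, two creators, and a company for the first order only. */
    static List<String> demoRows() {
        Map<Long, String> orders = new LinkedHashMap<>();
        orders.put(1L, "order A");
        orders.put(2L, "order B");
        Map<Long, String> creatorTable = new HashMap<>();
        creatorTable.put(1L, "alice");
        creatorTable.put(2L, "bob");
        Map<Long, String> companyTable = new HashMap<>();
        companyTable.put(1L, "Acme");
        List<Long> ids = new ArrayList<>(orders.keySet());
        return assemble(orders, queryIn(creatorTable, ids), queryIn(companyTable, ids));
    }

    public static void main(String[] args) {
        for (String row : demoRows()) {
            System.out.println(row);
        }
    }
}
```

Because each associated table is queried once per batch of ids, the number of queries grows with the number of batches and not with the number of records.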

S105: the data in the file is read as a stream and written into an Excel table. A row-count threshold is set for the Excel table, and when the threshold is reached the data is automatically written to the hard disk. Batches of data are exported into Excel in a loop, and the file data is then written into the response output stream of the web front end's http request using the http protocol, completing the bulk data export.
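The streaming idea in S105 can be illustrated with the standard library alone. This sketch deliberately swaps in plain comma-separated text and a periodic flush for the Excel writer, so it does not use POI or produce an Excel file; the class name, method names and the `flushEvery` parameter are assumptions. In a web application the `OutputStream` would be the response output stream of the http request.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class StreamingExport {
    /** Copies rows line by line from file to out, flushing every flushEvery rows. */
    static int stream(Path file, OutputStream out, int flushEvery) {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            BufferedWriter writer =
                    new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
            int rows = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.write('\n');
                if (++rows % flushEvery == 0) {
                    writer.flush(); // threshold reached: push buffered rows out of memory
                }
            }
            writer.flush();
            return rows;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes two sample rows to a temporary file and streams them into a byte buffer. */
    static String demo() {
        try {
            Path tmp = Files.createTempFile("export", ".csv");
            Files.write(tmp, Arrays.asList("order A,alice,Acme", "order B,bob,Initech"),
                    StandardCharsets.UTF_8);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int rows = stream(tmp, out, 1);
            Files.delete(tmp);
            return rows + ":" + new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

Only one line is held in memory at a time, which is the property the row-count threshold is meant to preserve when the target is an Excel workbook.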

It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims

1. A method for exporting data in bulk in a distributed environment, comprising the steps of:

B. creating a thread that implements the java.util.concurrent.Callable interface, the thread being used to fetch the data of an order table page by page; wherein the order table data in step B is obtained as follows: a query method is encapsulated in the Service layer, which calls JPA and runs the query with an SQL statement, and the controller layer passes the paging parameters and query conditions to the Service layer to obtain the paged query data;

C. creating a thread pool according to the server configuration, adding threads, i.e. query tasks, to the thread pool, presetting an optimal thread count for each thread pool, invoking the query tasks automatically according to the scheduling policy of the thread pool to query the order table, and storing the order table data obtained by the query in a first file;

G. presetting an upper limit on the number of Excel rows for the Excel file, and, when data is being written into the Excel file in step F and the number of rows written reaches the upper limit, writing the data held in memory in Map form to the hard disk; wherein, when data is written into the Excel file in step G, the SXSSFWorkbook class of POI is mainly used;

2. The method of claim 1, wherein the optimal thread count preset for the thread pool in step C is calculated as follows: optimal thread count = (thread wait time / thread CPU time + 1) × number of CPUs.

3. The method of claim 1, wherein the scheduling policy of the thread pool in step C is: threads added to the thread pool are run first, and when the number of threads exceeds the preset thread count, the excess threads wait in a queue until they can be run.

4. The method for exporting data in bulk in a distributed environment according to claim 1, wherein the export requirement in step E is: the data required to be exported at least includes order information, creator information and company information.