CN107798111B - Method for exporting data in large batch in distributed environment - Google Patents
Method for exporting data in large batch in distributed environment Download PDFInfo
- Publication number
- CN107798111B CN107798111B CN201711059530.3A CN201711059530A CN107798111B CN 107798111 B CN107798111 B CN 107798111B CN 201711059530 A CN201711059530 A CN 201711059530A CN 107798111 B CN107798111 B CN 107798111B
- Authority
- CN
- China
- Prior art keywords
- data
- thread
- query
- file
- excel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/5011—Pool
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/5018—Thread allocation
Abstract
The invention discloses a method for exporting data in a large scale in a distributed environment, which improves the defects of the traditional exported data, and adopts the steps of firstly using a multithreading single table for inquiring, and then using multithreading to carry out table inquiry again on the inquired data, thereby achieving the effects of accelerating the data exporting speed and shortening the waiting time of a user. The invention adopts the technology of using multiple threads for many times, shortens the waiting time of the user and reduces the load of the database server and the application server.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to a method for exporting data in a large batch in a distributed environment.
Background
With the continuous upgrade of software, the demand gradually increases, and the traditional design mode can not meet the business demand. Therefore, a distributed development mode is necessarily adopted to split the service. The database layer is designed by adopting a plurality of databases and a plurality of tables. The cloud platform operator or the enterprise manager needs to export and analyze the operation data at any time. Although software developers can develop functionality that meets the requirements based on the needs of the data analyst. But experience a series of tedious processes such as requirement analysis, summary design, code writing, testing, bug modification, release and the like from the requirement proposition to the online use. However, the market is often changeable in wind and cloud, and opportunities are vanished as time goes on, so that timely analysis of operation data becomes a necessary course for each website operator.
In daily work, operators are more used to export Excel tables to perform various visualization operations on data. Exporting large volumes of data is therefore a problem that developers need to face. The traditional architecture adopts a single database and single server mode. Direct queries can be exported to Excel. But in a distributed environment large amounts of data cannot be exported using conventional approaches. The problem mainly faced by the distributed environment is that the database is designed in a multi-library manner. The traditional mode can carry out joint query on the database, and after the database is separated, the joint query cannot be carried out.
It can be seen from the above that in the distributed environment, the conventional way of derivation is adopted, the user must be confronted with a long wait, and the system will also be confronted with the query problem of M × N (M is the number of data pieces to be queried, and N is the related table). For example, 3 tables are involved in deriving ten thousand pieces of data. The database will be queried 3 thousand times. In the case of a relatively high amount of concurrency, the database will necessarily crash. Therefore, the conventional way of exporting tables is not favorable for increasing the data export speed and reducing the database query time.
Disclosure of Invention
The invention aims to overcome the defects in the background technology and provide a method for exporting data in a large scale in a distributed environment.
In order to achieve the technical effects, the invention adopts the following technical scheme:
a method for exporting data in bulk in a distributed environment, comprising the steps of:
a, a web front end sends a condition of data needing to be exported, the number of data meeting the user requirement is inquired through aggregation, and the number of data inquires completed by each thread is calculated;
B. creating an interface thread for realizing java.util.current.callable, wherein the interface thread is used for paging to acquire data of an order table;
C. creating a thread pool according to the configuration condition of a server, adding a query task into the thread pool, presetting an optimal thread number for each thread pool, automatically calling the query task according to a scheduling strategy of the thread pool, performing data query of an order table, and storing data of the order table acquired by query into a first file;
D. performing combined query on data, establishing query tasks corresponding to the order tables one by one, performing memory paging on the data queried by the order tables, storing the queried data in a memory in a Map form, and acquiring a key value and a value, wherein the key value is an id value of the order data, and the value is a data value required to be acquired by the association table;
E. d, assembling the data in the first file and the data in the Map form acquired in the step D to form a List set meeting the export requirement, and storing the List set in a second file;
F. analyzing the data in the second file, acquiring the data in an IO stream mode, and writing the data into an Excel file;
G. presetting an Excel number upper limit value for the Excel file, and writing data in the form of Map stored in the memory into the hard disk when the data is written into the Excel file in the step F and the written Excel number reaches the Excel number upper limit value;
H. and writing file data into a response output stream of the http request at the front end of the web by using an http protocol, and finishing mass data export.
Further, when the data of the order table is obtained in step B, specifically, the method for packaging query data in the Service layer, calling jpa, using sql statement to perform query, and obtaining data queried in pages by transmitting the page parameters and query conditions to the Service layer in the controller layer.
Further, the following calculation method is adopted when the optimal thread number is preset for the thread pool in the step C: the optimal thread count is (thread latency/thread CPU time +1) CPU count.
Further, the scheduling policy of the thread pool in the step C is: and firstly, running the threads added into the thread pool, and queuing redundant threads in the waiting queue to be run when the number of the threads exceeds the preset number of the threads.
Further, the derivation requirement in step E is: the data required to be exported at least includes order information, creator information and company information.
Further, when data is written into the Excel file in the step G, SXSSFWorkbook of POI is mainly used.
Compared with the prior art, the invention has the following beneficial effects:
in the method for exporting data in a large batch in a distributed environment, the processing logic is to use multithreading for multiple times, so that the waiting time of io is shortened to the shortest; the method and the device have the advantages that multithreading single-table query is used, and then the queried data is subjected to table query again by multithreading, so that the data export speed can be effectively increased, the user waiting time can be shortened, and meanwhile, the load of a database server and the load of an application server can be reduced.
Drawings
FIG. 1 is a process flow diagram of one embodiment of a method of exporting data in large volumes in a distributed environment of the present invention.
Detailed Description
The invention will be further elucidated and described with reference to the embodiments of the invention described hereinafter.
Example (b):
as shown in fig. 1, a method for exporting data in a large batch in a distributed environment specifically includes the following steps:
s101: the condition that the web front end sends data needing to be exported;
s102: the web front end sends a data acquisition request with a derivation condition to a controller method corresponding to the background interface, and then queries the total number of data to be derived according to the derivation condition;
s103: and calculating the number of threads to be divided for single table export according to the total number of inquires, wherein during calculation, the number of threads is the total number/the total number exported by each thread, the total number which can be completed by each thread can be inquired by how many threads can be inquired through the maximum load of program test, and a plurality of established threads, namely inquiry tasks, are put into a thread pool for data inquiry of an order table.
S104: and storing the large data volume of the order table obtained by inquiry into a file system, and calculating the number of threads required to finish data assembly again.
For example, in the requirement of order export, it is desirable to see order information, creator information and company information in Excel, where the three information are stored in three tables of three databases respectively, first, after the order data is queried, the order data is stored in the file system a, then, the order ID is taken out and stored in the List, because two tables need to be queried, two thread pools need to be created, the creator information and the company information are obtained by using an order obtaining mode, and the obtained creator information and the obtained company information are stored by using two maps respectively.
And one thread completes the assembly of one batch of data, and starts the threads corresponding to the data table associated with the data while starting the threads together. And each associated thread queries batch query data by using In, and assembles the query data into a format expected by Excel, for example, the Excel has order information, creator information and company information, and then assembles the existing three parts of data into a row.
S105: reading data in a file in a streaming mode, writing the read data into an Excel table, setting a storage strip number threshold value in the Excel table, automatically writing the data into a hard disk when the storage strip number threshold value is reached, circularly exporting batch data into the Excel, writing file data into a response output stream of an http request at the front end of a web by using an http protocol, and exporting the batch data.
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.
Claims (4)
1. A method for exporting data in bulk in a distributed environment, comprising the steps of:
a, a web front end sends a condition of data needing to be exported, the number of data meeting the user requirement is inquired through aggregation, and the number of data inquires completed by each thread is calculated;
B. creating an interface thread for realizing java.util.current.callable, wherein the interface thread is used for paging to acquire data of an order table; when the data of the order table is obtained in the step B, specifically, a method for packaging query data in a Service layer, calling jpa, using an sql statement to perform query, and obtaining paged query data by transmitting paging parameters and query conditions to the Service layer in a controller layer;
C. creating a thread pool according to the configuration condition of a server, adding threads, namely query tasks, into the thread pool, presetting an optimal thread number for each thread pool, automatically calling the query tasks according to a scheduling strategy of the thread pool, performing data query on an order table, and storing data of the order table acquired by query into a first file;
D. performing combined query on data, establishing query tasks corresponding to the order tables one by one, performing memory paging on the data queried by the order tables, storing the queried data in a memory in a Map form, and acquiring a key value and a value, wherein the key value is an id value of the order data, and the value is a data value required to be acquired by the association table;
E. d, assembling the data in the first file and the data in the Map form acquired in the step D to form a List set meeting the export requirement, and storing the List set in a second file;
F. analyzing the data in the second file, acquiring the data in an IO stream mode, and writing the data into an Excel file;
G. presetting an Excel number upper limit value for the Excel file, and writing data in the form of Map stored in the memory into the hard disk when the data is written into the Excel file in the step F and the written Excel number reaches the Excel number upper limit value; when the data are written into the Excel file in the step G, SXSSSFWorkbook of POI is mainly used;
H. and writing file data into a response output stream of the http request at the front end of the web by using an http protocol, and finishing mass data export.
2. The method of claim 1, wherein the following calculation is adopted when the optimal number of threads is preset for the thread pool in step C: the optimal thread count is (thread latency/thread CPU time +1) CPU count.
3. The method of claim 1, wherein the scheduling policy of the thread pool in step C is: and firstly, running the threads added into the thread pool, and queuing redundant threads in the waiting queue to be run when the number of the threads exceeds the preset number of the threads.
4. The method for exporting data in bulk in a distributed environment according to claim 1, wherein the exporting requirement in step E is: the data required to be exported at least includes order information, creator information and company information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711059530.3A CN107798111B (en) | 2017-11-01 | 2017-11-01 | Method for exporting data in large batch in distributed environment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711059530.3A CN107798111B (en) | 2017-11-01 | 2017-11-01 | Method for exporting data in large batch in distributed environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107798111A CN107798111A (en) | 2018-03-13 |
CN107798111B true CN107798111B (en) | 2021-04-06 |
Family
ID=61548636
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711059530.3A Active CN107798111B (en) | 2017-11-01 | 2017-11-01 | Method for exporting data in large batch in distributed environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107798111B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109299157B (en) * | 2018-08-27 | 2021-11-23 | 杭州安恒信息技术股份有限公司 | Data export method and device for distributed big single table |
CN110532311B (en) * | 2019-08-14 | 2023-11-28 | 泰安协同软件有限公司 | Distributed data export method and system based on queues |
CN111914151A (en) * | 2020-08-11 | 2020-11-10 | 上海毅博电子商务有限责任公司 | Association table object query optimization method |
CN113177826A (en) * | 2021-05-20 | 2021-07-27 | 青岛海信智慧生活科技股份有限公司 | Method and device for configuring commodities and cells in batch |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103092993A (en) * | 2013-02-18 | 2013-05-08 | 五八同城信息技术有限公司 | Data exporting method and data exporting device |
CN103793519A (en) * | 2014-02-14 | 2014-05-14 | 浪潮通信信息系统有限公司 | Automatic tool supporting exportation of mass data |
US8782101B1 (en) * | 2012-01-20 | 2014-07-15 | Google Inc. | Transferring data across different database platforms |
EP2777009A4 (en) * | 2011-11-10 | 2015-06-17 | Microsoft Technology Licensing Llc | Export of content items from multiple, disparate content sources |
CN105740293A (en) * | 2014-12-12 | 2016-07-06 | 金蝶软件(中国)有限公司 | Data export method and device |
CN106407231A (en) * | 2015-08-03 | 2017-02-15 | 天脉聚源(北京)科技有限公司 | A data multi-thread export method and system |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6542920B1 (en) * | 1999-09-24 | 2003-04-01 | Sun Microsystems, Inc. | Mechanism for implementing multiple thread pools in a computer system to optimize system performance |
CN101996067B (en) * | 2009-08-19 | 2014-07-16 | 阿里巴巴集团控股有限公司 | Data export method and device |
CN102360310B (en) * | 2011-09-28 | 2014-03-26 | 中国电子科技集团公司第二十八研究所 | Multitask process monitoring method in distributed system environment |
CN103034735B (en) * | 2012-12-26 | 2017-02-08 | 北京讯鸟软件有限公司 | Big data distributed file export method |
CN103995807B (en) * | 2013-02-16 | 2018-01-16 | 长沙中兴软创软件有限公司 | Magnanimity data query and the method for after-treatment under a kind of framework based on Web |
CN103412961B (en) * | 2013-09-04 | 2017-03-29 | 广东全通教育股份有限公司 | A kind of real-time processing method and system for deriving Magnanimous Data Report form |
KR102103543B1 (en) * | 2013-11-28 | 2020-05-29 | 삼성전자 주식회사 | All-in-one data storage device having internal hardware filter, method thereof, and system having the data storage device |
CN106095775A (en) * | 2016-05-24 | 2016-11-09 | 中国银行股份有限公司 | A kind of method and system realizing data query or derivation |
CN106776829A (en) * | 2016-11-28 | 2017-05-31 | 成都广达新网科技股份有限公司 | A kind of data guiding system and its method of work |
-
2017
- 2017-11-01 CN CN201711059530.3A patent/CN107798111B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2777009A4 (en) * | 2011-11-10 | 2015-06-17 | Microsoft Technology Licensing Llc | Export of content items from multiple, disparate content sources |
US8782101B1 (en) * | 2012-01-20 | 2014-07-15 | Google Inc. | Transferring data across different database platforms |
CN103092993A (en) * | 2013-02-18 | 2013-05-08 | 五八同城信息技术有限公司 | Data exporting method and data exporting device |
CN103793519A (en) * | 2014-02-14 | 2014-05-14 | 浪潮通信信息系统有限公司 | Automatic tool supporting exportation of mass data |
CN105740293A (en) * | 2014-12-12 | 2016-07-06 | 金蝶软件(中国)有限公司 | Data export method and device |
CN106407231A (en) * | 2015-08-03 | 2017-02-15 | 天脉聚源(北京)科技有限公司 | A data multi-thread export method and system |
Non-Patent Citations (1)
Title |
---|
"Java Excel SXSSFWorkbook大量数据导出";ZX星辰;《https://blog.csdn.net/zxxingchen/article/details/70159473》;20170413;第1页 * |
Also Published As
Publication number | Publication date |
---|---|
CN107798111A (en) | 2018-03-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107798111B (en) | Method for exporting data in large batch in distributed environment | |
CN107391653B (en) | Distributed NewSQL database system and picture data storage method | |
US9135647B2 (en) | Methods and systems for flexible and scalable databases | |
CN105824957A (en) | Query engine system and query method of distributive memory column-oriented database | |
US11132383B2 (en) | Techniques for processing database tables using indexes | |
CN110806933B (en) | Batch task processing method, device, equipment and storage medium | |
CN107301214B (en) | Data migration method and device in HIVE and terminal equipment | |
US9930113B2 (en) | Data retrieval via a telecommunication network | |
CN111400288A (en) | Data quality inspection method and system | |
CN108459913B (en) | Data parallel processing method and device and server | |
WO2021012861A1 (en) | Method and apparatus for evaluating data query time consumption, and computer device and storage medium | |
CN102750384A (en) | Device and method for acquiring data from multidatabase engine | |
CN112052082A (en) | Task attribute optimization method, device, server and storage medium | |
CN112000703B (en) | Data warehousing processing method and device, computer equipment and storage medium | |
CN103186384A (en) | Business-component-oriented software designing and analyzing system and using method thereof | |
CN112035230A (en) | Method and device for generating task scheduling file and storage medium | |
CN113407343A (en) | Service processing method, device and equipment based on resource allocation | |
CN115984022B (en) | Unified account checking method and device for distributed payment system | |
CN110222046B (en) | List data processing method, device, server and storage medium | |
AU2019241002B2 (en) | Transaction processing method and system, and server | |
CN111190704A (en) | Task classification processing method based on big data processing framework | |
CN106502842A (en) | Data reconstruction method and system | |
CN107562943B (en) | Data calculation method and system | |
CN105630997A (en) | Data parallel processing method, device and equipment | |
CN115617480A (en) | Task scheduling method, device and system and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |