CN115774745A - Extraction method and system for high-capacity Excel file data - Google Patents

Extraction method and system for high-capacity Excel file data Download PDF

Info

Publication number
CN115774745A
CN115774745A CN202111043514.1A CN202111043514A CN115774745A CN 115774745 A CN115774745 A CN 115774745A CN 202111043514 A CN202111043514 A CN 202111043514A CN 115774745 A CN115774745 A CN 115774745A
Authority
CN
China
Prior art keywords
data
excel
memory
database
extracted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111043514.1A
Other languages
Chinese (zh)
Inventor
姚黎明
高灵超
王家凯
陈相舟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Big Data Center Of State Grid Corp Of China
Original Assignee
Big Data Center Of State Grid Corp Of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Big Data Center Of State Grid Corp Of China filed Critical Big Data Center Of State Grid Corp Of China
Priority to CN202111043514.1A priority Critical patent/CN115774745A/en
Publication of CN115774745A publication Critical patent/CN115774745A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a system for extracting high-capacity Excel file data, wherein the method comprises the following steps: acquiring an Excel file to be extracted and a corresponding disk address; respectively constructing an Excel entity object and a data table used for storing read Excel data in a database according to the Excel file to be extracted and the corresponding disk address; reading data in Excel entity objects one by using easy Excel and recording the data to a memory, and persisting the data in the memory to a database through a persisted function based on MySQL JDBC to obtain extracted Excel file data; the Excel entity object comprises data needing to be extracted in an Excel file. The invention reduces the memory occupied by the high-capacity Excel file data when being written into the database by using the easy Excel and the persistence method based on MySQL JDBC, and improves the efficiency of data reading and the speed of writing into the database.

Description

Extraction method and system for high-capacity Excel file data
Technical Field
The invention relates to the technical field of data extraction, in particular to a method and a system for extracting high-capacity Excel file data.
Background
With the development of analysis and construction work of a power grid data center, how to efficiently access TB-level data generated during power grid operation is achieved by taking the data as a core and taking the data as a drive, the data can be used to the best, the data value is fully mined, and the method has important significance for promoting the service development of a power system and improving the management level.
At present, the main sources of the electric power big data include a data acquisition and monitoring control system, an energy management system, a power distribution management system, a market operation system and the like, data files generated by the systems during operation need to be integrated with scattered data through a data extraction technology and are brought into a unified electric power big data platform, and a high-efficiency data extraction technology can provide more reliable data support for platform electric power data analysis, so that the operation level of the electric power system is improved. The large-scale and various electric power data are mostly stored in Excel files in a recorded form, aiming at the extraction of the high-capacity Excel files, POI of Apache is mainly adopted for loading file data, and MyBatis is adopted for inserting the data loaded into the memory into a database in batch;
the Apache POI is a Java application program interface that provides the functionality to read and write Microsoft Office formatted documents. For reading and writing Excel files, apache POI not only has different reading and writing modes aiming at different Excel versions, but also has a processing strategy of generating memory overflow when reading and writing high-capacity Excel files; however, when reading a large-capacity Excel file, the Apache POI can prevent the memory overflow through the SAX mode, but still bring about very large memory consumption. If the decompression of the Excel version 07 and the storage after the decompression are finished in the memory, the indexes of all string information are stored in the memory, and about 100M of memory is still needed for the original 3M Excel POI SAX mode analysis;
moreover, the API of the SAX mode is too complicated, and the bottom layer structure of the Excel file needs to be known during use; two completely different sets of APIs are used for Excel files of different versions, and actual projects need to be realized respectively for different file types;
when a large amount of data is written into a database in batch, with the assistance of Mybatis, the condition that one database insert statement is executed every time one data is analyzed originally can be optimized to be the batch execution data insert statement, so that Mybatis can splice and encapsulate the database insert statements of a plurality of data, and realize the batch insertion of the data, but the time is still consumed by directly using the database insert statement.
Disclosure of Invention
The invention provides an extraction method for high-capacity Excel file data, which aims at the problems that in the prior art, the large-scale data of loaded Excel file entries are easy to generate memory overflow, complicated to analyze and time is excessively consumed when a database is inserted, and comprises the following steps:
acquiring an Excel file to be extracted and a corresponding disk address;
respectively constructing an Excel entity object and a data table used for storing read Excel data in a database according to the Excel file to be extracted and the corresponding disk address;
reading data in Excel entity objects one by using easy Excel and recording the data to a memory, and persisting the data in the memory to a database through a persisted function based on MySQL JDBC to obtain extracted Excel file data;
the Excel entity object comprises data needing to be extracted in an Excel file.
Preferably, the reading the data in the Excel entity object one by using easy Excel and recording the data in the memory, and persisting the data in the memory in a database through a persistency function based on MySQL JDBC to obtain the extracted Excel file data includes:
s1, reading and recording the data to be extracted into a memory one by using easy excel according to a set batch number, judging whether the number of the read data reaches the set batch number or not in real time, if so, stopping reading and entering S2, and if not, entering S3;
s2, using a persistence method based on MySQL JDBC to persist the set batch quantity of data in the memory into a database in batches, and returning to S1 after emptying the memory;
and S3, judging whether the quantity of the read data is the quantity of all the remaining data, if so, persisting the read data item by item to a data table for storing the read Excel data in a database by using a persistence method based on MySQL JDBC, and emptying the memory, otherwise, returning to S1.
Preferably, the persisting the data of the set batch number in the memory to the database in batch by using the persistency method based on MySQL JDBC, and emptying the memory includes:
establishing connection between the memory and the database through MySQL JDBC;
based on the connection between the memory and the database, converting the read byte stream corresponding to the set batch quantity of data into a character input stream by using a built-in statement setLocalInfileInputStream of MySQL JDBC;
and inserting the character input streams corresponding to the DATA of the set batch number into the database in batches by using a persistence function LOAD DATA LOCAL input statement, and emptying all DATA in the memory.
Preferably, the reading and recording the data to be extracted into the memory one by using easy excel according to the set batch number, and determining whether the number of the read data reaches the set batch number in real time includes:
reading the data to be extracted one by one into byte streams according to the set batch number by utilizing an SAX mode of easy excel, and then temporarily storing the byte streams into a byte array of a memory;
and based on the set batch number, utilizing the monitor to record and judge the number of the read data in real time.
Preferably, the determining whether the number of the read data is the number of the remaining total data includes:
and judging the quantity of the read data by utilizing a monitor according to the quantity of the remaining total data, wherein if the quantity of the read data is the same as the quantity of the remaining total data, the quantity of the read data is the remaining total data, otherwise, the quantity of the read data is different from the quantity of the remaining total data.
Preferably, the establishing of the Excel entity object and the data table for storing the read Excel data in the database according to the Excel file to be extracted and the corresponding disk address respectively comprises:
the Excel entity object is constructed according to the content items of the Excel file to be extracted, the data type and the service requirement of each item in the content items;
and taking the disk address information corresponding to the Excel file data to be extracted and the Excel file information to be extracted as the name of the data table, and taking the row head name in the Excel file as the field name of the data table to construct a data table for storing the read Excel data in a database.
Based on the same inventive concept, the invention also provides an extraction system for high-capacity Excel file data, which comprises the following steps:
the acquisition module is used for acquiring an Excel file to be extracted and a corresponding disk address;
the building module is used for building an Excel entity object and a data table used for storing read Excel data in a database according to the Excel file to be extracted and the corresponding disk address respectively;
the data extraction module is used for reading data in the Excel entity object one by using easy Excel, recording the data in the Excel entity object into a memory, and persisting the data in the memory into a database through a persisted function based on MySQL JDBC to obtain extracted Excel file data;
the Excel entity object comprises data needing to be extracted in an Excel file.
Preferably, the data extraction module includes:
the reading submodule is used for reading and recording the data to be extracted into the memory one by one according to the set batch number by using easy excel, and judging whether the number of the read data reaches the set batch number or not in real time, if so, stopping reading and performing data batch insertion according to the batch insertion submodule, otherwise, performing data batch insertion according to the strip-by-strip insertion submodule;
the batch insertion sub-module is used for carrying out batch persistence on the set batch quantity of data in the memory into a database by using a persistence method based on MySQL JDBC, and continuously reading the data according to the reading sub-module after emptying the memory;
and inserting sub-modules one by one for judging whether the number of the read data is the number of the residual total data, if so, utilizing a persistence method based on MySQL JDBC to persist the read data one by one to a data table used for storing the read Excel data in a database, and emptying the memory, otherwise, continuing to read the data according to the reading sub-modules.
Preferably, the batch insert sub-modules include:
the connection unit is used for establishing the connection between the memory and the database through MySQL JDBC;
the conversion unit is used for converting the read byte streams corresponding to the set batch quantity of data into character input streams by utilizing a built-in statement setLocalInfileeInputStream of MySQL JDBC based on the connection between the memory and the database;
and the DATA insertion unit is used for inserting the character input streams corresponding to the set batch number of DATA into the database in batches by using a persistence function LOAD DATA LOCAL INFILE statement and emptying all DATA in the memory.
Preferably, the reading submodule includes:
the reading unit is used for reading the data to be extracted into byte streams one by one according to the set batch number by utilizing an easy excel SAX mode and then temporarily storing the byte streams into a byte array of the memory;
and the monitoring unit is used for recording and judging the number of the read data in real time by using the monitor based on the set batch number.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a method and a system for extracting high-capacity Excel file data, wherein the method comprises the following steps: acquiring an Excel file to be extracted and a corresponding disk address; respectively constructing an Excel entity object and a data table used for storing read Excel data in a database according to the Excel file to be extracted and the corresponding disk address; reading data in Excel entity objects one by using easy Excel and recording the data to a memory, and persisting the data in the memory to a database through a persisted function based on MySQL JDBC to obtain extracted Excel file data; the Excel entity object comprises data needing to be extracted in an Excel file. The invention utilizes easy Excel and a persistence method based on MySQL JDBC to reduce the memory occupied by the data of the high-capacity Excel file when being written into the database, and improves the efficiency of data reading and the speed of writing into the database.
Drawings
FIG. 1 is a flow chart of an extraction method for high-capacity Excel file data according to the present invention;
FIG. 2 is a flow chart of data capacity adaptation processing in an embodiment of the present invention;
FIG. 3 is a flow chart illustrating a process for optimizing data write strategy according to an embodiment of the present invention;
FIG. 4 is a comparison of a conventional data extraction method and a method provided by the present invention;
FIG. 5 is a schematic diagram of a process for determining an Excel entity object in an embodiment of the present invention;
fig. 6 is a schematic diagram of an extraction system for high-capacity Excel file data according to the present invention.
Detailed Description
Example 1
Aiming at the problems that the conventional method for loading data is easy to generate memory overflow, complicated analysis and excessive time consumption for inserting into a database due to large scale of Excel record entries in the prior art when file extraction is performed, the invention provides an extraction method for high-capacity Excel file data, which is shown in figure 1 and comprises the following steps:
step 1, acquiring an Excel file to be extracted and a corresponding disk address;
step 2, establishing an Excel entity object and a data table used for storing read Excel data in a database according to the Excel file to be extracted and the corresponding disk address respectively;
step 3, reading data in the Excel entity object one by using easy Excel, recording the data in the Excel entity object into a memory, and persisting the data in the memory into a database through a persisted function based on MySQL JDBC to obtain extracted Excel file data;
the Excel entity object comprises data needing to be extracted in an Excel file.
In the step 1, obtaining an Excel file to be extracted and a corresponding disk address;
the method for acquiring the Excel file to be extracted mainly comprises the following steps: the Excel file comprises content items needing to be stored, the data type of each content item needing to be stored and the line head name of a cell.
In step 2, respectively constructing an Excel entity object and a data table used for storing read Excel data in a database according to the Excel file to be extracted and the corresponding disk address;
the Excel entity object is constructed according to the content item of the Excel file to be extracted, the data type of each content item and the service requirement;
the data table for storing the read Excel data in the database is constructed in the database by taking the disk address information corresponding to the Excel file data to be extracted and the Excel file information to be extracted as the name of the data table and taking the head name of a line in the Excel file as the field name of the data table;
in this embodiment, for convenience of processing, in the process of converting the line head name into the corresponding field name, the conversion from chinese to english or from english to english abbreviation may be performed according to a rule set in advance, and the field type stored in the data table may be determined according to the data types of different entries in the Excel cell.
In step 3, reading data in the Excel entity object one by using easy Excel and recording the data into a memory, and persisting the data in the memory into a data table for storing the read Excel data in a database through a persisted function based on MySQL JDBC to obtain extracted Excel file data;
in this embodiment, the file parsing order and the parsing order and rule of each Sheet work difficulty table may also be determined according to the service requirement.
The data in the memory is persisted to a data table for storing the read Excel data in a database through a persisted function based on MySQL JDBC, as shown in fig. 2, the specific steps are as follows:
s1, reading and recording data to be extracted to a memory one by using easy excel according to a set batch number, judging whether the number of the read data reaches the set batch number or not in real time, if so, stopping reading and entering S2, and if not, entering S3;
s2, performing batch persistence on the data of the set batch quantity in the memory to a database by using a persistence method based on MySQL JDBC, and returning to S1 after emptying the memory;
and S3, judging whether the quantity of the read data is the quantity of all the remaining data, if so, persisting the read data item by item to a data table for storing the read Excel data in a database by using a persistence method based on MySQL JDBC, and emptying the memory, otherwise, returning to S1.
In S2, batch persistence is performed on the data of the set batch number in the memory to a data table for storing read Excel data in a database by using a persistence method based on MySQL JDBC, and the memory is emptied, as shown in fig. 3, specifically including:
establishing connection between the memory and the database through MySQL JDBC;
based on the connection between the memory and the database, converting the read byte stream corresponding to the set batch quantity of data into a character input stream by using a built-in statement setLocalInfileInputStream of MySQL JDBC.
The invoke monitor in easy excel is adopted for judging whether the number of the read data reaches the set batch number in real time in S1 and judging whether the number of the read data is the number of the residual total data in S3.
After the Excel file is extracted, continuously extracting the next Excel file; and if no file extraction is required, closing the database connection.
The method for extracting the data of the high-capacity Excel file provided by the invention has the advantages that the easy Excel is used as an analysis tool of the high-capacity Excel file, and the easy Excel has capacity self-adaptive processing capability, so that the use of a memory can be reduced, the overflow of the memory is prevented, and the requirement of simple and convenient development is met; the method comprises the steps of analyzing a record in an Excel file line by depending on an SAX mode of Apache POI when Easyexcel analyzes 03 version of Excel, and depending on an SAX mode of the Apache POI after rewriting and optimization when analyzing 07 version of Excel, and informing and processing one or more lines of analysis results by using a monitor mode.
Batch DATA insertion is performed by using a LOAD DATA LOCAL INFILE statement based on MySQL JDBC, so that the DATA insertion speed is increased, and the DATA insertion time is reduced; the LOAD DATA LOCAL info file is faster than the execution of the Insert statement, and the DATA can be directly imported into the database from the memory by using the related built-in method setlocalnfileinputstream (), so that the DATA is prevented from being imported from the file after the file is written.
Compared with the traditional method that the Excel file byte stream read by the easy Excel is converted into the Excel file content in a character mode by using the easy Excel file, and then the character content is inserted into the database according to items by inserting the statement Insert into the database, the method can lead the byte stream read by the easy Excel into the database through the LOAD DATA LOCAL INFILE statement based on MySQL JDBC and the related method, thereby avoiding the conversion cost and greatly reducing the DATA extraction and storage time, as shown in FIG. 4.
Example 2
According to the extraction method for high-capacity Excel file data provided by the invention, taking an actually-to-be-extracted Excel file as an example, as shown in table 1, extraction operation is performed on the actually-to-be-extracted Excel file;
TABLE 1
Figure BDA0003250359530000071
Figure BDA0003250359530000081
Acquiring an Excel file to be extracted and a corresponding disk address;
respectively constructing an Excel entity object and a data table used for storing read Excel data in a database according to the Excel file to be extracted and the corresponding disk address;
in this embodiment, before reading by using easy Excel, an Excel entity object, that is, an Excel reading template, is first determined; the Excel entity object is constructed by storing the row data of the equipment name in the table 1 in the device _ name (String) variable, storing the row data of the failure time in the fault _ time (Date) variable in the table 1 and storing the row data of the failure number in the table 1 in the fault _ number (Double) variable, as shown in fig. 5;
in the construction of the data table, the name of the data table is named as the position information of an Excel file in a disk, namely 'D: \ equipment fault record.xlsx'; determining a data table structure by taking the field name of the data table as the head of a row of the data table, namely device _ name, fault _ time and fault _ number; because the data type of the first row is characters, the data type of the second row is DATEs, and the data type of the third row is numbers, the corresponding field types are VARCHAR (100), DATE and DOUBLE respectively.
Reading data in the Excel entity object one by using easy Excel and recording the data to a memory, and persisting the data record in the memory to a data table used for storing the read Excel data in a database through a persisted function based on MySQL JDBC to obtain extracted Excel file data;
s1, reading and recording data to be extracted to a memory one by using easy excel according to a set batch number, judging whether the number of the read data reaches the set batch number or not in real time, if so, stopping reading and entering S2, and if not, entering S3;
s2, performing batch persistence on the data of the set batch quantity in the memory to a database by using a persistence method based on MySQL JDBC, and returning to S1 after emptying the memory;
and S3, judging whether the quantity of the read data is the quantity of all the remaining data, if so, persisting the read data item by item to a data table for storing the read Excel data in a database by using a persistence method based on MySQL JDBC, and emptying the memory, otherwise, returning to S1.
In this embodiment, in the process of reading data by rows by using easy excel, when the number of read data reaches k by monitoring of the listener, the memory is emptied after waiting for the k data records in the memory to be subjected to persistence operation; and when the number of the read records does not reach k and the file reading is finished, emptying the memory after waiting for the data records in the memory to be subjected to the persistence operation.
And carrying out persistence operation on batch DATA by using a built-in statement based on MySQL JDBC and a LOAD DATA LOCAL INFILE statement, and storing the batch DATA into a database.
Establishing connection between the memory and the database through MySQL JDBC;
the loading of k pieces of DATA from the byte input stream into the DATA table that has been created by the MySQL database is achieved using the setlocalnfilelnputstream method and the LOAD DATA LOCAL inline statement.
The invention optimizes the DATA extraction method of the high-capacity Excel file by using the easy Excel and the persistence method based on the My SQL JDBC, applies an easy Excel frame when analyzing the high-capacity Excel file, and uses the LOAD DATA LOCAL FILE of the MySQL when inserting the analyzed DATA into the database in batches, thereby improving the efficiency of extracting the DATA of the high-capacity Excel file, writing the recorded DATA in the high-capacity Excel file into the unified power big DATA platform database more efficiently and conveniently, being used for monitoring and analyzing each service system of the power grid, better meeting the service requirements of the power grid and providing powerful DATA support for the construction and development of the power system.
Example 3
Based on the same inventive concept, the invention also provides an extraction system for high-capacity Excel file data, as shown in fig. 6, comprising:
the acquisition module is used for acquiring the Excel file to be extracted and the corresponding disk address;
the building module is used for building Excel entity objects and data tables used for storing read Excel data in the database according to the Excel files to be extracted and the corresponding disk addresses respectively;
the data extraction module is used for reading data in the Excel entity object one by using easy Excel, recording the data in the Excel entity object into a memory, and persisting the data in the memory into a database through a persisted function based on MySQL JDBC to obtain extracted Excel file data;
the Excel entity object comprises data needing to be extracted in an Excel file.
The system comprises an acquisition module, a storage module and a storage module, wherein the acquisition module is used for acquiring an Excel file to be extracted, content items required to be stored in the Excel file, the data type of each content item required to be stored, the row head name of a cell and a corresponding disk address;
a build module comprising: an entity object construction submodule and a data table construction submodule;
the entity object construction sub-module is used for constructing an entity object corresponding to the Excel file to be extracted according to the content items of the Excel file to be extracted, the data type of each content item and the service requirement;
and the data table construction sub-module is used for taking the disk address information corresponding to the Excel file data to be extracted and the Excel file information to be extracted as the name of the data table, and taking the line head name in the Excel file as the field name of the data table to construct a data table for storing the read Excel data in the database.
A data extraction module comprising: reading sub-modules and inserting the sub-modules in batches;
the reading submodule is used for reading and recording data to be extracted into the memory one by one according to the set batch quantity by using easy excel, judging whether the quantity of the read data reaches the set batch quantity or not in real time, stopping reading and performing data batch insertion according to the batch insertion submodule if the quantity of the read data reaches the set batch quantity, and otherwise performing data batch insertion according to the strip-by-strip insertion submodule;
the batch insertion submodule is used for carrying out batch persistence on the data of the set batch quantity in the memory into a database by using a persistence method based on MySQL JDBC, and continuously reading the data according to the reading submodule after the memory is emptied;
wherein, insert submodule piece in batches includes: a connection unit and a conversion unit;
the connection unit is used for establishing the connection between the memory and the database through MySQL JDBC;
a conversion unit, configured to convert the read byte stream corresponding to the set batch number of data into a character input stream by using a built-in statement setlocalnfillnputstream of MySQL JDBC based on the connection between the memory and the database;
and the DATA inserting unit is used for inserting the character input streams corresponding to the set batch number of DATA into the database in batches by using a persistence function LOAD DATA LOCAL file statement and emptying all DATA in the memory.
And inserting sub-modules one by one for judging whether the number of the read data is the number of the residual total data, if so, utilizing a persistence method based on MySQL JDBC to persist the read data one by one to a data table used for storing the read Excel data in a database, and emptying the memory, otherwise, continuing to read the data according to the reading sub-modules.
Wherein, read the submodule, include: reading unit and monitoring unit one by one;
the reading unit is used for reading the data to be extracted into byte streams one by one according to the set batch number by utilizing an easy excel SAX mode and then temporarily storing the byte streams into a byte array of the memory;
and the monitoring unit is used for recording and judging the number of the read data in real time by using the monitor based on the set batch number.
After the Excel file is extracted by the data extraction module, continuously extracting the next Excel file; and if no file extraction is required, closing the database connection.
It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The present invention is not limited to the above embodiments, and any modifications, equivalent replacements, improvements, etc. made within the spirit and principle of the present invention are included in the scope of the claims of the present invention which are filed as the application.

Claims (10)

1. A method for extracting high-capacity Excel file data is characterized by comprising the following steps:
acquiring an Excel file to be extracted and a corresponding disk address;
respectively constructing an Excel entity object and a data table used for storing read Excel data in a database according to the Excel file to be extracted and the corresponding disk address;
reading data in Excel entity objects one by using easy Excel and recording the data to a memory, and persisting the data in the memory to a database through a persisted function based on MySQL JDBC to obtain extracted Excel file data;
the Excel entity object comprises data needing to be extracted in an Excel file.
2. The method according to claim 1, wherein the step of reading the data in the Excel entity object by each strip by using easy Excel and recording the data in the memory, and the step of persisting the data in the memory in a database through a persisted function based on MySQL JDBC to obtain the extracted Excel file data comprises the steps of:
s1, reading and recording the data to be extracted into a memory one by using easy excel according to a set batch number, judging whether the number of the read data reaches the set batch number or not in real time, if so, stopping reading and entering S2, and if not, entering S3;
s2, using a persistence method based on MySQL JDBC to persist the set batch quantity of data in the memory into a database in batches, and returning to S1 after emptying the memory;
and S3, judging whether the quantity of the read data is the quantity of all the remaining data, if so, persisting the read data item by item to a data table for storing the read Excel data in a database by using a persistence method based on MySQL JDBC, and emptying the memory, otherwise, returning to S1.
3. The method according to claim 2, wherein the persisting the set batch amount of data in the memory into the database in batches by using a persisting method based on MySQL JDBC, and emptying the memory comprises:
establishing connection between the memory and the database through MySQL JDBC;
based on the connection between the memory and the database, converting the read byte stream corresponding to the set batch quantity of data into a character input stream by using a built-in statement setLocalInfileInputStream of MySQL JDBC;
and inserting the character input streams corresponding to the DATA of the set batch number into the database in batches by using a persistence function LOAD DATA LOCAL input statement, and emptying all DATA in the memory.
4. The method according to claim 2, wherein the step of reading and recording the data to be extracted into the memory one by using easy excel according to the set batch number, and determining whether the number of the read data reaches the set batch number in real time comprises:
reading the data to be extracted one by one into byte streams according to the set batch number by utilizing an SAX mode of easy excel, and then temporarily storing the byte streams into a byte array of a memory;
and based on the set batch number, utilizing the monitor to record and judge the number of the read data in real time.
5. The method of claim 2, wherein determining whether the amount of read data is the amount of remaining total data comprises:
and judging the quantity of the read data according to the quantity of the remaining total data by using a monitor, wherein if the quantity of the read data is the same as the quantity of the remaining total data, the quantity of the read data is the remaining total data, and otherwise, the quantity of the read data is different from the quantity of the remaining total data.
6. The method according to claim 1, wherein the step of constructing Excel entity objects and data tables for storing read Excel data in a database according to the Excel file to be extracted and the corresponding disk addresses respectively comprises the following steps:
the Excel entity object is constructed according to the content items of the Excel file to be extracted, the data type and the service requirement of each item in the content items;
and taking the disk address information corresponding to the Excel file data to be extracted and the Excel file information to be extracted as the name of the data table, and taking the row head name in the Excel file as the field name of the data table to construct a data table for storing the read Excel data in a database.
7. An extraction system for high-capacity Excel file data is characterized by comprising the following components:
the acquisition module is used for acquiring the Excel file to be extracted and the corresponding disk address;
the building module is used for building an Excel entity object and a data table used for storing read Excel data in a database according to the Excel file to be extracted and the corresponding disk address respectively;
the data extraction module is used for reading data in the Excel entity object one by using easy Excel, recording the data in the Excel entity object into a memory, and persisting the data in the memory into a database through a persisted function based on MySQL JDBC to obtain extracted Excel file data;
the Excel entity object comprises data needing to be extracted in an Excel file.
8. The system of claim 7, wherein the data extraction module comprises:
the reading submodule is used for reading and recording the data to be extracted into the memory one by one according to the set batch number by using easy excel, and judging whether the number of the read data reaches the set batch number or not in real time, if so, stopping reading and performing data batch insertion according to the batch insertion submodule, otherwise, performing data batch insertion according to the strip-by-strip insertion submodule;
the batch insertion submodule is used for carrying out batch persistence on the set batch quantity of data in the memory into a database by using a persistence method based on MySQL JDBC, and continuously reading the data according to the reading submodule after the memory is emptied;
and inserting sub-modules one by one for judging whether the number of the read data is the number of the residual total data, if so, utilizing a persistence method based on MySQL JDBC to persist the read data one by one to a data table used for storing the read Excel data in a database, and emptying the memory, otherwise, continuing to read the data according to the reading sub-modules.
9. The system of claim 8, wherein the batch insert sub-modules comprise:
the connection unit is used for establishing the connection between the memory and the database through MySQL JDBC;
the conversion unit is used for converting the read byte streams corresponding to the set batch quantity of data into character input streams by utilizing a built-in statement setLocalInfileeInputStream of MySQL JDBC based on the connection between the memory and the database;
and the DATA inserting unit is used for inserting the character input streams corresponding to the set batch number of DATA into the database in batches by using a persistence function LOAD DATA LOCAL file statement and emptying all DATA in the memory.
10. The system of claim 8, wherein the read submodule comprises:
the reading unit is used for reading the data to be extracted into byte streams one by one according to the set batch number by utilizing an easy excel SAX mode and then temporarily storing the byte streams into a byte array of the memory;
and the monitoring unit is used for recording and judging the quantity of the read data in real time by utilizing the monitor based on the set batch quantity.
CN202111043514.1A 2021-09-07 2021-09-07 Extraction method and system for high-capacity Excel file data Pending CN115774745A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111043514.1A CN115774745A (en) 2021-09-07 2021-09-07 Extraction method and system for high-capacity Excel file data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111043514.1A CN115774745A (en) 2021-09-07 2021-09-07 Extraction method and system for high-capacity Excel file data

Publications (1)

Publication Number Publication Date
CN115774745A true CN115774745A (en) 2023-03-10

Family

ID=85387610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111043514.1A Pending CN115774745A (en) 2021-09-07 2021-09-07 Extraction method and system for high-capacity Excel file data

Country Status (1)

Country Link
CN (1) CN115774745A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115983222A (en) * 2023-03-20 2023-04-18 苏州清研精准汽车科技有限公司 EasyExcel-based file data reading method, device, equipment and medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115983222A (en) * 2023-03-20 2023-04-18 苏州清研精准汽车科技有限公司 EasyExcel-based file data reading method, device, equipment and medium

Similar Documents

Publication Publication Date Title
WO2020233330A1 (en) Batch testing method, apparatus, and computer-readable storage medium
CN102831052B (en) Test exemple automation generating apparatus and method
CN103309904B (en) A kind of method and device generating data warehouse ETL code
CN104866426A (en) Software test integrated control method and system
CN109471851B (en) Data processing method, device, server and storage medium
CN106055618B (en) Data processing method based on web crawler and structured storage
CN101859303A (en) Metadata management method and management system
CN112231407B (en) DDL synchronization method, device, equipment and medium of PostgreSQL database
CN111400387A (en) Conversion method and device for import and export data, terminal equipment and storage medium
CN115774745A (en) Extraction method and system for high-capacity Excel file data
CN111858730A (en) Data importing and exporting device, method, equipment and medium of graph database
CN111400169A (en) Method and system for automatically generating netlist file for testing software and hardware
CN112597171A (en) Table relation visualization method and device, electronic equipment and storage medium
CN108388589B (en) Device for automatically generating sql query statement of database
CN116955393A (en) Data processing method and device, electronic equipment and storage medium
CN114115831A (en) Data processing method, device, equipment and storage medium
CN114461712A (en) Method and device for importing and exporting multi-source heterogeneous data source and graph database
CN102750290B (en) Realizing method of STB (Set Top Box) database and STB
CN110806963A (en) Example information monitoring and visual display method based on wave cloud database
CN112231376A (en) Method and device for offline data acquisition
CN106909570B (en) Data conversion method and device
CN115203132B (en) Design method for rapidly extracting OP2file architecture according to requirements
CN110737515B (en) Processing method of data task instruction, computer equipment and storage medium
CN103309678A (en) Method for processing data in combined environment of IBM mainframe
CN111984671B (en) Method for generating view script

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination