CN115730572A - Method and device for processing Excel based on Python and electronic equipment - Google Patents

Method and device for processing Excel based on Python and electronic equipment Download PDF

Info

Publication number
CN115730572A
CN115730572A CN202211581621.4A CN202211581621A CN115730572A CN 115730572 A CN115730572 A CN 115730572A CN 202211581621 A CN202211581621 A CN 202211581621A CN 115730572 A CN115730572 A CN 115730572A
Authority
CN
China
Prior art keywords
excel
file
excel file
python
reading
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211581621.4A
Other languages
Chinese (zh)
Inventor
陈旭
施甘图
庭治宏
张寒雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lahuobao Network Technology Co ltd
Hongtu Intelligent Logistics Co ltd
Original Assignee
Lahuobao Network Technology Co ltd
Hongtu Intelligent Logistics Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lahuobao Network Technology Co ltd, Hongtu Intelligent Logistics Co ltd filed Critical Lahuobao Network Technology Co ltd
Priority to CN202211581621.4A priority Critical patent/CN115730572A/en
Publication of CN115730572A publication Critical patent/CN115730572A/en
Pending legal-status Critical Current

Links

Images

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of data sheet application, and relates to a method, a device and electronic equipment for processing Excel based on Python, wherein the method comprises the following steps: creating an Excel file storage path, and putting an Excel file into the Excel file storage path; acquiring file names of all Excel files in an Excel file storage path; circularly reading the Excel file; reading the content of the table by utilizing a Python tool; carrying out keyword matching on the table, and reading an Excel file containing keywords; processing an Excel file; and loading the data into the database in a mode of Python linking the data warehouse database. The method replaces manual work to arrange disordered Excel tables orderly by a Python tool and then leads the ordered Excel tables into the database, so that the speed and the accuracy of Excel data table processing are improved.

Description

Python-based Excel processing method and device and electronic equipment
Technical Field
The invention belongs to the technical field of data sheet application, and particularly relates to a method and a device for processing Excel based on Python and electronic equipment.
Background
The big data industry develops rapidly, and in order to integrate scattered, disordered and non-uniform data and generate value of the data, the ETL (the process of loading the data into a data warehouse after extraction, cleaning and conversion) plays a key role. In the ETL process, various business system databases of an enterprise are usually processed, and the process is completed through various data processing tools, but some data come from offline tables, or some tables which have no external interface and can only be downloaded manually, and the tables can be changed continuously due to various reasons, which causes much trouble to the data processing process, and the data cannot be organized and stored in an ordered and structured manner.
Disclosure of Invention
In order to solve the problem that messy Excel table data can only be manually arranged in the traditional mode, the invention provides a method, a device and electronic equipment for processing Excel based on Python.
In a first aspect, the invention provides a method for processing Excel based on Python, comprising the following steps:
creating an Excel file storage path, and placing an Excel file into the Excel file storage path;
acquiring file names of all Excel files in the Excel file storage path;
circularly reading the Excel file;
reading the table content according to the Excel file storage path and the file name of the Excel file by utilizing a Python tool;
performing keyword matching on the table, and reading the Excel file containing the keywords;
processing the Excel file: acquiring a form header in a form keyword acquiring manner, deleting contents of the form header which do not accord with set conditions, setting field types of the form contents, screening the file field contents to remove rows in the form, and supplementing the form with field missing in a filling manner;
and loading the data into the database in a mode of Python linking the data warehouse database.
In a second aspect, the invention provides a Python-based Excel processing device, which comprises a creating unit, an obtaining unit, a first reading unit, a second reading unit, a matching unit, a processing unit and a loading unit;
the creating unit is used for creating an Excel file storage path and placing an Excel file into the Excel file storage path;
the obtaining unit is used for obtaining the file names of all the Excel files in the Excel file storage path;
the first reading unit is used for circularly reading the Excel file;
the second reading unit is used for reading the content of the table according to the Excel file storage path and the file name of the Excel file by utilizing a Python tool;
the matching unit is used for matching keywords of the table and reading the Excel file containing the keywords;
the processing unit is used for processing the Excel file: acquiring a form header in a form keyword acquiring manner, deleting contents of the form header which do not accord with set conditions, setting field types of the form contents, screening the file field contents to remove rows in the form, and supplementing the form with field missing in a filling manner;
and the loading unit is used for loading the data into the database in a mode of linking the data warehouse with the Python database.
In a third aspect, the present invention discloses an electronic device, comprising:
a processor and a memory;
the memory is used for storing computer operation instructions;
the processor is used for executing the method of any one of claims 1 to 6 by calling the computer operation instruction.
The invention has the beneficial effects that: the method replaces manual work to orderly arrange the disordered Excel data table by using a Python tool, and then the disordered Excel data table is imported into the database, so that the speed and the accuracy of Excel data table processing are improved.
On the basis of the technical scheme, the invention can be further improved as follows.
Further, creating an Excel file storage path, and placing an Excel file into the Excel file storage path, including:
establishing a new Excel file storage path for storing the Excel file in a server;
sending the Excel file to a public folder;
copying the target Excel file in the public folder to the Excel file storage path of the server by using a shell script.
Further, circularly reading the Excel file, including:
newly building a Python file;
acquiring all file names under the Excel file storage path;
reading the Excel file in a circulating file name mode, or reading the Excel file containing the set keywords in a keyword setting mode;
further, reading the content of a table by using a Python tool according to the Excel file storage path and the file name of the Excel file, wherein the reading comprises the following steps: and reading the table content according to the Excel file storage path and the file name of the Excel file by using a table reading function in the Python file.
Further, when the Excel file is processed, table text head-up keywords are matched, and the line number of the table text head-up is obtained; deleting the contents of which the table text heads up and do not accord with the set conditions; setting header parameters through a table reading function in the Python file to acquire data of a table again; when the table reading function in the Python file is used for reading the table content, setting the field type of the table content, and setting the numerical value type of each column of tables; selecting data by adopting a field content screening mode; and when the table field is missing, replacing the field of the numerical value type by using a set value, and filling in the field by using a line filling mode.
Further, the table content is loaded into the database in a mode of linking the data warehouse by Python, and the table content after processing is loaded into the database in a mode of linking the data warehouse by Python or written into the Excel file by a write-in function in the Python file.
Drawings
Fig. 1 is a flowchart of a method for processing Excel based on Python according to embodiment 1 of the present invention;
fig. 2 is a schematic diagram of a Python-based Excel processing apparatus according to embodiment 2 of the present invention;
fig. 3 is a schematic diagram of an electronic device according to embodiment 3 of the present invention.
Icon: 30-an electronic device; 310-a processor; 320-a bus; 330-a memory; 340-transceiver.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Example 1
As an embodiment, as shown in fig. 1, to solve the above technical problem, this embodiment provides a method for processing Excel based on Python, including:
creating an Excel file storage path, and putting an Excel file into the Excel file storage path;
acquiring file names of all Excel files in an Excel file storage path;
circularly reading the Excel file;
reading the content of the table by utilizing a Python tool according to the Excel file storage path and the file name of the Excel file;
carrying out keyword matching on the table, and reading an Excel file containing keywords;
processing an Excel file: acquiring a form header in a form keyword acquiring mode, deleting the content of the form header which does not accord with set conditions, setting the field type of the form content, screening the file field content to remove rows in the form, and supplementing the form with field loss in a filling mode;
and loading the data into the database in a mode of Python linking the data warehouse database.
Optionally, creating an Excel file storage path, and placing the Excel file into the Excel file storage path, including:
establishing an Excel file storage path for storing an Excel file in a server;
sending the Excel file to a public folder;
and copying the target Excel file in the public folder to an Excel file storage path of the server by using the shell script.
Optionally, reading the Excel file in a cycle includes:
newly building a Python file;
acquiring all file names under an Excel file storage path;
reading an Excel file in a circulating file name mode, or reading the Excel file containing set keywords in a keyword setting mode;
optionally, reading the content of the table by using a Python tool according to the Excel file storage path and the file name of the Excel file, where the reading includes: and reading the table content according to the Excel file storage path and the file name of the Excel file by using a table reading function in the Python file.
In an actual application process, the read _ Excel function, which is a table reading function of pandas in a Python file, is used for reading the obtained Excel file storage path and the file name of the Excel file, so that the content of the table is read.
Optionally, when the Excel file is processed, matching the text head-up keywords of the table to obtain the line number of the text head-up of the table; deleting the contents of which the table text heads up and do not accord with the set conditions; setting header parameters through a table reading function in a Python file to obtain data of the table again; when reading the table content by using a table reading function in a Python file, setting the field type of the table content and setting the numerical value type of each column of tables; selecting data by adopting a field content screening mode; and when the table field is missing, replacing the field of the numerical value type by using a set value, and filling in the field by using a line filling mode.
In the practical application process, because the offline Excel table or the downloaded service table may not be a standard one-dimensional table, the table content cannot be directly set from the first line, and the accurate line number of the text table heading is obtained by matching the text heading keyword of the table.
The data of the Excel table is obtained again by using a table reading function in a Python file, namely a read _ Excel function, to set a specific line number of a header parameter header, similarly, the content which does not meet the condition at the end of the Excel table is excluded by a screening condition, the field of the content which does not meet the condition should not be empty but appears empty, or an error field exists in the field of the content of the Excel table, the error field is screened out by a screening mode and deleted, or a line or a column for data summarization exists in the Excel table (whether a line or a column for data summarization exists is searched and judged by a keyword screening mode on the content at the end of the table), and the line or the column belongs to redundant lines or columns, and then the line or the column is deleted by the screening mode. When the table reading function read _ Excel function is used for reading Excel table contents, the numerical value type of each column can be specified by setting the field type dtype of the parameter table contents, and the distortion of the table content field is avoided. When a table with fields missing contents is encountered, data completion is required according to conditions, such as: if the field is of value type, a null value may be replaced with 0; since the fields of the rows have the same content, but in the original table, the first row or the last row of the rows has data content, and the other rows are empty, which needs to be filled up by using a downward filling or upward filling manner.
Optionally, the table content is loaded into the database in a way that a Python links a data warehouse, and the table content after processing is loaded into the database in a way that the Python links the data warehouse, or is written into the Excel file through a write-in function in a Python file.
The method replaces manual work to orderly arrange the disordered Excel data table by using a Python tool, and then the disordered Excel data table is imported into the database, so that the speed and the accuracy of Excel data table processing are improved.
Example 2
Based on the same principle as the method shown in embodiment 1 of the present invention, as shown in fig. 2, an embodiment of the present invention further provides a Python Excel processing-based device, which includes a creating unit, an obtaining unit, a first reading unit, a second reading unit, a matching unit, a processing unit, and a loading unit;
the device comprises a creating unit, a storing unit and a processing unit, wherein the creating unit is used for creating an Excel file storage path and putting an Excel file into the Excel file storage path;
the obtaining unit is used for obtaining the file names of all Excel files under the Excel file storage path;
the first reading unit is used for circularly reading the Excel file;
the second reading unit is used for reading the content of the table according to the Excel file storage path and the file name of the Excel file by utilizing a Python tool;
the matching unit is used for matching keywords of the table and reading an Excel file containing the keywords;
the processing unit is used for processing the Excel file: acquiring a form header in a form keyword acquiring mode, deleting contents of the form header which do not accord with set conditions, setting field types of the form contents, screening the file field contents to remove rows in the form, and supplementing the form with field loss in a filling mode;
and the loading unit is used for loading the data into the database in a mode of linking the data warehouse with the Python.
Optionally, creating an Excel file storage path, and placing the Excel file into the Excel file storage path, includes:
establishing an Excel file storage path for storing an Excel file in a server;
sending the Excel file to a public folder;
and copying the target Excel file in the public folder to an Excel file storage path of the server by using the shell script.
Optionally, reading the Excel file in a cycle includes:
newly building a Python file;
acquiring all file names under an Excel file storage path;
reading an Excel file in a circulating file name mode, or reading the Excel file containing set keywords in a keyword setting mode;
optionally, reading the content of the table by using a Python tool according to the Excel file storage path and the file name of the Excel file, where the reading includes: and reading the table content according to the Excel file storage path and the file name of the Excel file by using a table reading function in the Python file.
Optionally, when the Excel file is processed, matching the text head-up keywords of the table to obtain the line number of the text head-up of the table; deleting the contents of which the table text head-up does not accord with the set conditions; setting header parameters through a table reading function in a Python file to acquire data of the table again; when reading the table content by using a table reading function in a Python file, setting the field type of the table content and setting the numerical value type of each column of tables; selecting data by adopting a field content screening mode; and when the table field is missing, replacing the field of the numerical value type by using a set value, and filling in the field by using a line filling mode.
Optionally, the method includes loading the processed table content into the database in a way of Python linking the data warehouse, and loading the processed table content into the database in a way of Python linking the data warehouse or writing the processed table content into an Excel file through a write-in function in a Python file.
Example 3
Based on the same principle as the method shown in the embodiment of the present invention, an embodiment of the present invention further provides an electronic device, as shown in fig. 3, which may include but is not limited to: a processor and a memory; a memory for storing a computer program; a processor for executing the method according to any of the embodiments of the present invention by calling a computer program.
In an alternative embodiment, an electronic device is provided, the electronic device 30 shown in fig. 3 comprising: a processor 310 and a memory 330. Wherein the processor 310 is coupled to the memory 330, such as via a bus 320.
Optionally, the electronic device 30 may further include a transceiver 340, and the transceiver 340 may be used for data interaction between the electronic device and other electronic devices, such as transmission of data and/or reception of data. It should be noted that the transceiver 340 is not limited to one in practical application, and the structure of the electronic device 30 does not limit the embodiment of the present invention.
The processor 310 may be a CPU central processing unit, general processor, DSP data signal processor, ASIC application specific integrated circuit, FPGA field programmable gate array or other programmable logic device, hardware component, or any combination thereof. The processor 310 may also be a combination of computing functions, e.g., comprising one or more microprocessors, a combination of a DSP and a microprocessor, or the like.
Bus 320 may include a path that transfers information between the above-described components. Bus 320 may be a PCI peripheral component interconnect standard bus or an EISA extended industry standard architecture bus, or the like. The bus 320 may be divided into a control bus, a data bus, an address bus, etc. For ease of illustration, only one thick line is shown in FIG. 3, but this does not mean only one bus or one type of bus.
Memory 330 may be, but is not limited to, a ROM read-only memory or other type of static storage device that may store static information and instructions, a RAM random access memory or other type of dynamic storage device that may store information and instructions, an EEPROM electrically erasable programmable read-only memory, a CD-ROM read-only disk or other optical disk storage, optical disk storage (including optical disks, laser disks, compact disks, digital versatile disks, etc.), magnetic disk storage media, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 330 is used for storing application program codes (computer programs) for performing aspects of the present invention and is controlled to be executed by the processor 310. The processor 310 is configured to execute application program code stored in the memory 330 to implement the aspects illustrated in the foregoing method embodiments.
The present invention has been described in terms of the preferred embodiment, and it is not intended to be limited to the embodiment. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. The method for processing Excel based on Python is characterized by comprising the following steps:
creating an Excel file storage path, and placing an Excel file into the Excel file storage path;
acquiring file names of all Excel files in the Excel file storage path;
circularly reading the Excel file;
reading the content of a table by utilizing a Python tool according to the Excel file storage path and the file name of the Excel file;
performing keyword matching on the table, and reading the Excel file containing the keywords;
processing the Excel file: acquiring a form header in a form keyword acquiring manner, deleting contents of the form header which do not accord with set conditions, setting field types of the form contents, screening the file field contents to remove rows in the form, and supplementing the form with field missing in a filling manner;
and loading the data into the database in a mode of Python linking the data warehouse database.
2. The Python-based Excel processing method according to claim 1, wherein creating an Excel file storage path and placing an Excel file into the Excel file storage path comprises:
establishing a new Excel file storage path for storing the Excel file in a server;
sending the Excel file to a public folder;
copying the target Excel file in the public folder to the Excel file storage path of the server by using a shell script.
3. The method for processing Excel based on Python of claim 1, wherein the circularly reading the Excel file comprises:
newly building a Python file;
acquiring all file names under the Excel file storage path;
reading the Excel file in a circulating file name mode, or reading the Excel file containing the set keywords in a set keyword mode.
4. The method for processing Excel according to claim 1, wherein reading the contents of a table by using a Python tool according to the Excel file storage path and the file name of the Excel file comprises: and reading the table content according to the Excel file storage path and the file name of the Excel file by using a table reading function in the Python file.
5. The method for processing Excel based on Python according to claim 1, wherein during processing the Excel file, table body heading keywords are matched to obtain the number of lines of table body heading; deleting the contents of which the table text heads up and do not accord with the set conditions; setting header parameters through a table reading function in the Python file to acquire data of a table again; when the table reading function in the Python file is used for reading the table content, setting the field type of the table content, and setting the numerical value type of each column of tables; selecting data by adopting a field content screening mode; when the table field is missing, the field of the numerical value type is replaced by the set value, and the filling is carried out by using a line filling mode.
6. The method according to claim 1, wherein the Excel is loaded into the database by way of Python linking to the data warehouse, and further comprising loading the processed table contents into the database by way of Python linking to the data warehouse or writing the processed table contents into the Excel file by way of a write function in the Python file.
7. The device for processing Excel based on Python is characterized by comprising a creating unit, an obtaining unit, a first reading unit, a second reading unit, a matching unit, a processing unit and a loading unit;
the creating unit is used for creating an Excel file storage path and placing an Excel file into the Excel file storage path;
the obtaining unit is used for obtaining the file names of all the Excel files in the Excel file storage path;
the first reading unit is used for circularly reading the Excel file;
the second reading unit is used for reading the content of the table according to the Excel file storage path and the file name of the Excel file by utilizing a Python tool;
the matching unit is used for matching keywords with a table and reading the Excel file containing the keywords;
the processing unit is used for processing the Excel file: acquiring a form header in a form keyword acquiring manner, deleting contents of the form header which do not accord with set conditions, setting field types of the form contents, screening the file field contents to remove rows in the form, and supplementing the form with field missing in a filling manner;
and the loading unit is used for loading the data into the database in a mode of linking the data warehouse with the Python database.
8. An electronic device, comprising:
a processor and a memory;
the memory is used for storing computer operation instructions;
the processor is used for executing the method of any one of claims 1 to 6 by calling the computer operation instruction.
CN202211581621.4A 2022-12-09 2022-12-09 Method and device for processing Excel based on Python and electronic equipment Pending CN115730572A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211581621.4A CN115730572A (en) 2022-12-09 2022-12-09 Method and device for processing Excel based on Python and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211581621.4A CN115730572A (en) 2022-12-09 2022-12-09 Method and device for processing Excel based on Python and electronic equipment

Publications (1)

Publication Number Publication Date
CN115730572A true CN115730572A (en) 2023-03-03

Family

ID=85300969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211581621.4A Pending CN115730572A (en) 2022-12-09 2022-12-09 Method and device for processing Excel based on Python and electronic equipment

Country Status (1)

Country Link
CN (1) CN115730572A (en)

Similar Documents

Publication Publication Date Title
EP3353672B1 (en) Method and apparatus for transferring data between databases
US7222142B2 (en) Methods and systems for moving data objects utilizing data identifiers and lock objects
CN111339041A (en) File parsing and warehousing and file generating method and device
CN107391544B (en) Processing method, device and equipment of column type storage data and computer storage medium
CN106909595B (en) Data migration method and device
CN111414362A (en) Data reading method, device, equipment and storage medium
CN114297204A (en) Data storage and retrieval method and device for heterogeneous data source
CN101388018A (en) Computer aided design document management method
CN115730572A (en) Method and device for processing Excel based on Python and electronic equipment
CN109388685B (en) Method and device for warehousing spatial data used by planning industry
CN114676126B (en) Database-based data verification method, device, equipment and storage medium
CN113791594B (en) Configuration establishing method, system, equipment and medium of distributed control system
CN114185934B (en) Indexing and query method and system based on Tiandun database column storage
CN110888874B (en) Device and method for creating form through Excel and storage medium
CN114330278A (en) Method and equipment for judging consistency of returned data
CN117931755B (en) Method, device, equipment and medium for batch import of package library
CN114428776A (en) Index partition management method and system for time sequence data
CN111339390A (en) Method, computing device and storage medium for crawling information based on fixed-line telephone
CN111427868B (en) Processing method and device for operation request in database migration and electronic equipment
CN117591495B (en) Data storage method and system for nuclear power historical data migration
CN117827802B (en) Hbase-based data processing system, method and device
US20050234918A1 (en) Correction server for large database systems
CN109857302B (en) Electronic book information repairing method, electronic equipment and computer storage medium
CN111949608B (en) Method and device for searching text in APP
CN116774906A (en) Disk data deleting method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination