CN116450655A - Tree structure data processing method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN116450655A
Authority
CN
China
Prior art keywords
data
event
processor
node
tree structure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310699380.1A
Other languages
Chinese (zh)
Inventor
吴亚军
韩涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xumi Yuntu Space Technology Co Ltd
Original Assignee
Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority to CN202310699380.1A
Publication of CN116450655A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/81 Indexing, e.g. XML tags; Data structures therefor; Storage structures
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a tree structure data processing method and apparatus, an electronic device, and a storage medium. The method comprises: creating an OPC package based on a file stream, creating a table reader object, and extracting the data in the OPC package with the table reader object; creating a data parser, setting an event handler for the data parser, and passing the event handler's parameters to an event-driven processor; triggering the event handler on each event to parse the worksheet data, and storing the parsed data in the event handler; traversing the directory and content data in the data list, screening out the root nodes, and determining the root-node order of the tree structure; traversing the root nodes with an outer loop and traversing all data with an inner loop to obtain non-leaf or leaf nodes, and repeating the inner loop for each non-leaf node until leaf nodes are reached, thereby obtaining tree structure data. The method and apparatus can construct tree structure data quickly and thereby improve data processing efficiency.

Description

Tree structure data processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a tree structure data processing method and apparatus, an electronic device, and a storage medium.
Background
In many practical applications, tree structure data must be processed and manipulated. For example, when building an online document catalog, a user needs to keep adding sibling nodes or child nodes under a root node in a mind-map-like form. However, in pursuing rich functionality, existing mind-map tools carry too many interface elements and do not match the operating habits of internal users; in scenarios with large data volumes, frequent data interaction also makes the interface lag, which results in a poor operating experience.
Furthermore, different application scenarios may impose special requirements, such as limiting the depth of the node tree (e.g., at most 3 levels), limiting the types of nodes, and restricting which child nodes may be added under each node (e.g., if a node already has a content node, no directory node can be added to it). Existing mind-map structure data therefore cannot meet the special requirements of different application scenarios.
Disclosure of Invention
In view of this, the embodiments of the present application provide a tree structure data processing method, apparatus, electronic device, and storage medium, to solve the problems in the prior art that, in scenarios with large data volumes, frequent data interaction makes the interface lag and degrades the operating experience, and that mind-map structure data cannot meet the special requirements of different application scenarios.
In a first aspect of the embodiments of the present application, a tree structure data processing method is provided, including: acquiring a file stream from an Excel file uploaded by a user through a client, and creating an OPC package based on the file stream, the OPC package providing an interface for accessing and operating on the file stream it contains; creating a table reader object for data extraction, and using the table reader object to extract the data in the OPC package to obtain a shared string table, style information, and worksheet data; creating a data parser for parsing the worksheet data, setting an event handler for the data parser, passing the event handler's parameters to an event-driven processor, and using the event-driven processor to read and parse the worksheet data row by row; while the event-driven processor reads and parses the worksheet data row by row, triggering, for each event, the event handler associated with that event to parse the worksheet data, storing the parsed data in the event handler, and generating a data list; invoking a preset recursive method that queries the directory and content data in the data list, traverses them to screen out all root nodes, and sorts the root nodes to determine the root-node order of the tree structure; traversing the root nodes with a preset outer loop and traversing all data in the data list with a preset inner loop to obtain non-leaf or leaf nodes, and repeating the inner loop for each non-leaf node until leaf nodes are reached, thereby obtaining tree structure data; and splitting the tree structure data into parent-child node relations, sorting the directory and content data in the parent-child node relations by level, serializing the sorted directory and content data, and storing the result in a database.
In a second aspect of the embodiments of the present application, a tree structure data processing apparatus is provided, including: a creation module configured to acquire a file stream from an Excel file uploaded by a user through a client and create an OPC package based on the file stream, the OPC package providing an interface for accessing and operating on the file stream it contains; an extraction module configured to create a table reader object for data extraction and use the table reader object to extract the data in the OPC package to obtain a shared string table, style information, and worksheet data; a setting module configured to create a data parser for parsing the worksheet data, set an event handler for the data parser, pass the event handler's parameters to an event-driven processor, and use the event-driven processor to read and parse the worksheet data row by row; a parsing module configured to, while the event-driven processor reads and parses the worksheet data row by row, trigger for each event the event handler associated with that event to parse the worksheet data, store the parsed data in the event handler, and generate a data list; a calling module configured to invoke a preset recursive method that queries the directory and content data in the data list, traverses them to screen out all root nodes, and sorts the root nodes to determine the root-node order of the tree structure; a traversing module configured to traverse the root nodes with a preset outer loop and traverse all data in the data list with a preset inner loop to obtain non-leaf or leaf nodes, repeating the inner loop for each non-leaf node until leaf nodes are reached, thereby obtaining tree structure data; and a storage module configured to split the tree structure data into parent-child node relations, sort the directory and content data in the parent-child node relations by level, serialize the sorted directory and content data, and store the result in a database.
In a third aspect of the embodiments of the present application, there is provided an electronic device including a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the program.
In a fourth aspect of the embodiments of the present application, there is provided a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method.
At least one of the technical solutions adopted in the embodiments of the present application can achieve the following beneficial effects:
A file stream is acquired from an Excel file uploaded by a user through a client, and an OPC package is created based on the file stream, the OPC package providing an interface for accessing and operating on the file stream it contains; a table reader object for data extraction is created and used to extract the data in the OPC package to obtain a shared string table, style information, and worksheet data; a data parser for parsing the worksheet data is created, an event handler is set for the data parser, the event handler's parameters are passed to an event-driven processor, and the event-driven processor reads and parses the worksheet data row by row; during this row-by-row reading and parsing, each event triggers the event handler associated with it to parse the worksheet data, the parsed data is stored in the event handler, and a data list is generated; a preset recursive method is invoked to query the directory and content data in the data list, traverse them to screen out all root nodes, and sort the root nodes to determine the root-node order of the tree structure; the root nodes are traversed with a preset outer loop and all data in the data list is traversed with a preset inner loop to obtain non-leaf or leaf nodes, the inner loop being repeated for each non-leaf node until leaf nodes are reached, thereby obtaining tree structure data; the tree structure data is split into parent-child node relations, the directory and content data in the parent-child node relations are sorted by level, and the sorted directory and content data are serialized and stored in a database.
The present application thus provides a streamlined tree data structure that avoids interface lag when the client and the server exchange data, which improves the user's operating experience, meets the special requirements of different application scenarios, and improves the efficiency of generating and processing tree structure data.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present application; a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of a tree structure data processing method according to an embodiment of the present application;
Fig. 2 is a flowchart of another tree structure data processing method according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a tree structure data processing apparatus according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In a practical project, an online document catalog is involved. It is typical tree structure data, similar to a mind map, that lets a user keep adding sibling nodes or child nodes under a root node. In pursuing rich functionality, the existing mind-map tools on the market carry many interface elements and do not match the operating habits of internal users. In addition, in scenarios with large data volumes, frequent data interaction makes the interface lag.
In addition, the requesting party sets some special requirements, such as: (1) the node tree has a depth limit (e.g., at most 3 levels); (2) nodes are of different types; (3) the child nodes that may be added under each node follow certain rules (for example, if a node already has a content node, no directory node can be added to it). The mind-map structures on the market are therefore no longer applicable, and a new input tool needs to be created that supports entering tree structure data while meeting these special requirements.
The input tool for tree structure data consists of two parts: a user operation interface and a back-end storage system. The requesting party expects simple operation and fast response, which requires the front-end interface to interact with the back-end storage system infrequently and to transmit small data packets.
In view of these problems in the prior art, how to design an input tool that supports entering tree structure data and enforcing customized rules, interacts efficiently with the back-end storage system, and transmits small data packets is an important problem in current technical development.
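The rules listed above can be made concrete with a small check that runs before a child node is added. The following is a minimal sketch only: the class names, the two node kinds `DIRECTORY` and `CONTENT`, and the convention that root nodes sit at level 1 are assumptions made for illustration and are not taken from the application.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical node kinds; the text only states that nodes have different types. */
enum NodeKind { DIRECTORY, CONTENT }

/** Minimal node used only to illustrate the rules; not the application's data model. */
class RuleNode {
    final NodeKind kind;
    final int level;                                  // assumption: root nodes are level 1
    final List<RuleNode> children = new ArrayList<>();

    RuleNode(NodeKind kind, int level) {
        this.kind = kind;
        this.level = level;
    }
}

class NodeRules {
    static final int MAX_DEPTH = 3;                   // example limit quoted in the text

    /** Returns true if a child of the given kind may be added under parent. */
    static boolean canAddChild(RuleNode parent, NodeKind childKind) {
        // Rule (1): the new child would sit at parent.level + 1, which must stay within the limit.
        if (parent.level + 1 > MAX_DEPTH) {
            return false;
        }
        // Rule (3): once a node has a content child, no directory child may be added.
        if (childKind == NodeKind.DIRECTORY) {
            for (RuleNode child : parent.children) {
                if (child.kind == NodeKind.CONTENT) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Adds the child if the rules allow it; returns the new node, or null if rejected. */
    static RuleNode addChild(RuleNode parent, NodeKind childKind) {
        if (!canAddChild(parent, childKind)) {
            return null;
        }
        RuleNode child = new RuleNode(childKind, parent.level + 1);
        parent.children.add(child);
        return child;
    }
}
```

Running the check on the client side before any request is sent is one way to keep the number of round trips to the back-end storage system low.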
Fig. 1 is a flowchart of a tree structure data processing method according to an embodiment of the present application. The tree structure data processing method of Fig. 1 may be performed by a server. As shown in Fig. 1, the method may specifically include:
S101, acquiring a file stream from an Excel file uploaded by a user through a client, and creating an OPC package based on the file stream, the OPC package providing an interface for accessing and operating on the file stream it contains;
S102, creating a table reader object for data extraction, and using the table reader object to extract the data in the OPC package to obtain a shared string table, style information, and worksheet data;
S103, creating a data parser for parsing the worksheet data, setting an event handler for the data parser, passing the event handler's parameters to an event-driven processor, and using the event-driven processor to read and parse the worksheet data row by row;
S104, while the event-driven processor reads and parses the worksheet data row by row, triggering, for each event, the event handler associated with that event to parse the worksheet data, storing the parsed data in the event handler, and generating a data list;
S105, invoking a preset recursive method that queries the directory and content data in the data list, traverses them to screen out all root nodes, and sorts the root nodes to determine the root-node order of the tree structure;
S106, traversing the root nodes with a preset outer loop and traversing all data in the data list with a preset inner loop to obtain non-leaf or leaf nodes, and repeating the inner loop for each non-leaf node until leaf nodes are reached, thereby obtaining tree structure data;
S107, splitting the tree structure data into parent-child node relations, sorting the directory and content data in the parent-child node relations by level, serializing the sorted directory and content data, and storing the result in a database.
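Steps S105 to S107 can be sketched in plain Java as follows. This is an illustration under stated assumptions rather than the implementation of the application: the `FlatRow` class, its `id`, `parentId`, and `sort` fields, the convention that a root has a null `parentId`, and the string form used for the parent-child relations are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** One entry of the parsed data list; all field names are illustrative assumptions. */
class FlatRow {
    final String id;
    final String parentId;                 // assumption: null marks a root node
    final int sort;                        // assumption: ordering key for root nodes
    final List<FlatRow> children = new ArrayList<>();

    FlatRow(String id, String parentId, int sort) {
        this.id = id;
        this.parentId = parentId;
        this.sort = sort;
    }
}

class TreeBuildSketch {
    /** S105 + S106: screens out and sorts the roots, then attaches descendants. */
    static List<FlatRow> build(List<FlatRow> all) {
        List<FlatRow> roots = new ArrayList<>();
        for (FlatRow row : all) {
            if (row.parentId == null) {
                roots.add(row);
            }
        }
        roots.sort((a, b) -> Integer.compare(a.sort, b.sort));
        for (FlatRow root : roots) {       // outer loop over the root nodes
            attachChildren(root, all);
        }
        return roots;
    }

    /** Inner loop over all data; repeated for every non-leaf node until leaves are reached. */
    static void attachChildren(FlatRow parent, List<FlatRow> all) {
        for (FlatRow candidate : all) {
            if (parent.id.equals(candidate.parentId)) {
                parent.children.add(candidate);
                attachChildren(candidate, all);   // assumes the data contains no cycles
            }
        }
    }

    /** S107: splits the tree into "level:parent>child" relations ordered by level. */
    static List<String> relations(List<FlatRow> roots) {
        List<String> out = new ArrayList<>();
        List<FlatRow> level = roots;
        int depth = 1;
        while (!level.isEmpty()) {
            List<FlatRow> next = new ArrayList<>();
            for (FlatRow node : level) {
                String parent = node.parentId == null ? "-" : node.parentId;
                out.add(depth + ":" + parent + ">" + node.id);
                next.addAll(node.children);
            }
            level = next;
            depth++;
        }
        return out;
    }
}
```

Because the inner loop scans the full data list for every node, this sketch costs on the order of n² comparisons for n rows; indexing rows by `parentId` in a map would be a common alternative, although it is not the procedure the steps describe.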
Tree structure data is a non-linear data structure in which each node may have multiple child nodes but only one parent node (except the root node, which has no parent). The structure resembles an inverted tree and is therefore called a "tree" structure. A tree data structure includes, but is not limited to, the following main elements:
Node: each data element is a node.
Root Node: the node that has no parent node; it is the starting node of the tree.
Child Node: a lower-level node directly connected to a given node.
Parent Node: the upper-level node to which a given node is directly connected.
Leaf Node: a node that has no child nodes.
Sibling Nodes: nodes that share a common parent node.
Tree data structures have wide-ranging applications in computer science and data processing, such as file systems, syntax analysis of programming languages, database indexing, and so forth. The tree data structure has advantages in terms of processing hierarchical relationships, organizing structures, classifying information, and the like.
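The elements defined above map directly onto a small class. The following is a minimal sketch; the class name `SimpleTreeNode` and its methods are chosen for illustration and do not appear in the application.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative tree node: one parent reference and any number of children. */
class SimpleTreeNode {
    final String name;
    SimpleTreeNode parent;                                  // null for the root node
    final List<SimpleTreeNode> children = new ArrayList<>();

    SimpleTreeNode(String name) {
        this.name = name;
    }

    /** Creates a child node, links it to this node, and returns it. */
    SimpleTreeNode addChild(String childName) {
        SimpleTreeNode child = new SimpleTreeNode(childName);
        child.parent = this;
        children.add(child);
        return child;
    }

    /** A root node has no parent. */
    boolean isRoot() {
        return parent == null;
    }

    /** A leaf node has no children. */
    boolean isLeaf() {
        return children.isEmpty();
    }

    /** Sibling nodes are the other children of this node's parent. */
    List<SimpleTreeNode> siblings() {
        List<SimpleTreeNode> result = new ArrayList<>();
        if (parent != null) {
            for (SimpleTreeNode node : parent.children) {
                if (node != this) {
                    result.add(node);
                }
            }
        }
        return result;
    }
}
```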
In the embodiments of the present application, operations such as creating the OPC package from the file stream, creating the table reader object, and parsing the worksheet data can be carried out with a customized build (secondary development) of the Apache POI library (POI library for short). Apache POI is an open-source Java library for processing Microsoft Office documents; in this application it is mainly used to read and parse Excel files. In practice, reading an Excel file with the Apache POI library involves two key parts: the key points of using the POI library, and the SAX parsing method.
Further, the key points of using the POI library include setting the date/time format used by POI, so that the library processes date and time data correctly, and reading the Excel file. When an Excel file is read, a file stream is obtained from the Excel file, and an OPCPackage object (i.e., the OPC package) is then created from the file stream; this object represents the entire Excel file.
In some embodiments, obtaining a file stream from an Excel file uploaded by a user via a client, creating an OPC package based on the file stream, includes: obtaining an Excel file uploaded by a user through a client, converting the Excel file into a file stream, and inputting the file stream as a parameter into a method of a preset OPCPackage class so that the OPCPackage class creates an OPC package; the Excel file contains a catalog, content and content type, and the OPC package is an OPCPackage object.
In particular, an OPCPackage object (i.e., an OPC package) can be viewed as an abstract representation of an OOXML file that provides a set of APIs for accessing and manipulating the resources contained in the package. For example, OPCPackage may be used to read all worksheets in an Excel file or to modify a paragraph in a Word file. In practice, the basic process of creating an OPC package from a file stream is as follows:
First, the Excel file source (i.e., the Excel file uploaded by the client) is obtained; the Excel file may be a file stored on disk or a file downloaded from a network. This file needs to be converted into an input stream (i.e., the Excel file stream).
Next, the file stream obtained in the previous step is passed as a parameter to the open method of the OPCPackage class, which creates an OPCPackage object (i.e., the OPC package). This object represents the entire Excel file and can be understood as an abstract representation of it. Once obtained, the OPC package can be used to access and manipulate the Excel file: it provides a series of interfaces through which data in the Excel file can be read or written.
Further, after creating the OPC package according to the file stream, the embodiments of the present application will also create an XSSFReader object (i.e. a table reader object), by which each part in the Excel file, such as the shared string table, style information, and worksheet data, can be accessed one by one. It provides a lower-level, more flexible way to handle large Excel files, especially when streaming reads are needed to save memory.
Note that XSSFReader is a class in the POI library used to read Excel files (more precisely, Excel files in .xlsx format). The class provides a number of methods for reading the different parts of an Excel file, including the shared strings table, the styles table, and the data of the individual worksheets. By creating an XSSFReader object, the content of the Excel file is read from the OPCPackage object. In practice, an XSSFReader object may also be referred to as an "XSSF reader".
In the context of the embodiments of the present application, an XSSFReader object may be understood as a tool that is capable of extracting specific data from OPCPackage (i.e., an abstract representation of an Excel file), including, for example, shared string tables, style information, and worksheet data.
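To make the notion of package "parts" concrete: an .xlsx file is a ZIP archive, and the shared strings, styles, and worksheets are separate XML parts inside it. The sketch below uses only `java.util.zip` from the JDK and is not the POI API; it builds a tiny stand-in archive in memory and lists its parts. The part names are the ones Excel conventionally uses, and the part contents are placeholders, so the archive is not a valid workbook.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class PackagePartsSketch {
    /** Reads every part of a ZIP-based package from a file stream into a name-to-text map. */
    static Map<String, String> readParts(InputStream fileStream) {
        Map<String, String> parts = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(fileStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                parts.put(entry.getName(), new String(buffer.toByteArray(), StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parts;
    }

    /** Builds a stand-in package in memory; contents are placeholders, not a real workbook. */
    static byte[] samplePackage() {
        String[][] parts = {
            {"xl/sharedStrings.xml", "<sst/>"},
            {"xl/styles.xml", "<styleSheet/>"},
            {"xl/worksheets/sheet1.xml", "<worksheet/>"},
        };
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String[] part : parts) {
                zip.putNextEntry(new ZipEntry(part[0]));
                zip.write(part[1].getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

In the embodiments, the OPCPackage and XSSFReader objects take over this low-level work and hand out the shared string table, the style information, and each worksheet as separate streams.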
In one example, when the application uses POI to read an Excel file uploaded by a user, the library is brought in through Maven dependencies. The dependencies are added as follows:
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>4.0.1</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>4.0.1</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml-schemas</artifactId>
    <version>4.0.1</version>
</dependency>
This snippet declares the library dependencies that must be added to the Maven project in order to process Excel files. Maven is a project management tool that automatically downloads and manages the libraries a project requires. Once declared, these libraries behave as part of the project, and the project's code can use the functions they provide.
Here, three library dependencies are added:
<groupId>org.apache.poi</groupId> <artifactId>poi</artifactId> <version>4.0.1</version>: this dependency is the core part of the Apache POI library, which provides many functions for manipulating Office files (e.g., Excel, Word).
<groupId>org.apache.poi</groupId> <artifactId>poi-ooxml</artifactId> <version>4.0.1</version>: this dependency is an extension of the Apache POI library that provides support for Office 2007 and later versions (the file format of these versions is commonly referred to as OOXML, i.e., Office Open XML).
<groupId>org.apache.poi</groupId> <artifactId>poi-ooxml-schemas</artifactId> <version>4.0.1</version>: this dependency contains the XML Schema definitions required by the OOXML file format. These schemas define the structure of the elements and attributes in an OOXML file.
After these dependencies are added, the project's code can read and process Excel files using the functions provided by the POI library.
After the Excel file is read, the embodiments of the present application parse the data with the SAX parsing method, which comprises the following steps:
Customizing the Sheet handler: a handler is created for processing the data in the Excel sheet. This handler is written according to the specific requirements so that it performs the desired operations on the data.
Creating the SAX XMLReader: this object is responsible for parsing the XML structure of the Excel file and passing the events raised during parsing to the Sheet handler.
Setting the Sheet event handler: the custom Sheet handler is associated with the XMLReader, which ensures that the handler receives and processes the corresponding events during parsing.
Reading row by row: finally, the data in the Excel sheet is read row by row through the XMLReader and processed by the custom Sheet handler.
The above embodiment describes how to read and parse data from an Excel file using the Apache POI library and the SAX parsing method. By creating a custom Sheet handler and associating it with events, programmers can perform the desired operations on the data.
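The four steps above can be tried with the SAX classes that ship with the JDK. The following is a minimal sketch: it parses a small XML string instead of a real worksheet, and the `LoggingHandler` class stands in for the custom Sheet handler (both names are chosen for illustration). It shows how an `XMLReader` is created, how a handler is registered with `setContentHandler`, and which events the parser raises.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventSketch {
    /** Custom handler: records every SAX event it receives. */
    static class LoggingHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startDocument() {
            events.add("startDocument");
        }

        @Override
        public void endDocument() {
            events.add("endDocument");
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            events.add("start:" + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end:" + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may deliver text in several chunks; each non-blank chunk is logged.
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                events.add("text:" + text);
            }
        }
    }

    /** Creates the XMLReader, registers the handler, parses, and returns the event log. */
    static List<String> parse(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            LoggingHandler handler = new LoggingHandler();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.events;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }
}
```

Because the parser pushes events as it reads, the whole document never has to be held in memory, which is why this style suits large worksheets.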
Optionally, when an Excel file is parsed with the SAX parsing method, an XMLReader object (i.e., the data parser) is first created; it is used to parse XML. Since the internal structure of an Excel file is in fact XML, an XMLReader is required for parsing. XMLReader is the core interface of the Java SAX (Simple API for XML) parser. A SAX parser is event-driven: while parsing an XML document it triggers a series of events such as document start, document end, tag start, tag end, and text read. The developer can perform custom processing by registering a handler for these events.
In practice, a data parser is created and an event handler (i.e., a SheetHandler object) is set for it. While parsing the XML document, the SAX parser calls the corresponding handler method each time an event is encountered. In practice, XMLReader may also be referred to as an "XML reader" or "XML parser".
In the context of the embodiments of the present application, XMLReader can be understood as a tool that reads Excel worksheet data in XML format and triggers a preset event handler (i.e., a SheetHandler object) for data processing when a specific event is encountered (e.g., the start of a new row or a new cell).
In some embodiments, after the event handler is set for the data parser, the method further comprises: registering the event handler in the worksheet's XML handler object, setting the worksheet's XML handler object as the content handler of the data parser so that the data parser calls it when parsing the Excel file, traversing all worksheets in the Excel file, and reading the data in each worksheet row by row with the data parser.
Specifically, a SheetHandler object (i.e., the event handler, also called the sheet handler) is created and registered with an XSSFSheetXMLHandler object (i.e., the worksheet's XML handler object). The SheetHandler object is a user-defined event handler: when the XMLReader (data parser) encounters certain events while parsing an Excel file (such as the start or the end of parsing a cell), the corresponding method of the SheetHandler object is called to process the data.
In one example, the XSSFSheetXMLHandler (the worksheet's XML handler object) is set as the content handler of the XMLReader (data parser), so that when the XMLReader parses an Excel file it calls the methods of XSSFSheetXMLHandler and of its internal SheetHandler.
Note that XSSFSheetXMLHandler is a class in the POI library used to process or parse the worksheet part of an Excel file (typically an .xlsx file) in Office Open XML (OOXML) format. XSSFSheetXMLHandler can thus be understood as an XML handler for the worksheet parts of an Excel file in OOXML format.
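The two-layer arrangement described here, an XML-level handler that forwards row-level callbacks to a user-defined handler, can be sketched with JDK classes alone. In this illustration `RowCallbacks` plays the role of the custom SheetHandler and `SheetXmlAdapter` plays the role of the worksheet XML handler; both names are hypothetical and neither is a POI class. The sample input uses inline strings, whereas real worksheets usually refer to the shared string table by index.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SheetAdapterSketch {
    /** Hypothetical row-level callbacks, standing in for the custom sheet handler. */
    interface RowCallbacks {
        void startRow(int rowNum);
        void cell(String cellReference, String value);
        void endRow(int rowNum);
    }

    /** Stands in for the worksheet XML handler: turns XML events into row-level callbacks. */
    static class SheetXmlAdapter extends DefaultHandler {
        private final RowCallbacks callbacks;
        private final StringBuilder text = new StringBuilder();
        private String cellRef;
        private int rowNum;
        private boolean inText;

        SheetXmlAdapter(RowCallbacks callbacks) {
            this.callbacks = callbacks;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("row".equals(qName)) {
                rowNum = Integer.parseInt(atts.getValue("r"));   // 1-based row number
                callbacks.startRow(rowNum);
            } else if ("c".equals(qName)) {
                cellRef = atts.getValue("r");                    // e.g. "A1"
                text.setLength(0);
            } else if ("t".equals(qName)) {
                inText = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inText) {
                text.append(ch, start, length);                  // text may arrive in chunks
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("t".equals(qName)) {
                inText = false;
            } else if ("c".equals(qName)) {
                callbacks.cell(cellRef, text.toString());
            } else if ("row".equals(qName)) {
                callbacks.endRow(rowNum);
            }
        }
    }

    /** Registers the adapter as content handler and logs the callbacks it forwards. */
    static List<String> readCells(String sheetXml) {
        final List<String> log = new ArrayList<>();
        RowCallbacks callbacks = new RowCallbacks() {
            public void startRow(int rowNum) { log.add("startRow " + rowNum); }
            public void cell(String ref, String value) { log.add(ref + "=" + value); }
            public void endRow(int rowNum) { log.add("endRow " + rowNum); }
        };
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(new SheetXmlAdapter(callbacks));
            reader.parse(new InputSource(new StringReader(sheetXml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("sheet parsing failed", e);
        }
        return log;
    }
}
```

Keeping the XML details in the adapter means the user-defined handler only deals with rows and cells, which is the division of labour the paragraph above describes.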
In some embodiments, triggering the event handler associated with the event to parse the worksheet data includes:
when the worksheet data is read and parsed row by row as events are triggered, creating an element tree data object, wherein the element tree data object is used to encapsulate one row of data in the Excel file;
when each cell in a row is parsed, storing the data of the cell in the corresponding attribute of the element tree data object according to the name of the cell;
after the parsing of one row is finished, adding the element tree data object that encapsulates the row to a list, checking the size of the list, and, when the size of the list reaches a preset threshold, performing one data insertion operation and emptying the list;
wherein the events include the start of parsing a row, the end of parsing a row, the start of parsing a cell, and the end of parsing a cell.
Specifically, at the beginning of parsing each line of data of an Excel file, a new ElementTreeData object (i.e., an element tree data object, which is used to represent a line of data) is created, and the element tree data object is used to encapsulate the line of data. The element tree data object herein may be considered a data structure for storing and managing data related to a line of data of an Excel table. In practice, each ElementTreeData object represents a line of data in an Excel table.
Further, when each cell in each row of data is parsed, the data corresponding to the cell (i.e., the value corresponding to the cell) is set to the corresponding attribute of the current ElementTreeData object (element tree data object) according to the name of the cell (e.g., "a", "B", etc.).
After the parsing of a line of data is completed, the ElementTreeData object encapsulating the line of data is added to a list and the size of the list is checked. Whether the size of the list reaches a preset threshold (e.g., the threshold is set to 1024) is determined, and if the size of the list reaches 1024, a data batch insert operation is performed and then the list is emptied.
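The threshold check described above can be sketched independently of Excel parsing. In the following minimal sketch, BatchBuffer and its sink parameter are hypothetical names; the sink stands in for whatever batch insert operation is actually used, and the copy of the pending list is a design choice of the sketch, not something the embodiment prescribes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: collects rows and hands them to a sink in fixed-size batches.
class BatchBuffer<T> {
    private final int threshold;
    private final Consumer<List<T>> sink;
    private final List<T> pending;
    private int batches = 0;

    BatchBuffer(int threshold, Consumer<List<T>> sink) {
        this.threshold = threshold;
        this.sink = sink;
        this.pending = new ArrayList<>(threshold);
    }

    // Called once per parsed row; flushes as soon as the threshold is reached.
    void add(T row) {
        pending.add(row);
        if (pending.size() >= threshold) {
            flush();
        }
    }

    // Also called once after parsing ends, so that a final partial batch is not lost.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending)); // copy, so clearing does not affect the sink
        batches++;
        pending.clear();
    }

    int batchCount() {
        return batches;
    }
}
```

With a threshold of 1024, a sheet of 2500 rows would produce two full batches during parsing and one final batch of 452 rows when flush() is called at the end.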
The above embodiment describes the overall process of parsing worksheet data in an Excel file using a data parser. The analysis processing method of the worksheet data in the Excel file is described below in conjunction with the actual code, and specifically may include the following:
package com.longfor.c2.classification.service;

import com.longfor.c2.classification.api.domain.response.element.ElementTreeData;
import lombok.Data;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.usermodel.XSSFComment;

import java.util.ArrayList;
import java.util.List;

/**
 * @Author: han tao
 * @Description: custom SAX-based parsing handler for a sheet
 * @Date: 11:00 am 2022/7/12
 */
@Data
public class SheetHandler implements XSSFSheetXMLHandler.SheetContentsHandler {
    // Entity object that encapsulates the current row
    private ElementTreeData elementTreeData;
    // Collection of entity objects
    private List<ElementTreeData> employeeList = new ArrayList<>(MAX_EMPLOYEE);
    // Maximum capacity of the collection
    private static final int MAX_EMPLOYEE = 1024;
    // Number of the current insert batch; the initial value is 1
    private int times = 1;
    // Total number of rows
    private int allCount = 0;

    /**
     * Triggered when parsing of a row starts
     *
     * @param i row number
     */
    @Override
    public void startRow(int i) {
        // Row 0 is skipped, so no entity object is created for it
        if (i > 0) {
            elementTreeData = new ElementTreeData();
        }
    }

    /**
     * Triggered when parsing of a row ends
     *
     * @param i row number
     */
    @Override
    public void endRow(int i) {
        // No entity object exists for a skipped row, so there is nothing to store
        if (elementTreeData == null) {
            return;
        }
        employeeList.add(elementTreeData);
        elementTreeData = null;
        allCount++;
        if (employeeList.size() == MAX_EMPLOYEE) {
            // A batch insert is assumed to take place here
            System.out.println("Performing insert #" + times);
            times++;
            employeeList.clear();
        }
    }

    /**
     * Processes each cell in a row
     *
     * @param cellName    name of the cell
     * @param value       data of the cell
     * @param xssfComment comment of the cell
     */
    @Override
    public void cell(String cellName, String value, XSSFComment xssfComment) {
        if (elementTreeData != null) {
            // The first character of the cell name is the column letter (columns A to F)
            String prefix = cellName.substring(0, 1);
            switch (prefix) {
                case "A":
                    elementTreeData.setId(Long.valueOf(value));
                    break;
                case "B":
                    elementTreeData.setName(value);
                    break;
                case "C":
                    elementTreeData.setType(Integer.valueOf(value));
                    break;
                case "D":
                    elementTreeData.setSort(Integer.valueOf(value));
                    break;
                case "E":
                    elementTreeData.setElementUuid(value);
                    break;
                case "F":
                    elementTreeData.setParentId(value);
                    break;
            }
        }
    }
}
The above code parses an Excel file uploaded by a user and stores the parsed data in memory as objects. The code is explained in detail below:
First, the Maven dependencies are imported. Maven is a project management tool used to manage the building, reporting and documentation of a project. Here, three dependencies of the Apache POI library are imported; POI is used to process Office documents, especially Excel documents.
The code then defines a class SheetHandler that implements the interface XSSFSheetXMLHandler.SheetContentsHandler from the POI library. This interface declares the methods for handling each row of data in an Excel file; they are invoked when particular events occur (e.g., the start of a row, the end of a row, or a cell).
More specifically, the operations performed by the event handler in the SheetHandler class include:
When parsing of a new row of data begins, a new ElementTreeData object (element tree data object) is created. When a cell is parsed, the value of the cell is set on the corresponding attribute of the current ElementTreeData object according to the name of the cell. After a row of data has been parsed, the current ElementTreeData object is added to a list and the size of the list is checked; if it has reached the threshold (e.g., 1024), a batch insert operation is performed and the list is cleared.
The above code also provides a getEmployeeList() method (generated by the Lombok @Data annotation), through which the code that invokes the parsing handler can obtain the data that has not yet been inserted. The total amount of data and the number of insertions are recorded for statistics and log output. The main role of the SheetHandler class is thus to parse the content of an Excel worksheet and convert it into a list of objects that the application can use.
The SheetHandler class also defines member variables, including an ElementTreeData object (which encapsulates one row of data), a List<ElementTreeData> object (which holds multiple rows of data), and counter variables. The SheetHandler class then overrides three methods of the SheetContentsHandler interface: startRow, endRow and cell.
The startRow method is invoked when processing of a row of data starts. In this method, a new ElementTreeData object is created, ready to encapsulate the data of the row.
The cell method processes each cell in a row. In this method, the data of the cell is saved in the corresponding attribute of the ElementTreeData object according to the name of the cell.
The endRow method is invoked when processing of a row of data is completed. In this method, the ElementTreeData object that encapsulates the row is added to a list and the size of the list is checked. If the size of the list has reached the threshold, a data insertion operation is performed and the list is then emptied.
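The cell method in the listing takes the first character of the cell name as the column, which is sufficient for columns A to F. For sheets with more than 26 columns, the cell name (e.g., "AB12") has a multi-letter column part. The following minimal sketch shows one way to separate it; the class name CellRefs and both method names are hypothetical and are not part of the POI library.

```java
// Hypothetical helper for A1-style cell names such as "F7" or "AB12".
class CellRefs {
    // Returns the leading column letters, e.g. "AB12" -> "AB".
    static String columnOf(String cellName) {
        int i = 0;
        while (i < cellName.length() && Character.isLetter(cellName.charAt(i))) {
            i++;
        }
        return cellName.substring(0, i);
    }

    // Returns the 0-based column index: A -> 0, Z -> 25, AA -> 26.
    static int columnIndex(String cellName) {
        int index = 0;
        for (char ch : columnOf(cellName).toCharArray()) {
            // Column letters form a base-26 number whose digits run from 1 (A) to 26 (Z)
            index = index * 26 + (Character.toUpperCase(ch) - 'A' + 1);
        }
        return index - 1;
    }
}
```

A switch on columnOf(cellName) instead of cellName.substring(0, 1) would keep the mapping of columns to attributes unchanged for columns A to F.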
It should be noted that the above code is only an example, which implements the flow of parsing the worksheet data in an Excel file with a SAX-based parsing handler. In practical applications, the above code may be replaced by corresponding database operation code according to the specific usage scenario, so the above code does not limit the technical solution of the present application.
In another example, the embodiment of the application also provides an implementation code of the complete process of Excel file reading and parsing the read data. The following describes the whole process of Excel file reading and data parsing in the tree structure data processing method according to the present application in detail with reference to specific codes and the content of the foregoing embodiments, and may specifically include the following:
package com.longfor.c2.classification.service;

import com.longfor.c2.classification.api.domain.response.element.ElementTreeData;
import lombok.extern.slf4j.Slf4j;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.xssf.model.StylesTable;
import org.springframework.stereotype.Service;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

import java.io.InputStream;
import java.util.List;

/**
 * @Author: han tao
 * @Description: service layer for importing Excel files
 * There are two ways to read Excel:
 * 1. User mode: operate on Excel through a series of packaged APIs
 * 2. Event-driven mode: read the Excel XML files with SAX
 * @Date: 11:13 am 2022/7/12
 */
@Slf4j
@Service
public class ImportExcelService {

    /**
     * Reads an Excel file with a large amount of data
     *
     * @param path file path
     */
    public void readBigDataExcel(String path) throws Exception {
        // Obtain the OPCPackage for the Excel file
        OPCPackage opcPackage = OPCPackage.open(path, PackageAccess.READ);
        // Create the XSSFReader
        XSSFReader xssfReader = new XSSFReader(opcPackage);
        // Obtain the SharedStringsTable object
        SharedStringsTable sharedStringsTable = xssfReader.getSharedStringsTable();
        // Obtain the StylesTable object
        StylesTable stylesTable = xssfReader.getStylesTable();
        // Create the SAX XMLReader object
        XMLReader xmlReader = XMLReaderFactory.createXMLReader();
        // Register the event-driven handler
        SheetHandler sheetHandler = new SheetHandler();
        XSSFSheetXMLHandler xssfSheetXMLHandler = new XSSFSheetXMLHandler(stylesTable, sharedStringsTable, sheetHandler, false);
        xmlReader.setContentHandler(xssfSheetXMLHandler);
        // Read every worksheet row by row
        XSSFReader.SheetIterator sheetIterator = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
        while (sheetIterator.hasNext()) {
            InputStream in = sheetIterator.next();
            InputSource is = new InputSource(in);
            xmlReader.parse(is);
        }
        // Insert the remaining data that has not yet been stored
        List<ElementTreeData> employeeList = sheetHandler.getEmployeeList();
        log.info("Last insert; number of rows in this batch: {}", employeeList.size());
        log.info("Inserted " + sheetHandler.getAllCount() + " rows in total");
    }
}
The main purpose of this code is to read the Excel file and parse the data in it. It uses SAX parsing (an event-driven parsing mode). The Excel file is read and its data parsed as follows:
OPCPackage opcPackage = OPCPackage.open(path, PackageAccess.READ): this line opens the Excel file and returns an OPCPackage object, which represents the entire Excel file.
XSSFReader xssfReader = new XSSFReader(opcPackage): this line creates an XSSFReader object, which is used to read the various pieces of information from the OPCPackage object.
SharedStringsTable sharedStringsTable = xssfReader.getSharedStringsTable() and StylesTable stylesTable = xssfReader.getStylesTable(): these two lines obtain the SharedStringsTable and StylesTable objects, which contain all the shared strings and the style information of the Excel file, respectively.
XMLReader xmlReader = XMLReaderFactory.createXMLReader(): this line creates an XMLReader object, which is used to parse XML. Because the internal structure of an Excel file is in fact XML, an XMLReader is needed to parse it.
SheetHandler sheetHandler = new SheetHandler() and XSSFSheetXMLHandler xssfSheetXMLHandler = new XSSFSheetXMLHandler(stylesTable, sharedStringsTable, sheetHandler, false): these two lines create a SheetHandler object and register it in the XSSFSheetXMLHandler. The SheetHandler object is a user-defined event handler; when the XMLReader encounters certain events while parsing the Excel file (e.g., the start or the end of parsing a cell), the corresponding method of the SheetHandler is invoked.
xmlReader.setContentHandler(xssfSheetXMLHandler): this line sets the XSSFSheetXMLHandler as the content handler of the XMLReader, so that the methods of the XSSFSheetXMLHandler and of its internal SheetHandler are called while the XMLReader parses the Excel file.
The while (sheetIterator.hasNext()) { ... } block traverses all worksheets in the Excel file and reads the data in each worksheet row by row using the XMLReader.
List<ElementTreeData> employeeList = sheetHandler.getEmployeeList(): this line retrieves the list of data held in the SheetHandler. Each element of this list represents one row of data in the Excel file.
The last two lines of code print statistics, such as the total amount of data inserted.
In general, the method provided by this code opens an Excel file, creates a custom event handler, parses the data in the Excel file row by row in an event-driven manner, and stores the data in the event handler.
All the above embodiments describe the whole process of Excel file reading and data parsing, and the process and principle of constructing tree structure data by using the recursive method will be described in detail below with reference to the result of data parsing in the foregoing embodiments.
A recursive algorithm is used to encapsulate the data into a tree structure. Recursion solves a problem by having a function call itself. In the embodiments of the present application, the data needs to be constructed into a tree structure in which root nodes and non-leaf nodes represent directories and leaf nodes represent content. The basic principles of constructing tree structure data with a recursive algorithm in the embodiments of the present application are as follows:
1. When a recursive method is called, the computer allocates a new, independent space for that call (a frame in stack space), which stores the local variables and other data of the method.
2. The local variables of a recursive method are independent: every call creates a new space, and the local variables in it do not affect the local variables of other calls.
3. If the recursive method uses variables of a reference type (e.g., arrays or objects), the data they refer to may be shared between different calls. This means that a modification made to reference-type data by one call may affect other calls.
4. A recursive method must move toward its termination condition to prevent infinite recursion. In other words, the design of a recursive method must ensure that there is a condition under which the recursion stops.
5. When a recursive method finishes executing or reaches a return statement, it returns to the place where it was called (its frame is popped from the stack) and passes the result to the caller. When the method returns, its data in stack space is also cleared, freeing the memory.
It follows that, when a tree structure is built with a recursive method, each call is allocated its own separate space, while reference-type data is shared between calls. To prevent infinite recursion, an appropriate termination condition needs to be set. When the recursive method completes its task or reaches a return statement, it returns to where it was invoked and passes the result to the caller.
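The difference between per-call local variables and shared reference-type data can be seen in a very small example. The following is a minimal sketch; the class name RecursionScopeSketch and the method sumDown are hypothetical and serve illustration only.

```java
import java.util.List;

class RecursionScopeSketch {
    // Sums n + (n - 1) + ... + 1 and records every visited value in the shared list.
    static int sumDown(int n, List<Integer> trace) {
        if (n == 0) {
            return 0; // termination condition: no further call is made
        }
        trace.add(n);      // reference type: every call appends to the same list object
        int local = n;     // local variable: lives in this call's own stack frame
        int below = sumDown(n - 1, trace);
        return local + below; // 'local' still holds this call's value after the nested calls
    }
}
```

Calling sumDown(3, trace) with an empty list returns 6 and leaves the values 3, 2 and 1 in the list, because all three calls wrote to the same object while each kept its own copy of local.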
Based on the above basic principle of constructing tree structure data using a recursive algorithm, the basic flow steps of constructing data into a tree structure using a recursive method according to the embodiment of the present application include:
Step one: query all the directory and content data stored in the data table.
Step two: traverse the data from step one and screen out all root nodes (i.e., nodes whose parentId field value is 0).
Step three: sort the data from step two by the sort field (here in ascending order of the sort field value), so that the order of the root nodes of the tree is fixed.
Step four: use a double loop, in which the first-level loop (i.e., the outer loop) traverses the root node data from step three and the second-level loop (i.e., the inner loop) traverses the full data and screens out the data whose parentId equals the primary key value of the root node, i.e., the non-leaf node or leaf node data.
Step five: repeat the inner loop until the data is a leaf node (i.e., its child data is empty), at which point the recursion stops and the tree structure data has been constructed.
In some embodiments, traversing the root node with a predetermined outer loop and traversing the full volume of data in the data list with a predetermined inner loop results in a non-leaf node or leaf node, comprising: traversing the root node according to a preset outer layer cycle, traversing the full data according to a preset inner layer cycle so as to screen out nodes with field values corresponding to parent node IDs equal to the main key value of the currently traversed root node, and taking the screened nodes as non-leaf nodes or leaf nodes.
Specifically, all the data, including directory and content data, is first acquired from the data table. The data is traversed and all root nodes are screened out. Root nodes are the nodes at the top level of the tree structure; they are typically identified by a particular field (e.g., the "parentId" field) having the value 0 or another specific value. The root nodes are then sorted by a particular field (e.g., the "sort" field), which fixes the order of the root nodes and gives the tree structure a definite order.
Further, two-level loops (i.e., an outer-level loop and an inner-level loop) are used to construct the other portions of the tree structure. Traversing the root node by using the outer loop, traversing all data by using the inner loop, finding out nodes with field values of 'parentId' (i.e. field values corresponding to parent node IDs) equal to the main key value of the current root node, and taking the nodes as non-leaf nodes or leaf nodes.
In some embodiments, for each non-leaf node, the inner loop is repeatedly performed until a leaf node is found, resulting in tree structure data, including: and repeatedly executing the inner layer circulation for each non-leaf node, taking the current node as the leaf node corresponding to the non-leaf node when the child node list is judged to be empty, stopping recursion, and constructing tree structure data according to the root node, the root node sequence, the non-leaf node and the leaf node.
Specifically, for each non-leaf node, the inner loop is performed again to find its child nodes. When no child node is found (i.e., the child node list "children" is empty), the node is determined to be a leaf node and the recursion stops. Finally, the tree structure data is constructed from the root nodes, the root node order, the non-leaf nodes and the leaf nodes.
According to the technical solution provided by the embodiments of the present application, the flat data is constructed into a tree structure from top to bottom and from the outside in. Each node contains links to its child nodes, which in turn may have child nodes of their own, recursively down to the leaf nodes.
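The outer loop, inner loop and recursion described above can be put together in a short, self-contained sketch. It makes several assumptions for illustration: the class names SketchNode and RecursiveTreeSketch are hypothetical, root nodes are marked by the parent id "0", siblings are ordered by the sort field, and element ids are unique and never equal to the root marker.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical flat record with the fields needed to build the tree.
class SketchNode {
    final String uuid;
    final String parentId;
    final int sort;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String uuid, String parentId, int sort) {
        this.uuid = uuid;
        this.parentId = parentId;
        this.sort = sort;
    }
}

class RecursiveTreeSketch {
    static final String ROOT_PARENT_ID = "0"; // assumed root marker

    // Screens out the root nodes, then attaches all descendants recursively.
    static List<SketchNode> build(List<SketchNode> all) {
        List<SketchNode> roots = new ArrayList<>();
        for (SketchNode node : all) {
            if (ROOT_PARENT_ID.equals(node.parentId)) {
                roots.add(node);
            }
        }
        attachChildren(all, roots);
        return roots;
    }

    static void attachChildren(List<SketchNode> all, List<SketchNode> level) {
        level.sort(Comparator.comparingInt((SketchNode n) -> n.sort));
        for (SketchNode parent : level) {          // outer loop: nodes of this level
            for (SketchNode candidate : all) {     // inner loop: the full data
                if (parent.uuid.equals(candidate.parentId)) {
                    parent.children.add(candidate);
                }
            }
            // An empty children list means a leaf node: the nested call loops zero times.
            attachChildren(all, parent.children);
        }
    }
}
```

Because every level scans the full data, the cost grows with the number of nodes multiplied by the size of the full list; for the moderate directory trees described here this keeps the code simple.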
Further, after the tree structure data is constructed, it is split into parent-child node relationships, the directories and content data in the parent-child node relationships are sorted by level, and the sorted directories and content data are serialized and then stored in the database. For example, the server splits the data into primary directories, secondary directories, ..., content nodes and so on according to the hierarchy of the data in Excel, then sorts the directories and content by level, and finally may use FastJson to serialize the sorted directory and content data and persist the serialized data in a MySQL database (i.e., store the data in the database).
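Splitting a tree back into parent-child rows ordered by level can be sketched as follows. This is an illustration under assumptions: the class names DirNode, FlatRow and TreeFlattenSketch are hypothetical, the root parent id is taken to be "0", and the serialization and database steps are left out because they depend on the libraries actually used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical tree node: only the id and the children matter for this sketch.
class DirNode {
    final String uuid;
    final List<DirNode> children = new ArrayList<>();

    DirNode(String uuid, DirNode... kids) {
        this.uuid = uuid;
        children.addAll(Arrays.asList(kids));
    }
}

// One parent-child relation together with the level of the child.
class FlatRow {
    final String uuid;
    final String parentId;
    final int level;

    FlatRow(String uuid, String parentId, int level) {
        this.uuid = uuid;
        this.parentId = parentId;
        this.level = level;
    }

    @Override
    public String toString() {
        return level + ":" + parentId + "->" + uuid;
    }
}

class TreeFlattenSketch {
    static List<FlatRow> flatten(List<DirNode> roots) {
        List<FlatRow> rows = new ArrayList<>();
        collect(roots, "0", 1, rows);
        // List.sort is stable, so siblings keep their relative order within a level.
        rows.sort(Comparator.comparingInt((FlatRow r) -> r.level));
        return rows;
    }

    private static void collect(List<DirNode> nodes, String parentId, int level, List<FlatRow> out) {
        for (DirNode node : nodes) {
            out.add(new FlatRow(node.uuid, parentId, level));
            collect(node.children, node.uuid, level + 1, out);
        }
    }
}
```

Each resulting row carries exactly the parent id and level that are needed to rebuild the tree when the data is queried later.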
In one example, the present application performs the basic format check on the client, which does not otherwise process the data; the server parses the data and stores it in the database. The embodiment of the application adopts the following concise data format:
Column name    Column description
element_uuid   Unique id of the element
parent_id      Parent id
type           Node type: 1 = directory, 2 = content
name           Node name; must not be null when type = 1
level          Node level: 1, 2, 3
sort           Sort order
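A basic format check against the columns above might look like the following minimal sketch. The class names ImportRowSketch and RowFormatCheck and the wording of the messages are hypothetical; only the rules that can be read directly from the table are checked (required ids, type equal to 1 or 2, and a name for directories).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical carrier for one imported row; field names follow the column names.
class ImportRowSketch {
    final String elementUuid;
    final String parentId;
    final Integer type;
    final String name;

    ImportRowSketch(String elementUuid, String parentId, Integer type, String name) {
        this.elementUuid = elementUuid;
        this.parentId = parentId;
        this.type = type;
        this.name = name;
    }
}

class RowFormatCheck {
    // Returns the list of problems found; an empty list means the row passed.
    static List<String> check(ImportRowSketch row) {
        List<String> errors = new ArrayList<>();
        if (row.elementUuid == null || row.elementUuid.trim().isEmpty()) {
            errors.add("element_uuid is required");
        }
        if (row.parentId == null || row.parentId.trim().isEmpty()) {
            errors.add("parent_id is required");
        }
        if (row.type == null || (row.type != 1 && row.type != 2)) {
            errors.add("type must be 1 (directory) or 2 (content)");
        } else if (row.type == 1 && row.name == null) {
            errors.add("name is required when type = 1");
        }
        return errors;
    }
}
```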
In some embodiments, after serializing the ordered catalog and content data and storing the serialized catalog and content data in a database, the method further comprises: receiving a directory structure query request initiated by a user through a client, responding to the directory structure query request, packaging the serialized directory and content data into tree structure data, and returning the tree structure data to the client so that the user can edit the queried tree structure data on line to adjust the directory structure.
Specifically, the embodiment of the application allows a user to initiate a directory structure query request through the client. After receiving the directory structure query request, the server uses the recursive algorithm to encapsulate the serialized directory and content data into tree structure data, i.e., it reassembles the parent-child node relationships into tree structure data, and returns the tree structure data to the client. The user then performs online operations such as adding and editing on the queried data through the client, thereby completing the whole directory tree structure.
Optionally, the embodiments of the present application place the following restrictions on directories and content: each directory or content item records its current node and its parent node. When a node other than a root node is created, it is first determined whether its upper-level node is a directory or content. If it is a directory, a directory or content may be created under it; if it is content, no further content or directory may be created under it.
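The restriction above reduces to a single check on the type of the upper-level node. The following is a minimal sketch; the class name NodeRules, the constant names and the convention that a null parent type denotes a root node are assumptions made for illustration, while the type codes 1 and 2 follow the data format described earlier.

```java
class NodeRules {
    static final int TYPE_DIRECTORY = 1;
    static final int TYPE_CONTENT = 2;

    // parentType is null when the new node is a root node (it has no upper-level node).
    static boolean canCreateChild(Integer parentType) {
        // A directory may hold directories or content; content may hold nothing.
        return parentType == null || parentType == TYPE_DIRECTORY;
    }
}
```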
The following describes in detail the process and principle of constructing data into a tree structure by using a recursive method in the tree structure data processing method according to the present application with reference to specific codes, and may specifically include the following:
1. /**
2.  * @author hantao01
3.  * @version 1.0
4.  * @date 2022/7/12 15:18
5.  */
6. @Data
7. @ApiModel(value = "component tree return entity")
8. public class ElementTreeData {
9.
10.     @ApiModelProperty(value = "id")
11.     private Long id;
12.
13.     @ApiModelProperty(value = "component unique id")
14.     private String elementUuid;
15.
16.     @ApiModelProperty(value = "1: directory, 2: content")
17.     private Integer type;
18.
19.     @ApiModelProperty(value = "directory name, type=1 is not empty")
20.     private String name;
21.
22.     @ApiModelProperty(value = "directory parent id")
23.     private String parentId;
24.
25.     @ApiModelProperty(value = "whether deleted, used by the directory tree")
26.     private Boolean deleteFlag;
27.
28.     @ApiModelProperty(value = "whether selected, used by the directory tree")
29.     private Boolean selectFlag;
30.
31.     @ApiModelProperty(value = "whether custom form, default is false")
32.     private Boolean isDefine;
33.
34.     @ApiModelProperty(value = "ordering of components in the template, used by the directory tree")
35.     private Integer sort;
36.
37.     private List<ElementTreeData> children = new ArrayList<>();
38. }
39.
40.
41. /**
42.  * Gets the list as a tree structure
43.  *
44.  * @param allList all elements that were queried
45.  * @return the root nodes with their child nodes attached
46.  */
47. public List<ElementTreeData> getTree(List<ElementTreeData> allList) {
48.     List<ElementTreeData> rootList = new ArrayList<>();
49.     // find all root (top-level parent) nodes
50.     for (ElementTreeData data : allList) {
51.         if (BusinessConstants.BUSINESS_STRING_NUMBER_ZERO.equals(data.getParentId())) {
52.             rootList.add(data);
53.         }
54.     }
55.     // sort the root nodes once, after all of them have been collected
56.     rootList.sort(Comparator.comparingLong(ElementTreeData::getId));
57.     recursionAsTree(allList, rootList);
58.     return rootList;
59. }
60.
61. /**
62.  * Recursively converts the list into a tree structure
63.  *
64.  * @param allList
65.  * @param rootList
66.  */
67. public void recursionAsTree(List<ElementTreeData> allList, List<ElementTreeData> rootList) {
68.     for (ElementTreeData data : rootList) {
69.         for (ElementTreeData body : allList) {
70.             if (data.getElementUuid().equals(body.getParentId())) {
71.                 data.getChildren().add(body);
72.             }
73.         }
74.         if (CollectionUtils.isNotEmpty(data.getChildren())) {
75.             data.getChildren().sort(Comparator.comparingLong(ElementTreeData::getId));
76.         }
77.         recursionAsTree(allList, data.getChildren());
78.     }
79. }
In the above code, a class named ElementTreeData is first defined, which represents the data of the component tree structure. This class contains a number of attributes, such as id, elementUuid and type, and a list named children for storing child nodes. The contents of the above code are explained as follows:
Code lines 6-35 define the attributes of the ElementTreeData class, which use @ApiModelProperty annotations to add the corresponding descriptive information. For example, the description of the attribute type is "1: directory, 2: content".
Code line 37 defines a list named children, which stores all child nodes of the current node.
Code lines 47-59 define a method named getTree, which converts the incoming list of all elements, allList, into a tree structure. First, the method traverses allList, finds all root nodes (nodes whose parent node ID is 0), and adds them to a list named rootList. Then, rootList is sorted by the ID of the nodes. Finally, the recursionAsTree method is called to construct the tree structure.
Code lines 67-79 define a method named recursionAsTree, which uses a recursive algorithm to transform the list into a tree structure. This method receives two parameters: the list of all elements, allList, and the root node list, rootList. The method traverses each node in rootList and, for each of them, traverses allList; if the parent node ID of an element in allList equals the elementUuid of the node from rootList, the element is added to the child node list children of that node. Then, if the children list is not empty, it is sorted by the IDs of the nodes. Finally, the method calls recursionAsTree recursively to process the children list.
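The listing scans the full list once for every node. Where the list is large, a lookup table keyed by elementUuid can attach each node to its parent in a single pass. The following sketch shows this alternative technique; it is not the method of the listing, and the class names IndexedNode and IndexedTreeBuilder are hypothetical. Like the listing, it treats the parent id "0" as the root marker and sorts siblings by id.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node with the three fields the tree construction relies on.
class IndexedNode {
    final long id;
    final String uuid;
    final String parentId;
    final List<IndexedNode> children = new ArrayList<>();

    IndexedNode(long id, String uuid, String parentId) {
        this.id = id;
        this.uuid = uuid;
        this.parentId = parentId;
    }
}

class IndexedTreeBuilder {
    static List<IndexedNode> build(List<IndexedNode> all) {
        // Index every node by its uuid so that a parent is found with one lookup.
        Map<String, IndexedNode> byUuid = new HashMap<>();
        for (IndexedNode node : all) {
            byUuid.put(node.uuid, node);
        }
        List<IndexedNode> roots = new ArrayList<>();
        for (IndexedNode node : all) {
            if ("0".equals(node.parentId)) {
                roots.add(node);
            } else {
                IndexedNode parent = byUuid.get(node.parentId);
                if (parent != null) { // nodes whose parent is missing are left out of the tree
                    parent.children.add(node);
                }
            }
        }
        Comparator<IndexedNode> byId = Comparator.comparingLong((IndexedNode n) -> n.id);
        roots.sort(byId);
        for (IndexedNode node : all) {
            node.children.sort(byId);
        }
        return roots;
    }
}
```

The result has the same shape as the tree produced by the recursive double loop; the trade-off is the extra memory for the lookup table.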
According to the technical solution provided by the embodiments of the present application, the Excel file stream is processed with an OPC package, a table reader object, a data parser and an event handler, so that the worksheet data can be extracted and parsed effectively, frequent data interaction is avoided, and data processing efficiency is improved. In addition, the preset recursive method with its outer and inner loops allows tree structure data to be constructed quickly. Splitting the tree structure data into parent-child node relationships, sorting them, and storing the data after serialization improves the storage efficiency of the data and the response speed of data queries, and thus the operating experience of the user. The method can also be adjusted to the special requirements of users, so that it meets the requirements of different application scenarios and provides customized services.
Fig. 2 is a flow chart of another tree structure data processing method according to an embodiment of the present application. The interaction subject referred to in fig. 2 is a client, a server and a database. As shown in fig. 2, the tree structure data processing method includes:
S201, the client sends an Excel file to the server;
S202, the server creates an OPC package based on the file stream;
S203, the server extracts the data in the OPC package;
S204, the server creates a data parser and sets an event handler for the data parser;
S205, the server parses the worksheet data and generates a data list;
S206, the server calls the recursive method, queries the directory and content data, screens out the root nodes, and determines the root node order of the tree structure;
S207, the server traverses the root nodes with the outer loop and traverses the full data with the inner loop to obtain the tree structure data;
S208, the server splits the tree structure data into parent-child node relationships, and sorts the directory and content data in the parent-child node relationships by level;
S209, the database receives the serialized data and stores it persistently;
S210, the client sends a directory structure query request to the server;
S211, the server returns the tree structure data to the client.
The tree structure data processing method described above provides a refined and efficient data structure and interaction mode, supports customized rule control, and improves the operating experience of the user. Specifically, the technical solution of the present application has the following technical effects:
By processing the Excel file stream through an OPC package, a table reader object for data extraction, a data parser and an event handler, the worksheet data is read and parsed row by row, which improves data processing efficiency. In addition, tree structure data can be constructed quickly through the preset recursive method and the outer and inner loops, which further improves data processing efficiency.
Splitting the tree structure data into parent-child node relationships, sorting them, and storing the data after serialization improves data storage efficiency and the response speed of data queries. When a user initiates a directory structure query request, the queried tree structure data can be returned quickly for the user to edit online, which improves the efficiency of data query and operation.
The method can be adjusted according to the special requirements of users, such as a limit on the depth of the node tree, the types of nodes, and the rules on which child nodes may be added under each node, so as to meet the requirements of different application scenarios.
The following are device embodiments of the present application, which may be used to perform method embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.
Fig. 3 is a schematic structural diagram of a tree structure data processing apparatus according to an embodiment of the present application. As shown in fig. 3, the tree structure data processing apparatus includes:
the creation module 301 is configured to obtain a file stream according to an Excel file uploaded by a user through a client, create an OPC package based on the file stream, and provide an interface for accessing and operating the file stream in the OPC package;
the extraction module 302 is configured to create a table reader object for data extraction, and extract data in the OPC package by using the table reader object to obtain a shared string table, style information and worksheet data;
a setting module 303 configured to create a data parser for parsing the worksheet data, set an event handler for the data parser, send parameters of the event handler to the event driven handler, and read and parse the worksheet data row by using the event driven handler;
the parsing module 304 is configured to parse the worksheet data according to the event-triggered event handler associated with the event in the process of reading and parsing the worksheet data row by using the event-driven handler, store the parsed data into the event handler, and generate a data list;
the calling module 305 is configured to call a predetermined recursive method, where the recursive method is used to query the directory and content data in the data list, traverse the directory and content data to screen out all the root nodes, and sort the root nodes to determine the root node order of the tree structure;
the traversing module 306 is configured to traverse the root nodes with a predetermined outer loop and traverse the full data in the data list with a predetermined inner loop to obtain non-leaf nodes or leaf nodes, and, for each non-leaf node, to repeatedly perform the inner loop until a leaf node is found, so as to obtain tree structure data;
the storage module 307 is configured to split the tree structure data into parent-child node relationships, sort the directories and content data in the parent-child node relationships according to a hierarchy, and store the sorted directories and content data in the database after the serialization processing.
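As an illustrative sketch only, and not the applicant's implementation, the storage step attributed to the storage module 307 might look like the following Python. The field names `id`, `parent_id`, `level`, `name` and `children`, and the choice of JSON as the serialization format, are assumptions made for this example; the application does not specify them.

```python
import json


def flatten(tree):
    """Split a nested tree into flat parent-child rows, sorted by hierarchy level."""
    rows = []

    def visit(node, parent_id, level):
        rows.append({"id": node["id"], "parent_id": parent_id,
                     "level": level, "name": node["name"]})
        for child in node.get("children", []):
            visit(child, node["id"], level + 1)

    for root in tree:
        visit(root, None, 1)
    # list.sort is stable, so siblings keep their original relative order.
    rows.sort(key=lambda row: row["level"])
    return rows


def serialize(rows):
    """Serialize each row to a JSON string, ready to be written to a database."""
    return [json.dumps(row, ensure_ascii=False, sort_keys=True) for row in rows]


# Hypothetical demo tree with two root nodes.
demo_tree = [
    {"id": 1, "name": "Guide", "children": [
        {"id": 3, "name": "Intro", "children": [
            {"id": 4, "name": "Scope", "children": []}]}]},
    {"id": 2, "name": "Appendix", "children": []},
]
flat_rows = flatten(demo_tree)
payloads = serialize(flat_rows)
```

Sorting by level before storage means a parent row always precedes its children, which can simplify rebuilding the tree later.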
In some embodiments, the creation module 301 of fig. 3 obtains an Excel file uploaded by a user through a client, converts the Excel file into a file stream, and passes the file stream as a parameter to a method of a preset OPCPackage class, so that the OPCPackage class creates an OPC package; the Excel file contains a directory, content and content types, and the OPC package is an OPCPackage object.
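An .xlsx workbook is a ZIP-based OPC package, so the idea of opening a package directly from an uploaded byte stream can be sketched with Python's standard `zipfile` module. This is a stand-in for the OPCPackage class mentioned in this embodiment, not that class's API; the helper names and the contents of the demo package are hypothetical.

```python
import io
import zipfile


def open_package(stream):
    """Open an uploaded .xlsx byte stream as a ZIP-based OPC package."""
    return zipfile.ZipFile(stream, mode="r")


def list_parts(package):
    """Return the names of the parts stored in the package."""
    return sorted(package.namelist())


def make_demo_stream():
    """Build a tiny in-memory package that mimics the layout of an .xlsx file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("xl/sharedStrings.xml", "<sst/>")            # shared string table
        zf.writestr("xl/styles.xml", "<styleSheet/>")            # style information
        zf.writestr("xl/worksheets/sheet1.xml", "<worksheet/>")  # worksheet data
    buf.seek(0)
    return buf


package = open_package(make_demo_stream())
parts = list_parts(package)
styles_xml = package.read("xl/styles.xml")
```

Working on the stream rather than a temporary file keeps the upload in memory; for very large uploads, a file on disk may be the better trade-off.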
In some embodiments, after setting the event handler for the data parser, the setting module 303 of fig. 3 registers the event handler in the XML processor object of the worksheet and sets the XML processor object of the worksheet as the content handler of the data parser. When the data parser parses the Excel file, it invokes the XML processor object of the worksheet, traverses all worksheets in the Excel file, and reads the data in each worksheet row by row.
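The pattern of attaching a content handler to a parser and then streaming every worksheet through it can be sketched with Python's standard `xml.sax` module. This is a minimal sketch under stated assumptions: the `RowCounter` handler, the `read_all_sheets` helper and the demo package are hypothetical, and the sketch only counts rows rather than extracting cell data.

```python
import io
import xml.sax
import zipfile


class RowCounter(xml.sax.ContentHandler):
    """Event handler that counts the <row> start events of one worksheet."""

    def __init__(self):
        super().__init__()
        self.rows = 0

    def startElement(self, name, attrs):
        if name == "row":
            self.rows += 1


def read_all_sheets(package):
    """Stream every worksheet part in the package through a SAX parser."""
    counts = {}
    for part in sorted(package.namelist()):
        if part.startswith("xl/worksheets/") and part.endswith(".xml"):
            handler = RowCounter()
            parser = xml.sax.make_parser()
            parser.setContentHandler(handler)  # register the event handler
            with package.open(part) as stream:
                parser.parse(stream)           # events fire as the XML is read
            counts[part] = handler.rows
    return counts


# Hypothetical two-sheet package built in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xl/worksheets/sheet1.xml",
                "<worksheet><sheetData><row/><row/></sheetData></worksheet>")
    zf.writestr("xl/worksheets/sheet2.xml",
                "<worksheet><sheetData><row/></sheetData></worksheet>")
buf.seek(0)
counts = read_all_sheets(zipfile.ZipFile(buf))
```

Because the parser pushes events to the handler as it reads, no worksheet has to be loaded into memory as a whole.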
In some embodiments, when the worksheet data is read and parsed row by row upon event triggering, the parsing module 304 of fig. 3 creates an element tree data object, wherein the element tree data object is used to encapsulate each row of data in the Excel file; when each cell in a row of data is parsed, the data corresponding to the cell is stored in the corresponding attribute of the element tree data object according to the name of the cell; after the parsing of a row of data is finished, the element tree data object currently encapsulating that row is added to a list and the size of the list is checked, and when the size of the list reaches a preset threshold, one data insertion operation is executed and the list is emptied; the events include the start of parsing a row of data, the end of parsing a row of data, the parsing of a cell, and the end of parsing a cell.
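The row and cell events and the threshold-based batch insert described in this embodiment can be sketched as a SAX content handler. The sketch makes several assumptions that are not in the application: each row is held in a plain dictionary instead of an element tree data object, cells are keyed by column letter, cell values appear literally inside `<v>` elements (in a real worksheet, string cells typically hold an index into the shared string table), and `on_batch` is a hypothetical callback standing in for the database insert.

```python
import xml.sax


class RowHandler(xml.sax.ContentHandler):
    """Collect one dict per row and hand rows to on_batch in fixed-size batches."""

    def __init__(self, on_batch, batch_size=2):
        super().__init__()
        self.on_batch = on_batch
        self.batch_size = batch_size
        self.batch = []
        self.row = None
        self.cell_ref = None
        self.in_value = False
        self.text = []

    def startElement(self, name, attrs):
        if name == "row":            # start of parsing a row
            self.row = {}
        elif name == "c":            # start of parsing a cell
            self.cell_ref = attrs.get("r")
        elif name == "v":
            self.in_value = True
            self.text = []

    def characters(self, content):
        if self.in_value:
            self.text.append(content)

    def endElement(self, name):
        if name == "v":
            self.in_value = False
        elif name == "c":            # end of parsing a cell
            column = (self.cell_ref or "").rstrip("0123456789")  # "B7" -> "B"
            self.row[column] = "".join(self.text)
            self.text = []
        elif name == "row":          # end of parsing a row
            self.batch.append(self.row)
            if len(self.batch) >= self.batch_size:
                self.flush()

    def endDocument(self):
        self.flush()                 # insert whatever is left over

    def flush(self):
        if self.batch:
            self.on_batch(list(self.batch))
            self.batch.clear()


SHEET_XML = b"""<worksheet><sheetData>
<row r="1"><c r="A1"><v>1</v></c><c r="B1"><v>0</v></c><c r="C1"><v>Guide</v></c></row>
<row r="2"><c r="A2"><v>2</v></c><c r="B2"><v>1</v></c><c r="C2"><v>Intro</v></c></row>
<row r="3"><c r="A3"><v>3</v></c><c r="B3"><v>1</v></c><c r="C3"><v>Setup</v></c></row>
</sheetData></worksheet>"""

batches = []
xml.sax.parseString(SHEET_XML, RowHandler(batches.append, batch_size=2))
```

Flushing at a threshold bounds memory use to roughly one batch of rows, while the final flush at the end of the document catches a partially filled batch.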
In some embodiments, the traversing module 306 of fig. 3 traverses the root nodes according to the predetermined outer loop and traverses the full data according to the predetermined inner loop, so as to screen out the nodes whose parent node ID field value equals the primary key value of the root node currently being traversed, and takes the screened-out nodes as non-leaf nodes or leaf nodes.
In some embodiments, the traversing module 306 of fig. 3 repeats the inner loop for each non-leaf node; when the child node list is determined to be empty, it takes the current node as a leaf node corresponding to the non-leaf node and stops the recursion, and it constructs the tree structure data from the root nodes, the root node order, the non-leaf nodes and the leaf nodes.
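The outer loop over root nodes, the inner scan of the full data, and the recursion that stops at an empty child list can be sketched as follows. The field names `id`, `parent_id`, `sort` and `children`, and the convention that a root node has a `parent_id` of `None`, are assumptions for illustration; the application does not state how root nodes are recognized or ordered.

```python
def attach_children(node, rows):
    """Inner loop: scan the full data for rows whose parent ID equals this node's key."""
    node["children"] = [dict(row) for row in rows if row["parent_id"] == node["id"]]
    for child in node["children"]:
        attach_children(child, rows)
    # A node whose child list is empty is a leaf; the recursion ends there.
    return node


def build_tree(rows):
    """Screen out the root nodes, sort them, then expand each one recursively."""
    roots = sorted((dict(row) for row in rows if row["parent_id"] is None),
                   key=lambda row: row["sort"])
    return [attach_children(root, rows) for root in roots]  # outer loop over roots


# Hypothetical flat data list, as it might come out of the parsing step.
data_list = [
    {"id": 2, "parent_id": None, "sort": 2, "name": "Appendix"},
    {"id": 1, "parent_id": None, "sort": 1, "name": "Guide"},
    {"id": 3, "parent_id": 1, "sort": 1, "name": "Intro"},
    {"id": 4, "parent_id": 3, "sort": 1, "name": "Scope"},
]
tree = build_tree(data_list)
```

Each node triggers a scan of the full list, so the cost grows quadratically with the number of rows, and the sketch assumes the data contains no parent-child cycles.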
In some embodiments, after the sorted directory and content data are serialized and stored in the database, the encapsulation module 308 of fig. 3 receives a directory structure query request initiated by the user through the client, encapsulates the serialized directory and content data into tree structure data in response to the request, and returns the tree structure data to the client, so that the user can edit the queried tree structure data online to adjust the directory structure.
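A query handler of this kind can be sketched with an in-memory SQLite database. Everything named here is hypothetical: the `directory_node` table, its columns, and the `handle_directory_query` function. Note that the sketch rebuilds the tree with a single pass over an ID index, which is a different technique from the nested loops described earlier and is swapped in only for brevity.

```python
import json
import sqlite3


def load_rows(conn):
    """Read the stored parent-child rows, parents before children."""
    cursor = conn.execute(
        "SELECT id, parent_id, name FROM directory_node ORDER BY level, id")
    return [{"id": i, "parent_id": p, "name": n, "children": []}
            for i, p, n in cursor]


def rows_to_tree(rows):
    """Encapsulate flat rows into nested tree structure data."""
    by_id = {row["id"]: row for row in rows}
    roots = []
    for row in rows:
        parent = by_id.get(row["parent_id"])
        if parent is None:
            roots.append(row)            # no stored parent: treat as a root node
        else:
            parent["children"].append(row)
    return roots


def handle_directory_query(conn):
    """Answer a directory structure query with the tree as a JSON string."""
    return json.dumps(rows_to_tree(load_rows(conn)), ensure_ascii=False)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE directory_node "
             "(id INTEGER PRIMARY KEY, parent_id INTEGER, level INTEGER, name TEXT)")
conn.executemany("INSERT INTO directory_node VALUES (?, ?, ?, ?)",
                 [(1, None, 1, "Guide"), (2, None, 1, "Appendix"), (3, 1, 2, "Intro")])
response = handle_directory_query(conn)
returned_tree = json.loads(response)
```

Returning the nested form lets a client render and edit the directory directly, while the database keeps only the flat parent-child rows.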
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not limit the implementation of the embodiments of the present application in any way.
Fig. 4 is a schematic structural diagram of the electronic device 4 provided in the embodiment of the present application. As shown in fig. 4, the electronic apparatus 4 of this embodiment includes: a processor 401, a memory 402 and a computer program 403 stored in the memory 402 and executable on the processor 401. The steps of the various method embodiments described above are implemented by processor 401 when executing computer program 403. Alternatively, the processor 401, when executing the computer program 403, performs the functions of the modules/units in the above-described apparatus embodiments.
Illustratively, the computer program 403 may be partitioned into one or more modules/units, which are stored in the memory 402 and executed by the processor 401 to implement the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution of the computer program 403 in the electronic device 4.
The electronic device 4 may be a desktop computer, a notebook computer, a palm computer, a cloud server, or the like. The electronic device 4 may include, but is not limited to, a processor 401 and a memory 402. It will be appreciated by those skilled in the art that fig. 4 is merely an example of the electronic device 4 and does not limit it; the electronic device 4 may include more or fewer components than shown, may combine certain components, or may use different components. For example, the electronic device may also include an input-output device, a network access device, a bus, etc.
The processor 401 may be a central processing unit (Central Processing Unit, CPU) or another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 402 may be an internal storage unit of the electronic device 4, for example, a hard disk or a memory of the electronic device 4. The memory 402 may also be an external storage device of the electronic device 4, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card) or the like, which are provided on the electronic device 4. Further, the memory 402 may also include both internal storage units and external storage devices of the electronic device 4. The memory 402 is used to store computer programs and other programs and data required by the electronic device. The memory 402 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts that are not described or illustrated in detail in a particular embodiment, refer to the related descriptions of the other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in this application, it should be understood that the disclosed apparatus/computer device and method may be implemented in other ways. For example, the apparatus/computer device embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on this understanding, all or part of the flow in the methods of the above embodiments of the present application may also be implemented by a computer program instructing related hardware; the computer program may be stored in a computer readable storage medium and, when executed by a processor, may implement the steps of the method embodiments described above. The computer program may comprise computer program code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. A tree structured data processing method, comprising:
acquiring a file stream from an Excel file uploaded by a user through a client, and creating an OPC package based on the file stream, wherein the OPC package provides an interface for accessing and operating on the file stream in the OPC package;
creating a table reader object for data extraction, and extracting data in the OPC package using the table reader object to obtain a shared string table, style information and worksheet data;
creating a data parser for parsing the worksheet data, setting an event processor for the data parser, sending parameters of the event processor to an event-driven processor, and reading and parsing the worksheet data row by row using the event-driven processor;
in the process of reading and parsing the worksheet data row by row using the event-driven processor, triggering, according to an event, the event processor associated with the event to parse the worksheet data, storing the parsed data in the event processor, and generating a data list;
calling a predetermined recursive method, wherein the recursive method is used for querying the directory and content data in the data list, traversing the directory and content data to screen out all root nodes, and sorting the root nodes to determine the root node order of a tree structure;
traversing the root nodes using a predetermined outer loop, traversing the full data in the data list using a predetermined inner loop to obtain non-leaf nodes or leaf nodes, and repeating the inner loop for each non-leaf node until leaf nodes are found, so as to obtain tree structure data;
splitting the tree structure data into parent-child node relationships, sorting the directory and content data in the parent-child node relationships by hierarchy level, serializing the sorted directory and content data, and storing them in a database.
2. The method of claim 1, wherein acquiring a file stream from an Excel file uploaded by a user through a client and creating an OPC package based on the file stream comprises:
acquiring the Excel file uploaded by the user through the client, converting the Excel file into a file stream, and passing the file stream as a parameter to a method of a preset OPCPackage class, so that the OPCPackage class creates the OPC package;
wherein the Excel file contains a directory, content and content types, and the OPC package is an OPCPackage object.
3. The method of claim 1, wherein after setting the event processor for the data parser, the method further comprises:
registering the event processor in an XML processor object of a worksheet, setting the XML processor object of the worksheet as a content processor of the data parser, invoking the XML processor object of the worksheet when the data parser parses the Excel file, traversing all worksheets in the Excel file, and reading the data in each worksheet row by row using the data parser.
4. The method of claim 3, wherein triggering, according to an event, the event processor associated with the event to parse the worksheet data comprises:
creating an element tree data object when the worksheet data is read and parsed row by row upon event triggering, wherein the element tree data object is used for encapsulating each row of data in the Excel file;
when each cell in a row of data is parsed, storing the data corresponding to the cell in the corresponding attribute of the element tree data object according to the name of the cell;
after the parsing of a row of data is finished, adding the element tree data object currently encapsulating that row to a list, checking the size of the list, and, when the size of the list reaches a preset threshold, executing one data insertion operation and emptying the list;
wherein the events include the start of parsing a row of data, the end of parsing a row of data, the parsing of a cell, and the end of parsing a cell.
5. The method of claim 1, wherein traversing the root nodes using a predetermined outer loop and traversing the full data in the data list using a predetermined inner loop to obtain non-leaf nodes or leaf nodes comprises:
traversing the root nodes according to the predetermined outer loop, and traversing the full data according to the predetermined inner loop, so as to screen out the nodes whose parent node ID field value equals the primary key value of the root node currently being traversed, and taking the screened-out nodes as the non-leaf nodes or the leaf nodes.
6. The method of claim 5, wherein repeating the inner loop for each non-leaf node until leaf nodes are found, so as to obtain tree structure data, comprises:
repeating the inner loop for each non-leaf node; when the child node list is determined to be empty, taking the current node as a leaf node corresponding to the non-leaf node and stopping the recursion; and constructing the tree structure data from the root nodes, the root node order, the non-leaf nodes and the leaf nodes.
7. The method of claim 1, wherein after serializing the sorted directory and content data and storing them in the database, the method further comprises:
receiving a directory structure query request initiated by the user through the client, encapsulating the serialized directory and content data into tree structure data in response to the directory structure query request, and returning the tree structure data to the client, so that the user can edit the queried tree structure data online to adjust the directory structure.
8. A tree structured data processing apparatus, comprising:
a creation module, configured to acquire a file stream from an Excel file uploaded by a user through a client, and to create an OPC package based on the file stream, wherein the OPC package provides an interface for accessing and operating on the file stream in the OPC package;
an extraction module, configured to create a table reader object for data extraction, and to extract data in the OPC package using the table reader object to obtain a shared string table, style information and worksheet data;
a setting module, configured to create a data parser for parsing the worksheet data, set an event processor for the data parser, send parameters of the event processor to an event-driven processor, and read and parse the worksheet data row by row using the event-driven processor;
a parsing module, configured to, in the process of reading and parsing the worksheet data row by row using the event-driven processor, trigger, according to an event, the event processor associated with the event to parse the worksheet data, store the parsed data in the event processor, and generate a data list;
a calling module, configured to call a predetermined recursive method, wherein the recursive method is used for querying the directory and content data in the data list, traversing the directory and content data to screen out all root nodes, and sorting the root nodes to determine the root node order of a tree structure;
a traversing module, configured to traverse the root nodes using a predetermined outer loop and traverse the full data in the data list using a predetermined inner loop to obtain non-leaf nodes or leaf nodes, and to repeat the inner loop for each non-leaf node until leaf nodes are found, so as to obtain tree structure data;
a storage module, configured to split the tree structure data into parent-child node relationships, sort the directory and content data in the parent-child node relationships by hierarchy level, serialize the sorted directory and content data, and store them in a database.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1 to 7 when the program is executed.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 7.
CN202310699380.1A 2023-06-14 2023-06-14 Tree structure data processing method and device, electronic equipment and storage medium Pending CN116450655A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310699380.1A CN116450655A (en) 2023-06-14 2023-06-14 Tree structure data processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116450655A true CN116450655A (en) 2023-07-18

Family

ID=87127672

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310699380.1A Pending CN116450655A (en) 2023-06-14 2023-06-14 Tree structure data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116450655A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109460410A (en) * 2018-11-08 2019-03-12 四川长虹电器股份有限公司 By the json data conversion with set membership at the method for tree structure data
CN114328450A (en) * 2021-12-14 2022-04-12 上海万物新生环保科技集团有限公司 Method and equipment for generating tree structure

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
一个只会喊六六六的程序猿: "poi事件驱动读取大数据量excel(50W+数据)工具", HTTPS://BLOG.CSDN.NET/QQ_35891949/ARTICLE/DETAILS/107400411, pages 1 - 4 *
蜀山雪松: "Java递归List返回树状结构", HTTPS://BLOG.CSDN.NET/JIANXIA801/ARTICLE/DETAILS/82771346, pages 1 - 4 *
随心所鱼: "List结构数据组装成树结构实现方式", HTTPS://BLOG.CSDN.NET/WEIXIN_41849263/ARTICLE/DETAILS/106625203, pages 1 - 5 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117038002A (en) * 2023-10-08 2023-11-10 之江实验室 Method and device for generating observation variable in drug evaluation research
CN117038002B (en) * 2023-10-08 2024-02-13 之江实验室 Method and device for generating observation variable in drug evaluation research

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination