CN116521845B - Method for reading complex electronic form file and electronic equipment - Google Patents

Method for reading complex electronic form file and electronic equipment

Info

Publication number
CN116521845B
Authority
CN
China
Prior art keywords
data
model
read
dynamic
electronic form
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310495684.6A
Other languages
Chinese (zh)
Other versions
CN116521845A (en
Inventor
黄飞
文蓉
徐兴
王立鑫
王家梅
龚玄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Wisesoft System Integration Co ltd
Original Assignee
Sichuan Wisesoft System Integration Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Wisesoft System Integration Co ltd
Priority to CN202310495684.6A
Publication of CN116521845A
Application granted
Publication of CN116521845B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/177 Editing, e.g. inserting or deleting of tables; using ruled lines
    • G06F40/18 Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The invention relates to the technical field of computers and discloses a method for reading a complex electronic form file, which specifically comprises the following steps: S1, constructing a Sheet model according to a template electronic form corresponding to the electronic form to be read, wherein the Sheet model consists of a plurality of Element models, and the number of Element models is consistent with the number of cells in the template electronic form; S2, traversing the Element models in the Sheet model, the reader reading the data objects of the electronic form in different modes according to the type of each Element model, wherein the data objects that can be read comprise a field occupying one cell, a field embedded in a piece of text, a table with a variable number of rows, and a table with a variable number of columns; S3, packaging the read data objects and returning them. The invention can read various common data presentation forms and has great adaptability.

Description

Method for reading complex electronic form file and electronic equipment
Technical Field
The invention relates to the technical field of computers, in particular to a method for reading a complex electronic table file and electronic equipment.
Background
Spreadsheets (i.e., Excel files) are widely used in everyday work, and users can create forms of various complicated styles with Excel. Importing a spreadsheet into a system is a common function of application software. A developer usually uses components such as jxl and POI to access an Excel file and read its content into the system, but in practice these components only provide the function of taking values by cell coordinates. A programmer therefore needs to preset the coordinates of each data item when facing a complex spreadsheet, and reading is even more difficult when facing a dynamic table (i.e., one whose number of rows is not fixed). How to read the data in a spreadsheet simply is therefore a problem to be solved.
Disclosure of Invention
The invention provides a method for reading a complex electronic table file and electronic equipment, which are used for solving the problems.
The invention is realized by the following technical scheme:
A method for reading a complex electronic form file specifically comprises the following steps:
S1, constructing a Sheet model according to a template electronic form corresponding to the electronic form to be read, wherein the Sheet model is composed of a plurality of Element models, the number of Element models is consistent with the number of cells in the template electronic form, and the Element models are used for describing the positions and characteristics of the cells in the template electronic form;
S2, traversing the Element models in the Sheet model, the reader reading the data objects of the electronic form in different modes according to the type of each Element model, wherein the data objects that can be read comprise a field occupying one cell, a field embedded in a piece of text, a table with a variable number of rows, and a table with a variable number of columns;
S3, packaging the read data objects and returning them.
As an optimization, the Element model includes a plurality of model tags for describing the types of data objects in the electronic form, and the reader reads the data objects of the electronic form in the reading mode corresponding to each model tag. Every model tag has 3 basic attributes: the abscissa of the cell, the ordinate of the cell, and the corresponding data field. The model tags include:
Element: describes one cell in the electronic form and prompts the reader not to extract any data object from the corresponding cell in the electronic form;
SingleElement: describes one cell in the electronic form and carries a value-added attribute comprising the keyword contained in the cell; it prompts the reader to extract all data objects in the cell of the electronic form corresponding to the keyword;
MixElement: describes one cell in the electronic form and carries a value-added attribute comprising the keyword contained in the cell; it prompts the reader to extract the data object corresponding to the keyword, which is embedded in the cell of the electronic form to the right of text that does not need to be extracted;
ListElement: describes a dynamic table in the electronic form whose number of data rows is not fixed; it carries a value-added attribute comprising the keywords contained in the dynamic table and a stop identifier for row growth, and prompts the reader to parse the data objects in the dynamic table into data with a List<Map<String, Object>> structure (namely, one List holds a plurality of key-value sets, and each Map corresponds to one row of data);
BatchRowListElement: describes a dynamic table in the electronic form whose number of data rows is not fixed and in which one row of data is distributed over multiple rows of cells; it carries value-added attributes comprising the keywords contained in the dynamic table, a stop identifier for row growth, and an extended "cross row" attribute giving the number of rows of cells spanned; it prompts the reader to parse the data content of the dynamic table into data with a List<Map<String, Object>> structure (namely, one List holds a plurality of key-value sets, and each Map corresponds to one row of data), with the cross-row attribute defining the range of each row of data;
DataScaler: an interface through which each row of data read from the dynamic table is passed to the DataScaler implementation class corresponding to that dynamic table for processing.
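The tag hierarchy above can be pictured as a small set of data classes. The following is only a minimal sketch in Python, chosen for brevity; the attribute names `row`, `col`, `field_name`, `left_text`, `right_text`, `columns`, `stop_identifier` and `cross_rows` are assumptions introduced for illustration and are not taken from the original implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """Plain cell: only the 3 basic attributes; nothing is extracted."""
    row: int              # ordinate of the cell (assumed 1-based row number)
    col: int              # abscissa of the cell (assumed 1-based column number)
    field_name: str = ""  # data field / keyword bound to the cell


@dataclass
class SingleElement(Element):
    """The whole cell content is the value of field_name."""


@dataclass
class MixElement(Element):
    """The value is embedded in fixed text inside the cell."""
    left_text: str = ""   # text to exclude on the left (assumed name)
    right_text: str = ""  # text to exclude on the right (assumed name)


@dataclass
class ListElement(Element):
    """Dynamic table with a non-fixed number of data rows."""
    columns: List[str] = field(default_factory=list)  # one keyword per column
    stop_identifier: str = ""                         # cell text that ends row growth


@dataclass
class BatchRowListElement(ListElement):
    """Dynamic table in which one record spans several rows of cells."""
    cross_rows: int = 1   # number of rows of cells per record
```

Modelling the list tags as subclasses keeps the basic attributes in one place, so a reader can sort every element by `row` regardless of its type.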
As an optimization, the specific steps of S1 are:
S1.1, defining a keyword for each data object to be read in the electronic form;
S1.2, merging the model tags of the Element models and the attributes of the model tags (here, the basic attributes and the value-added attributes) with the keywords to obtain the elements of the Sheet model;
S1.3, if a model tag describes a dynamic table (namely, a dynamic multi-row list), entering the element in the row immediately below the title corresponding to the dynamic table data.
As an optimization, a simplified expression is defined for each model tag, the expressions including:
$S, corresponding to SingleElement;
$M, corresponding to MixElement;
$L, corresponding to ListElement;
$BR, corresponding to BatchRowListElement.
As an optimization, the specific steps of S1 are:
S1.1, defining a keyword for each data item to be read in the electronic form;
S1.2, merging the model tags of the Element models and their attributes with the keywords to obtain the elements of the Sheet model, wherein the model tags of the Element models are merged with the keywords in expression form;
S1.3, if a model tag describes a dynamic table (namely, a dynamic multi-row list), entering the element in the row immediately below the title corresponding to the dynamic table data.
As an optimization, the specific steps of S2 are:
S2.1, reading the data in the elements of the dynamic multi-row lists in the electronic form of the current page according to the Sheet model, and updating the position pointers of the elements in the Sheet model;
S2.2, sequentially reading the data in the elements of the single-row lists in the electronic form of the current page, and updating the position pointers of the elements in the Sheet model;
S2.3, repeating steps S2.1-S2.2 until the data of all electronic forms has been read.
As an optimization, the specific implementation steps of S2.1 include:
S2.1.1, sorting all elements by row number from small to large to determine the base position of each element;
S2.1.2, in the electronic form, finding the starting position of the dynamic list whose element has the smallest row number among the elements parsed from the Sheet model;
S2.1.3, the reader reading one row of data starting from the starting position of the dynamic list;
S2.1.4, when the last data in a row has been read, judging whether the row just read contains the stop identifier for row growth; if yes, jumping to S2.1.6, otherwise jumping to S2.1.5;
S2.1.5, calling the DataScaler implementation corresponding to the dynamic multi-row list so that the DataScaler implementation class processes the read data object, and then continuing the loop to read the next row;
S2.1.6, counting the number of rows of data read in the dynamic list and updating the position pointers of the elements whose data has not yet been read. The update logic is as follows: if the initial row number of a new element is greater than that of the element whose data has just been read, the initial row in the electronic form pointed to by the position pointer of the new element becomes the base row number of the new element plus the number of data rows read in the current dynamic list, minus one; if the initial row number of the new element is smaller than that of the element whose data has just been read, the initial row pointed to by the position pointer of the new element is unchanged. The Sheet model is updated in reverse according to the number of rows and columns of data so as to adjust the position pointers of the elements in the Sheet model, so that the elements in the Sheet model always point to the correct positions in the electronic form and the dynamic list data is read accurately;
S2.1.7, finding the model tag of the next dynamic list and repeating S2.1.3-S2.1.6 until all dynamic list data has been read.
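The update rule of S2.1.6 can be written compactly. The symbols below are introduced here only for illustration and do not appear in the original text: r_j is the base row of element j in the template, r_k is the base row of a dynamic list that has already been read, and n_k is the number of data rows read for that list. Applying the rule once per list, in ascending row order, gives the cumulative form on the second line.

```latex
% One update after dynamic list k has been read (S2.1.6):
r_j \leftarrow
\begin{cases}
  r_j + (n_k - 1), & r_j > r_k \\
  r_j,             & r_j < r_k
\end{cases}

% Cumulative effect after all lists above element j have been read:
r_j^{\mathrm{start}} = r_j + \sum_{k \,:\, r_k < r_j} (n_k - 1)
```

With the figures used in the embodiment (base rows 6, 8 and 10; 3 rows read for the first list and 2 for the second), this gives 8 + (3 - 1) = 10 for the second element and 10 + (3 - 1) + (2 - 1) = 13 for the third.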
As an optimization, in S2.1.5 the read data is handed to the DataScaler as follows:
the DataScaler interface is injected into the reader by calling the excelReader.RegisterScaler method.
As an optimization, in S3 the read data objects are encapsulated into a Map in key-value form, and the result is returned.
The invention also discloses an electronic device, which comprises at least one processor and a memory in communication connection with the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of reading a complex spreadsheet file as described above.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The model tags provided by the invention help the reader read various common data presentation forms, such as a field embedded in a piece of text, a simple data list, a data list with an uncertain number of columns, and a data list in which one record spans multiple rows, so the invention has great adaptability;
2. The invention supports reading complex Excel files. When one sheet contains several data presentation forms, the reader can accurately identify the boundary of the data associated with each element from the element position and its configuration information (namely, the value-added attributes: for example, besides the position information, the text to be excluded on the left and the text to be excluded on the right; the configuration information of a ListElement also comprises the stop identifier and the keyword of each column), so that data is neither missed nor misread;
3. When the amount of data in one Excel file is very large, the user can implement the DataScaler interface and then call the excelReader.RegisterScaler method to inject it into the reader. Each time the excelReader reads a piece of data, it delivers the data to the DataScaler for processing instead of storing it in memory, which greatly reduces memory consumption and suits scenarios where massive data is read;
4. By comparison, the existing EasyExcel reading scheme is only a set of reading tools: the programmer must still write code to complete the reading, which is inconvenient and makes development costly.
Drawings
In order to more clearly illustrate the technical solutions of the exemplary embodiments of the present invention, the drawings that are needed in the examples will be briefly described below, it being understood that the following drawings only illustrate some examples of the present invention and therefore should not be considered as limiting the scope, and that other related drawings may be obtained from these drawings without inventive effort for a person skilled in the art. In the drawings:
FIG. 1 is a general flow chart of a method for reading a complex spreadsheet file according to the present invention;
FIG. 2 is a flow chart of reading a dynamic multi-row list;
FIG. 3 is a schematic diagram of a spreadsheet in an embodiment;
FIG. 4 is a schematic diagram of a template spreadsheet corresponding to FIG. 3;
FIG. 5 is a code diagram of invoking an ExcelReader to read data;
FIG. 6 is a code diagram of an ExcelReader read architecture.
Detailed Description
For the purpose of making apparent the objects, technical solutions and advantages of the present invention, the present invention will be further described in detail with reference to the following examples and the accompanying drawings, wherein the exemplary embodiments of the present invention and the descriptions thereof are for illustrating the present invention only and are not to be construed as limiting the present invention.
Example 1
The invention designs a parsing algorithm for electronic forms. A programmer creates a template file (namely, the Sheet model described below) by marking, in Excel, the data item corresponding to each cell. By comparing the template file with the data file of the electronic form to be imported, the parsing algorithm identifies how the coordinates of the data items have changed, so that it can accurately locate the data and complete the reading. The reader is ExcelReader.
As shown in fig. 1, a method for reading a complex electronic form file specifically includes the following steps:
S1, constructing a Sheet model according to a template electronic form corresponding to the electronic form to be read, wherein the Sheet model is composed of a plurality of Element models, the number of Element models is consistent with the number of cells in the template electronic form, and the Element models are used for describing the positions and characteristics of the cells in the template electronic form.
The template electronic form is an electronic form file whose format is similar to that of the electronic form to be read but in which the reading mode of each cell is annotated. Fig. 3 shows the electronic form to be read, and Fig. 4 shows the corresponding template electronic form.
In this embodiment, the Element model includes a plurality of model tags for describing the types of data objects in the electronic form, and different model tags describe different cell types. These model tags help the reader (i.e., ExcelReader) decide, while reading the electronic form, in which manner to read each data object; that is, the reader reads the data objects of the electronic form in the reading mode corresponding to the model tag. For example, SingleElement only needs to read the text of the cell at its horizontal and vertical coordinates, whereas ListElement needs to read all rows of data from its ordinate down to the "row growth stop identifier" and encapsulate them into a List structure.
All types of model tags have 3 basic attributes: the abscissa of the cell, the ordinate of the cell, and the corresponding data field. The model tags (which assist the ExcelReader in reading data) comprise:
Element: describes one cell in the electronic form and prompts the reader not to extract any data object from the corresponding cell in the electronic form;
SingleElement: describes one cell in the electronic form and carries a value-added attribute comprising the keyword contained in the cell; it prompts the reader to extract all data objects in the cell of the electronic form corresponding to the keyword;
MixElement: describes one cell in the electronic form and carries a value-added attribute comprising the keyword contained in the cell; it prompts the reader to extract the data object corresponding to the keyword, which is embedded in the cell of the electronic form to the right of text that does not need to be extracted;
ListElement: describes a dynamic table in the electronic form whose number of data rows is not fixed; it carries a value-added attribute comprising the keywords contained in the dynamic table and a stop identifier for row growth, and prompts the reader to parse the data objects in the dynamic table into data with a List<Map<String, Object>> structure. This can be understood as multi-row data in which each row has several columns: the List holds a plurality of key-value sets, and each Map corresponds to one row of data. Each key-value pair is a keyword and a data object; the keyword is content of the Sheet model and the data object is in the electronic form, so when a keyword is read in the Sheet model, the data object corresponding to that keyword is read from the electronic form;
BatchRowListElement: describes a dynamic table in the electronic form whose number of data rows is not fixed and in which one row of data is distributed over multiple rows of cells; it carries value-added attributes comprising the keywords contained in the dynamic table, a stop identifier for row growth, and an extended "cross row" attribute giving the number of rows of cells spanned; it prompts the reader to parse the data content of the dynamic table into data with a List<Map<String, Object>> structure (namely, one List holds a plurality of key-value sets, and each Map corresponds to one row of data), with the cross-row attribute defining the range of each row of data;
Sometimes, in a spreadsheet, one row of data is distributed across multiple rows of cells, and the "cross row" attribute states how many rows of cells one row of data occupies. In the following table the data of each person is distributed over 3 rows, so the cross-row value is 3.
DataScaler: an interface for reading data quickly while consuming little memory. The principle is that each row read from the dynamic table is not stored in memory but is passed to the DataScaler implementation class corresponding to the dynamic table for processing, which reduces intermediate memory usage and the cost of intermediate data conversion, thereby improving efficiency.
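The DataScaler idea, handing each row to a registered handler instead of accumulating rows in memory, can be sketched as follows. This is a hedged illustration in Python rather than the original implementation: `ReaderSketch`, `register_scaler` and `consume` are hypothetical names standing in for the ExcelReader and its RegisterScaler method, and the sample rows are invented.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
# Stand-in for the DataScaler interface: a callable that processes one row.
DataScaler = Callable[[Row], None]


class ReaderSketch:
    """Hypothetical reader that streams rows to a per-table handler."""

    def __init__(self) -> None:
        # Maps the keyword of a dynamic table to its handler.
        self._scalers: Dict[str, DataScaler] = {}

    def register_scaler(self, list_keyword: str, scaler: DataScaler) -> None:
        self._scalers[list_keyword] = scaler

    def consume(self, list_keyword: str, rows: Iterable[Row]) -> List[Row]:
        """Process the rows of one dynamic table.

        With a registered handler each row is handed over and dropped;
        without one the rows are kept in memory (default processing).
        """
        scaler = self._scalers.get(list_keyword)
        kept: List[Row] = []
        for row in rows:
            if scaler is not None:
                scaler(row)
            else:
                kept.append(row)
        return kept


# Usage: sum a quantity column without keeping any row in memory.
total = {"sl": 0}


def add_quantity(row: Row) -> None:
    total["sl"] += row["sl"]


reader = ReaderSketch()
reader.register_scaler("dataList1", add_quantity)
streamed = reader.consume("dataList1", ({"sl": n} for n in (5, 7, 9)))
buffered = reader.consume("dataList2", [{"sl": 1}])
```

Because the rows arrive through a generator, memory use for the streamed table stays constant however many rows the table contains.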
To simplify the expression of the Sheet model, a set of expressions corresponding to the model tags is defined for building the Sheet model.
$S corresponds to SingleElement and indicates that the complete data of one cell is taken.
$M corresponds to MixElement and indicates that the value of the object is embedded in the text of the cell.
$L corresponds to ListElement and indicates that all rows from the cell where it is located down to the stop identifier are the content of the dynamic table, which is parsed, as defined, into data with a List<Map<String, Object>> structure.
$BR corresponds to BatchRowListElement and indicates that all rows from the cell where it is located down to the stop identifier are the content of the dynamic table, which is parsed, as defined, into data with a List<Map<String, Object>> structure, with the "cross row" attribute delimiting the range of each row of data.
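To make the "cross row" attribute concrete, the sketch below groups physical rows of cells into logical records. It is an assumption-laden illustration in Python: the function name `group_cross_rows`, the way keys are assigned to each physical row, and the sample people are all invented here, since the text does not specify how keywords are laid out across the spanned rows.

```python
from typing import Dict, List, Sequence


def group_cross_rows(
    physical_rows: Sequence[Sequence[object]],
    keys_per_row: Sequence[Sequence[str]],
) -> List[Dict[str, object]]:
    """Merge every len(keys_per_row) physical rows into one record.

    keys_per_row[i] names the cells of the i-th physical row of a record,
    so the cross-row value is simply len(keys_per_row).
    """
    span = len(keys_per_row)
    if span == 0 or len(physical_rows) % span != 0:
        raise ValueError("row count must be a multiple of the cross-row value")
    records: List[Dict[str, object]] = []
    for start in range(0, len(physical_rows), span):
        record: Dict[str, object] = {}
        for keys, cells in zip(keys_per_row, physical_rows[start:start + span]):
            record.update(zip(keys, cells))
        records.append(record)
    return records


# Two people, each spread over 3 rows of cells (cross-row value 3).
rows = [
    ["Li", 30], ["13800000000"], ["Chengdu"],
    ["Wang", 41], ["13900000000"], ["Mianyang"],
]
keys = [["name", "age"], ["phone"], ["address"]]
people = group_cross_rows(rows, keys)
```

The result has the same List-of-Map shape as a plain ListElement; only the grouping of physical rows differs.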
In this embodiment, the specific steps of S1 are as follows:
S1.1, defining a keyword for each data item to be read in the electronic form. As shown in Figs. 3-4, to read the "unit of fill data" in cell B2, the keyword defined for cell B2 in the Sheet model is "enterpriseName"; keywords for the data items in other cells are set in the same way.
S1.2, merging the model tags of the Element models and their attributes with the keywords to obtain the elements of the Sheet model; alternatively, the model tags of the Element models are merged with the keywords in expression form.
Merging expressions with keywords is used as the illustration here.
As shown in Fig. 4, when the Sheet model is built from the template electronic form corresponding to Fig. 3, cell B2 is a simple cell, so the expression "$S" is placed in front of the keyword "enterpriseName" to form the element "$S enterpriseName";
As another example, the data item "December 2022" appears to the right of "energy purchase and sale account page" and is embedded in the same cell, so the model tag corresponding to "December 2022" is "MixElement" and the corresponding expression is "$M"; the keyword for "December 2022" is "reportDate", so the merged element is "$M reportDate".
S1.3, if a model tag describes a dynamic table (a multi-row list), the element is entered in the row immediately below the title corresponding to the dynamic table data.
For example, the three titles "1. Purchase", "2. Industrial production and consumption" and "3. End-of-period inventory" each head a dynamic multi-row list, so the model tag of the data items under each of the three titles is "ListElement" and the corresponding expression is "$L". Because the attributes of a "ListElement" carry the keywords and the stop identifier for row growth, the elements corresponding to the three titles are:
"$LdataList1<ntmc, rq, jldw, dm, sl, je, cgly, gycyyyclxf, gycyyyygjxf>!2. Industrial production and consumption",
"$LdataList2<ntmc, rq, jldw, dm, sl, je, cgly, gycyyyclxf, gycyyyygjxf>!3. End-of-period inventory",
"$LdataList3<ntmc, rq, jldw, dm, sl, je, cgly, gycyyyclxf, gycyyyygjxf>!Remarks:";
where "$L" is the expression corresponding to the model tag; dataList1, dataList2 and dataList3 are the keywords of the three corresponding data items; "<ntmc, rq, jldw, dm, sl, je, cgly, gycyyyclxf, gycyyyygjxf>" gives the column keywords of the "ListElement", which correspond respectively to the data names in cells A3-I3 of the electronic form: energy name, date, metering unit, code, quantity, amount, purchase source, raw material consumption, and transportation means consumption in industrial production; and "!" introduces the stop identifier for row growth of the dynamic multi-row list. Thus "!2. Industrial production and consumption", "!3. End-of-period inventory" and "!Remarks:" indicate the following: in the purchase list, reading of the multi-row list stops when the cell "2. Industrial production and consumption" is read; in the industrial production and consumption list, reading stops when the cell "3. End-of-period inventory" is read; and in the end-of-period inventory list, reading stops when the cell "Remarks:" is read.
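The element expressions used in steps S1.1 to S1.3 can be parsed mechanically. The sketch below is a hedged Python illustration, not the original parser: the function name `parse_expression`, the returned dictionary layout and the regular expression are assumptions. In particular, the text does not state whether whitespace separates the prefix from the keyword, so the pattern tolerates both, and no syntax for the cross-row value of "$BR" is given, so none is parsed.

```python
import re
from typing import Dict, List, Optional, Union

TAGS = {
    "S": "SingleElement",
    "M": "MixElement",
    "L": "ListElement",
    "BR": "BatchRowListElement",
}

# $<prefix> <keyword> [<col, col, ...>] [!<stop identifier>]
PATTERN = re.compile(r"^\$(BR|S|M|L)\s*(\w+)\s*(?:<([^>]*)>)?\s*(?:!(.*))?$")

Parsed = Dict[str, Union[str, List[str], None]]


def parse_expression(cell_text: str) -> Parsed:
    """Turn the text of one template cell into a description of its element."""
    match = PATTERN.match(cell_text.strip())
    if match is None:
        # No expression: a plain Element from which nothing is extracted.
        return {"tag": "Element", "keyword": None, "columns": [], "stop": None}
    prefix, keyword, cols, stop = match.groups()
    columns = [c.strip() for c in cols.split(",")] if cols else []
    stop_id: Optional[str] = stop.strip() if stop else None
    return {"tag": TAGS[prefix], "keyword": keyword, "columns": columns, "stop": stop_id}


single = parse_expression("$S enterpriseName")
mixed = parse_expression("$M reportDate")
listed = parse_expression(
    "$LdataList1<ntmc, rq, jldw>!2. Industrial production and consumption"
)
plain = parse_expression("Energy name")
```

Running every template cell through such a function yields one element per cell, which matches the requirement that the number of Element models equal the number of cells in the template.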
The number of Element models is consistent with the number of cells in the template electronic form. As shown in Figs. 3 and 4, the cells in Fig. 4 from which no data object needs to be extracted all correspond to plain Element models in the Sheet model.
S2, traversing the Element models in the Sheet model, the reader reading the data objects of the electronic form in different modes according to the type of each Element model, wherein the data objects that can be read comprise a field occupying one cell, a field embedded in a piece of text, a table with a variable number of rows, and/or a table with a variable number of columns.
The reader (ExcelReader) takes as input the template file (Sheet model), the data file to be read, and the index of the tab in which the data to be read is located, as shown in Fig. 6. After the algorithm has run, a Map<String, Object> is returned whose keys are the keywords of the elements; for a ListElement or BatchRowListElement the value is a List<Map<String, Object>>. The code that calls the ExcelReader to read data is shown in Fig. 5, and each element is read in the mode given by its model tag.
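Reading "in the mode given by the model tag" amounts to a dispatch on the tag type. The following Python sketch shows this for the two single-cell tags only; it is an illustration under stated assumptions, not the ExcelReader code of Figs. 5 and 6. The function names, the `left_text` and `right_text` parameters (modelled on the "text to be excluded on the left/right" mentioned in the advantages) and the sample company name are hypothetical.

```python
from typing import Optional


def read_single(cell_text: str) -> str:
    """SingleElement: the whole cell content is the value."""
    return cell_text.strip()


def read_mix(cell_text: str, left_text: str = "", right_text: str = "") -> str:
    """MixElement: strip the fixed text around the embedded value."""
    text = cell_text.strip()
    if left_text and text.startswith(left_text):
        text = text[len(left_text):]
    if right_text and text.endswith(right_text):
        text = text[: -len(right_text)]
    return text.strip()


def read_cell(tag: str, cell_text: str, **attrs: str) -> Optional[str]:
    """Choose the reading mode from the model tag of the element."""
    if tag == "SingleElement":
        return read_single(cell_text)
    if tag == "MixElement":
        return read_mix(
            cell_text,
            attrs.get("left_text", ""),
            attrs.get("right_text", ""),
        )
    return None  # plain Element: nothing is extracted


result = {
    "enterpriseName": read_cell("SingleElement", "  Acme Works  "),
    "reportDate": read_cell(
        "MixElement",
        "Energy purchase and sale account page December 2022",
        left_text="Energy purchase and sale account page",
    ),
}
```

The resulting dictionary mirrors the key-value Map described above, with each keyword mapped to the value read from the data file.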
In this embodiment, the specific steps of S2 are as follows:
S2.1, reading the data in the elements of the dynamic multi-row lists in the electronic form of the current page according to the Sheet model, and updating the position pointers of the elements in the Sheet model; once the element positions of the dynamic multi-row lists are determined, the positions of the other elements are fixed.
Specifically, as shown in Fig. 2:
S2.1.1, sorting all elements by row number from small to large to determine the base position of each element;
S2.1.2, in the electronic form, finding the starting position of the dynamic list whose element has the smallest row number among the elements parsed from the Sheet model, i.e., among the cells carrying a $L or $BR expression;
S2.1.3, the reader reading one row of data starting from the starting position of the dynamic multi-row list;
S2.1.4, when the last data in a row has been read, judging whether the row just read contains the stop identifier for row growth; if yes, jumping to S2.1.6, otherwise jumping to S2.1.5. Excel itself provides the function of obtaining the maximum number of rows and the maximum number of columns of a table, so determining the last data in a row is prior art and is not repeated here;
S2.1.5, calling the DataScaler implementation corresponding to the dynamic multi-row list so that the DataScaler implementation class processes the read data object, and then continuing the loop to read the next row. The DataScaler interface is injected into the reader by calling the excelReader.RegisterScaler method. A table stores the relation between the keyword of each dynamic table and its DataScaler; during reading, the DataScaler implementation class for a dynamic table is looked up by the table's keyword. If one is found, it is used for processing; otherwise the default processing is used.
S2.1.6, counting the number of rows of data read from the dynamic multi-row list and updating the position pointers of the elements whose data has not yet been read. The updating logic is as follows: if the initial row number of the new element is greater than the initial row number of the element corresponding to the read data, the initial row number in the electronic table of the data pointed to by the position pointer of the new element should be the base row number of the new element, plus the number of data rows read from the current dynamic list, minus one; if the initial row number of the new element is smaller than the initial row number of the element corresponding to the read data, the initial row number in the electronic table of the data pointed to by the position pointer of the new element is unchanged.
For example, as shown in figs. 3-4, the cell titled "1. Purchase" in the electronic table is located at A5, and its dynamic multi-row list occupies rows 6, 7 and 8, 3 rows in total; the cell titled "2. Industrial production consumption" is located at A9, and its dynamic multi-row list occupies rows 10 and 11; the cell titled "3. End of period inventory" is located at A12, and its dynamic multi-row list occupies rows 13, 14, 15 and 16. The base position of each element in the Sheet model is therefore determined first:
"$LdataList1<ntmc, rq, jldw, dm, sl, je, cgly, gycyyyclxf, gycyyyygjxf>!2. Industrial production consumption": the base position of this element is A6 (hereinafter referred to as the first element);
"$LdataList2<ntmc, rq, jldw, dm, sl, je, cgly, gycyyyclxf, gycyyyygjxf>!3. End of period inventory": the base position of this element is A8 (hereinafter referred to as the second element);
"$LdataList3<ntmc, rq, jldw, dm, sl, je, cgly, gycyyyclxf, gycyyyygjxf>!Remarks:": the base position of this element is A10 (hereinafter referred to as the third element).
The position pointer of the first element starts from row 6: the ExcelReader starts reading data from cell A6 of the electronic table and reads through row 8, 3 rows in total. The base row number of the second element is larger than that of the first element, so the initial row number in the electronic table of the data pointed to by the position pointer of the second element is 8+3-1=8+2, namely row 10, and data reading starts from A10. The base row number of the third element is larger than that of both the first and the second element, so after the first list (3 rows, an offset of 2) and the second list (2 rows, an offset of 1) have been read, the initial row number of the data pointed to by the position pointer of the third element is 10+2+1, namely row 13, and data reading starts from A13.
The Sheet model is updated in reverse according to the number of rows and columns of the data read, adjusting the position pointers of its elements so that the elements in the Sheet model always point to the correct positions in the electronic table, thereby achieving accurate reading of the dynamic list data.
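The pointer update of S2.1.6 can be checked against the worked example with a short Python sketch. It is an illustration only: the function name `update_start_rows` and its list-based inputs are hypothetical, and the row counts are passed in up front, whereas the reader only learns each count after reading the corresponding list.

```python
def update_start_rows(base_rows, rows_read):
    """Return the spreadsheet row at which each dynamic list's data starts.

    base_rows: template row number of each dynamic-list element.
    rows_read: number of data rows read for each list, in the same order.
    """
    starts = list(base_rows)
    for i, count in enumerate(rows_read):
        for j in range(len(starts)):
            # Only elements placed below the list just read are shifted;
            # each list occupies one template row, hence "count - 1".
            if base_rows[j] > base_rows[i]:
                starts[j] += count - 1
    return starts


# Base positions A6, A8, A10; the lists hold 3, 2 and 4 data rows.
starts = update_start_rows([6, 8, 10], [3, 2, 4])
print(starts)  # [6, 10, 13]
```

The result matches the description: the second list starts at row 10 (8+3-1) and the third at row 13 (10+2+1).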
S2.1.7, finding the model tag of the next dynamic list (i.e. dynamic multi-row list) and repeating S2.1.3-S2.1.6 until all dynamic list data have been read.
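Steps S2.1.3 to S2.1.5 amount to a loop that reads rows until the stop mark for row growth appears. A minimal Python sketch, assuming the sheet is held as a dict from 1-based row number to a list of cell values (a hypothetical stand-in for the ExcelReader) and that the stop mark is detected by a substring test on text cells:

```python
def read_dynamic_list(sheet, start_row, fields, stop_mark, scaler=lambda row: row):
    """Read one dynamic multi-row list.

    Returns (records, number_of_rows_read). Reading stops at the first row
    that contains stop_mark, or at the first row missing from the sheet.
    """
    records = []
    row_no = start_row
    while row_no in sheet:
        cells = sheet[row_no]
        # S2.1.4: check the row that was just read for the stop mark.
        if any(isinstance(c, str) and stop_mark in c for c in cells):
            break
        # S2.1.5: hand the row to the scaler, then continue with the next row.
        records.append(scaler(dict(zip(fields, cells))))
        row_no += 1
    return records, row_no - start_row


sheet = {
    5: ["1. Purchase"],
    6: ["coal", 10],
    7: ["oil", 4],
    8: ["gas", 7],
    9: ["2. Industrial production consumption"],
    10: ["coal", 2],
}
records, count = read_dynamic_list(
    sheet, 6, ["ntmc", "sl"], "2. Industrial production consumption"
)
```

The returned row count is what S2.1.6 needs in order to shift the position pointers of the elements that follow.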
S2.2, sequentially reading the data in the elements of the single-row lists in the electronic table of the current page and updating the position pointers of the elements in the Sheet model. After the data of the dynamic tables has been read, the positions of the other elements are fixed, so at this point it is only necessary to parse, in sequence, the value (data object) of the cell corresponding to each SingleElement and MixElement.
When the data of the single-row lists is read, reading starts from the single-row list with the smallest row number and proceeds from small to large row numbers, the type of each element being judged from its model label in the Sheet model. The update rule is the same as in S2.1.6: if the initial row number of the new element is greater than the initial row number of the element corresponding to the read data, the initial row number in the electronic table of the data pointed to by the position pointer of the new element should be the base row number of the new element, plus the number of data rows read from the current dynamic list, minus one; if the initial row number of the new element is smaller than the initial row number of the element corresponding to the read data, the initial row number of the new element is unchanged.
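Reading the fixed-position elements in S2.2 can be sketched in Python as follows. The dict-based element records, the `kind` codes and the `label` attribute (the fixed text to the left of the value in a MixElement cell) are assumptions made for this illustration; this passage does not specify how the fixed text is identified.

```python
def read_single(cell_value):
    """SingleElement: the whole cell content is the data object."""
    return cell_value


def read_mix(cell_value, label):
    """MixElement: return the text to the right of the fixed label,
    or None when the label is not present in the cell."""
    _, found, rest = str(cell_value).partition(label)
    return rest.strip() if found else None


def read_fixed_elements(sheet, elements):
    """Read SingleElement ("S") and MixElement ("M") cells in ascending row order."""
    result = {}
    for el in sorted(elements, key=lambda e: e["row"]):
        value = sheet[el["row"]][el["col"]]
        if el["kind"] == "M":
            result[el["key"]] = read_mix(value, el["label"])
        else:
            result[el["key"]] = read_single(value)
    return result


sheet = {2: ["Unit", "Plant A"], 17: ["Remarks: checked by audit"]}
elements = [
    {"key": "remark", "kind": "M", "row": 17, "col": 0, "label": "Remarks:"},
    {"key": "unit", "kind": "S", "row": 2, "col": 1},
]
values = read_fixed_elements(sheet, elements)
```

Row 17 here stands for the final position of the remarks cell after the dynamic lists above it have grown, which is why these elements are read only after S2.1 has fixed all positions.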
S2.3, repeating the steps S2.1-S2.2 until all the data of the electronic forms are read.
S3, encapsulating the read data objects and returning them; specifically, the read data objects are encapsulated into a map in key-value form and the result is returned.
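The key-value result of S3 can be pictured with a small Python sketch. The function name `package` and the rejection of duplicate keywords are assumptions of this illustration, not something the description states.

```python
def package(single_values, dynamic_lists):
    """Merge single-cell values and dynamic-list records into one map,
    keyed by the keyword defined for each data object."""
    result = dict(single_values)
    for keyword, rows in dynamic_lists.items():
        if keyword in result:
            # Assumed policy: a keyword may identify only one data object.
            raise ValueError(f"duplicate keyword: {keyword}")
        result[keyword] = list(rows)
    return result


packaged = package(
    {"unit": "Plant A"},
    {"dataList1": [{"ntmc": "coal", "sl": 10}]},
)
```

Each dynamic list appears under its keyword as a list of row maps, mirroring the List<Map<String, Object>> structure named in the claims.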
Example 2
The invention also discloses an electronic device, which comprises at least one processor and a memory in communication connection with the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of reading a complex spreadsheet file as described above.
The foregoing description of the embodiments has been provided for the purpose of illustrating the general principles of the invention and is not intended to limit the scope of the invention to these particular embodiments; any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (9)

1. A method for reading a complex electronic form file, characterized by comprising the following steps:
S1, constructing a Sheet model according to the template electronic form corresponding to the electronic form to be read, wherein the Sheet model is composed of a plurality of Element models, the number of Element models is consistent with the number of cells in the template electronic form, and the Element models are used for describing the positions and characteristics of the cells in the template electronic form;
the Element model comprises a plurality of model labels for describing data object types in the electronic form, and a reader reads the data objects of the electronic form in the reading mode corresponding to each model label; each model label has three basic attributes, namely an abscissa, an ordinate and the data field corresponding to the cell; the model labels comprise:
Element: used for describing one cell in the electronic form, and prompting the reader not to extract any data object within that cell;
SingleElement: used for describing one cell in the electronic form; it carries a value-added attribute comprising the keyword contained in the cell, and prompts the reader to extract all data objects in the cell corresponding to the keyword;
MixElement: used for describing one cell in the electronic form; it carries a value-added attribute comprising the keyword contained in the cell, and prompts the reader to extract the data object corresponding to the keyword that is embedded in the cell to the right of text which does not need to be extracted;
ListElement: used for describing a dynamic table in the electronic form whose number of data rows is not fixed; it carries value-added attributes comprising the keyword contained in the dynamic table and a stop mark for row growth, and prompts the reader to parse the data objects in the dynamic table into data with a List<Map<String, Object>> structure;
BatchRowListElement: used for describing a dynamic table in the electronic form whose number of data rows is not fixed and in which one row of data is distributed over several rows of cells; it carries value-added attributes comprising the keyword contained in the dynamic table, a stop mark for row growth and an extended cross-row-number attribute, the cross-row attribute delimiting the range of each row of data, and prompts the reader to parse the data content in the dynamic table into data with a List<Map<String, Object>> structure;
DataScaler: an interface used for passing the read data object of each row of the dynamic table to the DataScaler implementation class corresponding to that dynamic table for processing;
S2, traversing the Element models in the Sheet model, the reader reading the data objects of the electronic table in different modes according to the type of each Element model, wherein the read data objects comprise a field occupying one cell, a field embedded in text, a table with a variable number of rows and a table with a variable number of columns;
S3, encapsulating the read data objects and returning them.
2. The method for reading a complex spreadsheet file as claimed in claim 1, wherein the specific steps of S1 are:
S1.1, defining a keyword for each data object to be read in the electronic form;
S1.2, combining the model labels of the Element model and their attributes with the keywords to obtain the elements of the Sheet model;
S1.3, if the model label is a model label describing a dynamic table with a multi-row list, entering the element in the row immediately below the title corresponding to the dynamic table data.
3. The method of reading a complex spreadsheet file as claimed in claim 1, wherein a simplified expression is set to correspond to said model tag, said expression comprising:
S, corresponding to SingleElement;
M, corresponding to MixElement;
L, corresponding to ListElement;
BR, corresponding to BatchRowListElement.
4. The method for reading a complex electronic form file according to claim 3, wherein the specific steps of S1 are:
S1.1, defining a keyword for each data object to be read in the electronic form;
S1.2, combining the model labels of the Element models and their attributes with the keywords to obtain the elements of the Sheet model, wherein the model labels of the Element models are combined with the keywords in the form of the expressions;
S1.3, if the model label is a model label describing a dynamic table with a multi-row list, entering the element in the row immediately below the title corresponding to the dynamic table data.
5. The method for reading a complex electronic form file according to claim 1, wherein the specific steps of S2 are as follows:
S2.1, reading the data in the cells of the dynamic multi-row lists in the electronic form of the current page according to the Sheet model, and updating the position pointers of the elements in the Sheet model;
S2.2, sequentially reading the data in the cells of the single-row lists in the electronic form of the current page, and updating the position pointers of the elements in the Sheet model;
S2.3, repeating steps S2.1-S2.2 until the data of all the electronic forms has been read.
6. The method for reading a complex spreadsheet file as claimed in claim 5, wherein the specific implementation step of S2.1 comprises:
S2.1.1, sorting all elements by row number from small to large to determine the base position of each element;
S2.1.2, in the electronic table, finding the starting position of the dynamic list with the smallest row number among the elements parsed from the Sheet model;
S2.1.3, the reader reading one row of data starting from the starting position of the dynamic list;
S2.1.4, when the last data item in a row has been read, judging whether the row data read contains the stop mark for row growth; if yes, jumping to S2.1.6, otherwise jumping to S2.1.5;
S2.1.5, calling the DataScaler implementation method corresponding to the dynamic multi-row table, so that the DataScaler implementation class processes the read data object, and continuing the loop to read the next row;
S2.1.6, counting the number of rows of data read from the dynamic multi-row list and updating the position pointers of the elements whose data has not yet been read, wherein the updating logic is as follows: if the initial row number of the unread element is greater than the initial row number of the element corresponding to the read data, the initial row number in the electronic table of the data pointed to by the position pointer of the unread element should be the base row number of the unread element, plus the number of data rows read from the current dynamic list, minus one; if the initial row number of the unread element is smaller than the initial row number of the element corresponding to the read data, the initial row number in the electronic table of the data pointed to by the position pointer of the unread element is unchanged;
S2.1.7, finding the model tag of the next dynamic multi-row list and repeating S2.1.3-S2.1.6 until the data of all dynamic multi-row lists has been read.
7. The method for reading a complex spreadsheet file as claimed in claim 6, wherein the step of submitting the read data object to the DataScaler processing in S2.1.5 is as follows:
the DataScaler interface is injected into the reader by calling the excelreader.
8. The method for reading a complex spreadsheet file according to claim 1, wherein in S3, the read data object is encapsulated into a map in the form of key-value, and the result is returned.
9. An electronic device comprising at least one processor, and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of reading a complex spreadsheet file as claimed in any one of claims 1 to 8.
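As an illustration of the simplified expressions of claim 3, the Python sketch below parses a dynamic-list expression into its parts. The syntax `$<tag><keyword><field, ...>!<stop mark>` is an assumption read off the description's example ("$LdataList2<...>!3. End of period inventory"); how the BR form carries its cross-row number is not shown in this text, so it is not handled, and `parse_list_expression` is a hypothetical name.

```python
import re

# Assumed syntax: "$" + tag (L or BR) + keyword + "<fields>" + "!" + stop mark.
_LIST_EXPR = re.compile(r"^\$(BR|L)(\w+)<([^>]*)>!(.*)$")


def parse_list_expression(text):
    """Split a dynamic-list expression into tag, keyword, fields and stop mark."""
    match = _LIST_EXPR.match(text.strip())
    if match is None:
        raise ValueError(f"not a dynamic-list expression: {text!r}")
    tag, keyword, fields, stop_mark = match.groups()
    return {
        "tag": tag,
        "keyword": keyword,
        "fields": [f.strip() for f in fields.split(",") if f.strip()],
        "stop_mark": stop_mark.strip(),
    }


parsed = parse_list_expression(
    "$LdataList2<ntmc, rq, jldw, dm, sl, je>!3. End of period inventory"
)
```

The parsed keyword is the key under which the list is returned, the fields name the columns of each row map, and the stop mark ends row growth.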
CN202310495684.6A 2023-05-05 2023-05-05 Method for reading complex electronic form file and electronic equipment Active CN116521845B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310495684.6A CN116521845B (en) 2023-05-05 2023-05-05 Method for reading complex electronic form file and electronic equipment

Publications (2)

Publication Number Publication Date
CN116521845A CN116521845A (en) 2023-08-01
CN116521845B CN116521845B (en) 2024-03-05

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5033009A (en) * 1989-03-03 1991-07-16 Dubnoff Steven J System for generating worksheet files for electronic spreadsheets
CN101145148A (en) * 2007-10-30 2008-03-19 金蝶软件(中国)有限公司 Processing method and apparatus for electronic table file
CN115759025A (en) * 2022-11-14 2023-03-07 兴业银行股份有限公司 Excel data conversion method, system, medium and device based on template file
CN116028022A (en) * 2023-01-09 2023-04-28 广东云智安信科技有限公司 Java technology-based zero code Excel data importing method, device and medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4256416B2 (en) * 2006-09-29 2009-04-22 株式会社東芝 Data structure conversion system and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant