CN112818937A - Excel file identification method and device, electronic equipment and readable storage medium - Google Patents

Publication number
CN112818937A
Authority
CN
China
Prior art keywords
column
row
excel file
name
cell data
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110231358.5A
Other languages
Chinese (zh)
Inventor
刘春生
吴森阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Glodon Co Ltd
Original Assignee
Glodon Co Ltd
Application filed by Glodon Co Ltd
Priority to CN202110231358.5A
Publication of CN112818937A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities

Abstract

The invention relates to the technical field of list identification and discloses a method and device for identifying an Excel file, an electronic device, and a readable storage medium. The method comprises the following steps: acquiring a target Excel file; parsing the target Excel file to obtain its cell data; identifying the cell data to determine the column names and/or row names of the target Excel file; and determining, based on the column names and/or row names, the corresponding column text data and/or row text data. The method identifies the target Excel file automatically, avoids import errors caused by a mismatch between the format of the target Excel file and a template format, and allows an Excel file in any format to be imported successfully.

Description

Excel file identification method and device, electronic equipment and readable storage medium
Technical Field
The invention relates to the technical field of list identification, in particular to an identification method and device of an Excel file, electronic equipment and a readable storage medium.
Background
Construction cost work usually involves a tenderer and a bidder, and the bidder usually needs to import a list in Excel format provided by the tenderer into software for cost calculation. At present, when data is imported from an Excel file, the software can only recognize Excel files that follow a specific template format; the file can be imported directly only if the format of the Excel file provided by the tenderer exactly matches a template format that the software recognizes. The software cannot automatically identify the rows and columns of the Excel file. When the format of the Excel file provided by the tenderer does not match a recognized template format, the tenderer has to manually adjust the file to match the template format before its data can be imported; otherwise the import produces errors.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and device for identifying an Excel file, an electronic device, and a readable storage medium, so as to solve the problem of file import errors caused by the inability to automatically identify the rows and columns of an Excel file.
According to a first aspect, an embodiment of the present invention provides a method for identifying an Excel file, including the following steps: acquiring a target Excel file; parsing the target Excel file to obtain cell data of the target Excel file; identifying the cell data to determine a column name and/or a row name corresponding to the target Excel file; and determining, based on the column name and/or the row name, column text data and/or row text data corresponding to the column name and/or the row name.
In the method for identifying an Excel file provided by the embodiment of the invention, the acquired target Excel file is parsed to obtain each piece of cell data, each piece of cell data is identified, and the column name and/or row name of the target Excel file, together with the corresponding column text data and/or row text data, are determined. The method does not require the target Excel file to follow any particular template format in order to be recognized: for a target Excel file in any format, the column text data and row text data it contains can be determined by identifying the column names and/or row names. The target Excel file is thus identified automatically, import errors caused by a mismatch between its format and a template format are avoided, and an Excel file in any format can be imported successfully.
With reference to the first aspect, in a first implementation manner of the first aspect, the identifying the cell data and determining a column name and/or a row name corresponding to the target Excel file includes: matching the cell data based on a preset identifier, and judging whether the cell data meets a matching condition; and when the cell data meet the matching condition, judging that the target Excel file is successfully identified, and obtaining a column name and/or a row name corresponding to the preset identifier.
According to the identification method of the Excel file, provided by the embodiment of the invention, each cell data is matched through the preset identifier, whether the cell data meets the matching condition is judged, and when the cell meets the matching condition, the target Excel file is judged to be successfully identified, so that the column name and/or the row name corresponding to the preset identifier are/is obtained. The preset identifier is an identifier corresponding to a column name or a row name. Therefore, automatic identification of the target Excel file in any format is realized, the problem of file import error caused by inconsistency between the format of the target Excel file and the template format is avoided, and the target Excel file in any format can be ensured to be successfully imported.
With reference to the first implementation manner of the first aspect, in a second implementation manner of the first aspect, the judging whether the cell data meets a matching condition includes: acquiring the identification keywords and exclusion keywords corresponding to the column identifier; judging whether the column cell data matches an identification keyword; when the column cell data matches an identification keyword, judging whether the column cell data matches an exclusion keyword; and when the column cell data does not match any exclusion keyword, determining that the column cell data satisfies the matching condition.
With reference to the second implementation manner of the first aspect, in a third implementation manner of the first aspect, when the preset identifier is a row identifier, the matching the cell data based on the preset identifier and judging whether the cell data meets a matching condition includes: determining the current row cell data corresponding to the current row identifier based on the column cell data that satisfies the matching condition; acquiring the preset condition corresponding to the current row identifier; judging whether the current row cell data meets the preset condition; and when the current row cell data meets the preset condition, determining that the current row cell data satisfies the matching condition.
With reference to the third implementation manner of the first aspect, in a fourth implementation manner of the first aspect, when the preset identifier is a row identifier, the matching the cell data based on the preset identifier and judging whether the cell data meets a matching condition further includes: when the current row cell data does not meet the preset condition, determining that matching of the current row cell data has failed, and jumping to the next row to continue identification.
The identification method of the Excel file provided by the embodiment of the invention matches each cell data through the column identifier, determines the column cell data meeting the matching condition, then can determine the row cell data of each row based on the column cell data meeting the matching condition, matches the row cell data of each row in sequence, judges whether the row cell meets the preset condition in sequence, and judges that the row cell data meets the matching condition when the row cell data meets the preset condition. Therefore, automatic identification of the rows and columns of the target Excel file in any format is realized, and the import of the target Excel file in any format is met.
With reference to the first aspect, in a fifth implementation manner of the first aspect, the method further includes: in response to a selection instruction for a tab of the target Excel file, determining the tab of the target Excel file that is to be imported.
In the method for identifying an Excel file provided by the embodiment of the invention, the tab to be imported and its corresponding cell data are determined in response to a selection instruction for a tab of the target Excel file. This overcomes the difficulty of importing only part of the data of the target Excel file and allows the target Excel file to be imported flexibly.
With reference to the first implementation manner of the first aspect, in a sixth implementation manner of the first aspect, the method further includes: displaying the identified column name and/or row name of the target Excel file and the column text data and/or row text data corresponding to the column name and/or row name; responding to an adjustment instruction for the column name and/or the row name; and adjusting the column name and/or the row name based on the adjustment instruction.
In the method for identifying an Excel file provided by the embodiment of the invention, the identified column name and/or row name of the target Excel file and the corresponding column text data and/or row text data are displayed so that the user can check whether the identification result is correct, which avoids damaging an existing project file through a blind import. When the identification result is unreasonable, the user can adjust it manually: the electronic device responds to an adjustment instruction for the column name and/or row name and adjusts the column name and/or row name accordingly. The identification result can thus be adjusted after the fact without modifying the original target Excel file, which improves the efficiency of importing Excel files.
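The adjustment step can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the identification result is assumed to be a mapping from sheet column letter to the recognized column name (or None for an unrecognized column), and `apply_adjustment` is a hypothetical helper name, not something the patent defines. The original file is never touched; only the mapping changes.

```python
from typing import Dict, Optional


def apply_adjustment(result: Dict[str, Optional[str]],
                     column_letter: str,
                     new_name: Optional[str]) -> Dict[str, Optional[str]]:
    """Return a copy of the identification result with one column relabelled."""
    adjusted = dict(result)  # leave the displayed result untouched
    if new_name is not None:
        # Assumption: a column name may label only one column, so the
        # previous holder of that name becomes unrecognized.
        for letter, name in adjusted.items():
            if name == new_name:
                adjusted[letter] = None
    adjusted[column_letter] = new_name
    return adjusted


# Hypothetical identification result shown to the user.
result = {"D": "name", "E": "unit", "F": None}
# The user decides that column F, not column E, is the real "unit" column.
adjusted = apply_adjustment(result, "F", "unit")
```

Returning a new mapping rather than editing in place keeps the automatically identified result available in case the user wants to undo the adjustment.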
According to a second aspect, an embodiment of the present invention provides a device for identifying an Excel file, including: an acquisition module, configured to acquire a target Excel file; a parsing module, configured to parse the target Excel file to obtain cell data of the target Excel file; an identification module, configured to identify the cell data and determine a column name and/or a row name corresponding to the target Excel file; and a determining module, configured to determine, based on the column name and/or the row name, column text data and/or row text data corresponding to the column name and/or the row name.
In the device for identifying an Excel file provided by the embodiment of the invention, the acquired target Excel file is parsed to obtain each piece of cell data, each piece of cell data is identified, and the column name and/or row name of the target Excel file, together with the corresponding column text data and/or row text data, are determined. The device does not require the target Excel file to follow any particular template format in order to be recognized: for a target Excel file in any format, the column text data and row text data it contains can be determined by identifying the column names and/or row names. The target Excel file is thus identified automatically, import errors caused by a mismatch between its format and a template format are avoided, and an Excel file in any format can be imported successfully.
According to a third aspect, an embodiment of the present invention provides an electronic device, including a memory and a processor that are communicatively connected to each other, where the memory stores computer instructions and the processor executes the computer instructions so as to perform the method for identifying an Excel file according to the first aspect or any implementation manner of the first aspect.
According to a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where computer instructions are stored, and the computer instructions are configured to cause a computer to execute the method for identifying an Excel file according to the first aspect or any implementation manner of the first aspect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow chart of a method for identifying an Excel file according to an embodiment of the present invention;
FIG. 2 is another flow chart of the identification method of an Excel file according to the embodiment of the invention;
FIG. 3 is another flow chart of a method for identifying an Excel file according to an embodiment of the present invention;
FIG. 4 is a block diagram of an Excel file identification apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
When data is imported from an Excel file, the software can only recognize Excel files that follow a specific template format; the file can be imported directly only if the format of the Excel file provided by the tenderer exactly matches a template format that the software recognizes. The software cannot automatically identify the rows and columns of the Excel file. When the format of the Excel file provided by the tenderer does not match a recognized template format, the tenderer has to manually adjust the file to match the template format before its data can be imported; otherwise the import produces errors.
On this basis, the technical solution of the present invention automatically identifies the column names and/or row names of an Excel file by parsing each piece of its cell data, without requiring the Excel file to follow a set format, so that the row and column data of an Excel file in any format can be identified automatically and imported successfully.
According to an embodiment of the present invention, an embodiment of a method for identifying an Excel file is provided. It should be noted that the steps illustrated in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.
This embodiment provides a method for identifying an Excel file that can be used in electronic devices such as mobile phones, computers, and tablet computers. FIG. 1 is a flowchart of the method for identifying an Excel file according to an embodiment of the present invention. As shown in FIG. 1, the flow includes the following steps:
S11: acquire the target Excel file.
The target Excel file is building bill of materials data imported into the electronic device for cost calculation. The target Excel file is usually an externally provided Excel file, for example, an Excel list is provided by a tenderer. The user can import the target Excel file from the outside.
S12: parse the target Excel file to obtain the cell data of the target Excel file.
The cell data is the specific content of a cell. The target Excel file contains multiple rows and columns of text data, and the electronic device can parse the target Excel file to obtain the cell data it contains. For example, if the target Excel file has 3 rows and 3 columns, the electronic device parses the contents of each of the 9 cells to determine the cell data of each cell, such as the specific cell contents "name", "unit", and "cm".
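As a rough Python illustration of step S12, the sketch below turns a worksheet into addressable cell data. It assumes the sheet has already been read into a list of rows, since the patent does not name a parsing library; `column_letter` and `parse_cells` are hypothetical helper names, and the sample values are made up around the "name", "unit", "cm" example.

```python
from typing import Dict, List, Optional


def column_letter(index: int) -> str:
    """Convert a 0-based column index to Excel letters (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def parse_cells(grid: List[List[Optional[str]]]) -> Dict[str, str]:
    """Map the address of every non-empty cell to its text, e.g. {"A1": "name"}."""
    cells: Dict[str, str] = {}
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value is not None and str(value).strip():
                cells[f"{column_letter(c)}{r + 1}"] = str(value).strip()
    return cells


# Hypothetical 3 x 3 sheet; None stands for an empty cell.
grid = [
    ["name", "unit", "quantity"],
    ["pipe", "cm", "12"],
    ["cable", "m", None],
]
cells = parse_cells(grid)
```

Keying the result by address (such as "D3") matches the way the worked examples in the description refer to individual cells.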
S13: identify the cell data and determine the column name and/or the row name corresponding to the target Excel file.
The electronic device traverses the obtained cell data in sequence, from top to bottom or from left to right, and determines the header information of the target Excel file, i.e., the column names and/or row names. A one-dimensional table has only column names or only row names; a two-dimensional table has both. For a two-dimensional table, the column names are determined by traversing the cell data from left to right, and the row names are determined by traversing the cell data from top to bottom. The target Excel file usually has multiple column names and/or row names, and a separate column identification algorithm or row identification algorithm can be defined for each of them. At their core, both the column identification algorithms and the row identification algorithms identify by keyword matching with regular expressions.
S14: determine, based on the column name and/or the row name, the column text data and/or row text data corresponding to the column name and/or the row name.
Each column name in the target Excel file has corresponding column text data, and each row name has corresponding row text data. Once the column names of the target Excel file have been identified, the column text data corresponding to each column name is determined from the identified column names; likewise, once the row names have been identified, the row text data corresponding to each row name is determined.
In the method for identifying an Excel file provided in this embodiment, the acquired target Excel file is parsed to obtain each piece of cell data, each piece of cell data is identified, and the column name and/or row name of the target Excel file, together with the corresponding column text data and/or row text data, are determined. The method does not require the target Excel file to follow any particular template format in order to be recognized: for a target Excel file in any format, the column text data and row text data it contains can be determined by identifying the column names and/or row names. The target Excel file is thus identified automatically, import errors caused by a mismatch between its format and a template format are avoided, and an Excel file in any format can be imported successfully.
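Step S14 can be sketched in Python as follows. The sketch assumes that identification has already produced a mapping from column name to column index and that the header row is known; `column_text_data` and the sample sheet are illustrative names and values, not part of the patent.

```python
from typing import Dict, List, Optional

Grid = List[List[Optional[str]]]


def column_text_data(grid: Grid, header_row: int,
                     columns: Dict[str, int]) -> Dict[str, List[Optional[str]]]:
    """Collect, for each identified column name, the cells below its header."""
    data: Dict[str, List[Optional[str]]] = {}
    for name, col in columns.items():
        # Short rows are padded with None instead of raising IndexError.
        data[name] = [row[col] if col < len(row) else None
                      for row in grid[header_row + 1:]]
    return data


grid = [
    ["No.", "project name", "unit"],
    ["1", "earth mover", "m3"],
    ["2", "concrete", "m3"],
]
# Assumed output of the identification step: column name -> column index.
columns = {"name": 1, "unit": 2}
data = column_text_data(grid, header_row=0, columns=columns)
```

Because the lookup is driven by the identified names rather than by fixed positions, the same call works wherever the "name" and "unit" columns happen to sit in the sheet.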
This embodiment provides a method for identifying an Excel file that can be used in electronic devices such as mobile phones, computers, and tablet computers. FIG. 2 is another flowchart of the method for identifying an Excel file according to an embodiment of the present invention. As shown in FIG. 2, the flow includes the following steps:
S21: acquire the target Excel file. For details, refer to the description of step S11 in the above embodiment, which is not repeated here.
S22: parse the target Excel file to obtain the cell data of the target Excel file. For details, refer to the description of step S12 in the above embodiment, which is not repeated here.
S23: identify the cell data and determine the column name and/or the row name corresponding to the target Excel file.
Specifically, step S23 may include the following steps:
S231: match the cell data based on the preset identifier and judge whether the cell data meets the matching condition.
The cell data includes column cell data and row cell data, the preset identifier is an algorithm identifier set to identify a row name or a column name, and the preset identifier may include a column identifier and a row identifier. Specifically, when identifying a column name, the column name may be taken as a column identifier, i.e., an algorithm identifier of a column identification algorithm; when identifying a row name, the row name may be taken as a row identifier, i.e. an algorithm identifier of a row identification algorithm. The cell data is matched based on the column identifier or the row identifier to determine the cell data corresponding to the column identifier or the row identifier, i.e., to determine whether the cell data satisfies the matching condition.
Specifically, when the preset identifier is a column identifier, the step S231 may include the following steps:
(1) Acquire the identification keywords and exclusion keywords corresponding to the column identifier.
Cell data that contains an identification keyword is recognized, whereas cell data that contains an exclusion keyword is not recognized. The electronic device may add the identification keywords and exclusion keywords to a column identification algorithm in order to identify the column name. There may be one or more identification keywords; the number is not specifically limited here. It should be noted that the electronic device may automatically add the column name itself to the list of identification keywords.
(2) Judge whether the column cell data matches an identification keyword.
The list of identification keywords is traversed and compared one by one against the column cell data, from top to bottom, to judge whether the column cell data contains an identification keyword. If the column cell data contains one of the identification keywords, the column cell data is judged to match the identification keyword and step (3) is executed; if it contains none of them, matching fails and the column cell data is judged not to be recognized.
For example, define the column name "name" and register the column identification algorithm: contains the text "name". Define the column name "unit" and register the column identification algorithm: contains the text "unit" or "measuring unit". If the content of cell D3 of the Excel file is "project name", column D is identified as the "name" column. If the content of cell E3 is "unit", column E is identified as the "unit" column.
(3) Judge whether the column cell data matches an exclusion keyword.
When the column cell data matches an identification keyword, it is further judged whether that column cell data also contains an exclusion keyword, i.e., whether it matches an exclusion keyword. If the column cell data that matched the identification keyword contains no exclusion keyword, it is judged not to match any exclusion keyword and step (4) is executed; otherwise, the column cell data is judged not to be recognized.
(4) Determine that the column cell data satisfies the matching condition.
When the column cell data does not match any exclusion keyword, i.e., contains no exclusion keyword, it can be determined that the column cell data satisfies the matching condition, which means that column identification of the target Excel file has succeeded.
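Steps (1) to (4) can be sketched in Python with the standard `re` module. This is a minimal sketch, not the patented implementation: `ColumnRule` is a hypothetical container for one column identification algorithm, keywords are matched as case-insensitive literal patterns, and the exclusion keywords "file name" and "unit price" are invented purely for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List


def _contains(keywords: Iterable[str], text: str) -> bool:
    """True if any keyword occurs in the text (case-insensitive regex search)."""
    return any(re.search(re.escape(k), text, re.IGNORECASE) for k in keywords)


@dataclass
class ColumnRule:
    """Hypothetical container for one column identification algorithm."""
    column_name: str
    identify: List[str] = field(default_factory=list)
    exclude: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Step (1): copy the keyword list, then add the column name itself
        # as an identification keyword.
        self.identify = list(self.identify)
        if self.column_name not in self.identify:
            self.identify.append(self.column_name)

    def matches(self, cell_text: str) -> bool:
        text = cell_text.strip()
        # Step (2): at least one identification keyword must be present.
        if not _contains(self.identify, text):
            return False
        # Step (3): no exclusion keyword may be present.
        if _contains(self.exclude, text):
            return False
        # Step (4): the cell satisfies the matching condition.
        return True


# Rules modelled on the "name" and "unit" example; exclusions are assumed.
name_rule = ColumnRule("name", exclude=["file name"])
unit_rule = ColumnRule("unit", identify=["measuring unit"], exclude=["unit price"])
```

With these rules, `name_rule.matches("project name")` and `unit_rule.matches("Unit")` both succeed, while a header such as "unit price" passes step (2) for the "unit" rule but is rejected at step (3).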
Specifically, when the preset identifier is a row identifier, the step S231 may further include the following steps:
(5) Determine the current row cell data corresponding to the current row identifier, based on the column cell data that satisfies the matching condition.
Row cell data is made up of the cell data of the individual columns. Once the column cell data that satisfies the matching condition has been determined, all row cell data of the target Excel file can be determined. The current row cell data is then selected from the row cell data of the target Excel file based on the current row identifier.
(6) Acquire the preset condition corresponding to the current row identifier.
The preset condition is a row identification rule defined according to the specific construction cost business requirements. The rule is not specifically limited here, and a person skilled in the art can define it according to actual business requirements. For example, if the list business requires that a list row must have a name, a code, and a unit, the row identification rule (preset condition) may be: the name column, code column, and unit column of the row cell data must not be empty.
(7) Judge whether the current row cell data meets the preset condition.
The current row cell data is compared with the preset condition to determine whether it meets the condition. If the current row cell data meets the preset condition, step (8) is executed; otherwise, step (9) is executed.
(8) Determine that the current row cell data satisfies the matching condition.
If the current row cell data meets the preset condition, it meets the business requirement, and it can therefore be determined that the current row cell data satisfies the matching condition.
(9) Determine that matching of the current row cell data has failed, and jump to the next row to continue identification.
If the current row cell data does not meet the preset condition, it does not meet the business requirement, and matching of the current row cell data fails. In this case the electronic device does not stop row identification but jumps to the row cell data of the next row and continues.
For example, define the row name "list" and register the row identification algorithm: the "name" column of the row is not empty and the "unit" column is not empty. For the first row of the Excel file, cells D1 and E1 are both empty, so the first row is marked as "unrecognized". For the second row, the content of D2 is "earth mover" and the content of E2 is "m3"; neither is empty, so the second row is identified as "list".
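The row example can be expressed as a short Python sketch. It assumes each row has already been reduced to a mapping from identified column name to cell text; `not_empty` and `label_rows` are hypothetical helper names, and the preset condition is limited to the non-empty check used in the example.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]  # identified column name -> cell text


def not_empty(*columns: str) -> Callable[[Row], bool]:
    """Build a preset condition: every listed column must be non-empty."""
    def condition(row: Row) -> bool:
        return all((row.get(c) or "").strip() for c in columns)
    return condition


# Row identification rule for the "list" row name from the example.
list_rule = not_empty("name", "unit")


def label_rows(rows: List[Row]) -> List[str]:
    """Label each row; a failed match does not stop the scan (step 9)."""
    labels = []
    for row in rows:
        labels.append("list" if list_rule(row) else "unrecognized")
    return labels


rows = [
    {"name": None, "unit": None},           # row 1: D1 and E1 are empty
    {"name": "earth mover", "unit": "m3"},  # row 2: D2 and E2 are filled
]
labels = label_rows(rows)
```

Expressing the preset condition as a function makes it easy to register a different rule per row name, for instance one that also requires a non-empty code column.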
S232: determine that the target Excel file is successfully identified, and obtain the column name and/or the row name corresponding to the preset identifier.
When the cell data meets the matching condition, it can be determined that the column cell data and/or the row cell data of the target Excel file are successfully identified, and at this time, the column name and/or the row name corresponding to the preset identifier can be obtained.
Through the configuration, the electronic equipment is provided with the automatic identification capability. The specific identification process is as follows:
(1) Column identification. The target Excel file is traversed cell by cell, from top to bottom and from left to right. The data in each cell is read and checked with each column identification algorithm. If the data of a cell is recognized by the column identification algorithm of a certain column, the column containing that cell is identified with the defined column name, and no further column identification is performed on the other cells of that column. Because duplicate column names are not allowed, a column identification algorithm that has already identified its column is not invoked again. This continues until every column identification algorithm has identified its column or the whole Excel file has been traversed; columns that remain unidentified are marked as unrecognized.
(2) Row identification. The column identification result serves as the basis for row identification. The target Excel file is traversed again, row by row from top to bottom. Because the columns have already been identified, not every cell of a row needs to be traversed: the cell data of the relevant columns is fetched by column number and checked with the row identification algorithms. If the data is recognized by a row identification algorithm, the row is identified with the defined row name, identification of that row stops, and the process moves on to the next row. If a row is not recognized by any row identification algorithm, it is marked as unrecognized and the process moves on to the next row. Unlike column identification, row identification allows the same row name to be assigned more than once, so all row identification algorithms are applied to every row until all rows have been traversed.
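The two passes described above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function names `identify_columns` and `identify_rows`, the representation of a sheet as a list of lists of strings, and the keyword rules in the example are all assumptions introduced here. Only the "list" row rule (name and unit both non-empty) comes from the example in the description.

```python
from typing import Callable, Dict, List, Optional

Grid = List[List[str]]  # grid[r][c] is the text of one cell, "" when empty (assumed layout)


def identify_columns(grid: Grid,
                     column_rules: Dict[str, Callable[[str], bool]]) -> Dict[int, str]:
    """Return {column index: column name}; columns not in the result are unrecognized."""
    found: Dict[int, str] = {}
    pending = dict(column_rules)              # rules that have not matched a column yet
    for row in grid:                          # top to bottom
        for c, text in enumerate(row):        # left to right
            if c in found:
                continue                      # column already named: skip its cells
            for name, rule in list(pending.items()):
                if rule(text):
                    found[c] = name
                    del pending[name]         # a column name may be assigned only once
                    break
        if not pending:
            break                             # every rule has found its column
    return found


def identify_rows(grid: Grid,
                  columns: Dict[int, str],
                  row_rules: Dict[str, Callable[[Dict[str, str]], bool]]
                  ) -> List[Optional[str]]:
    """Return one row name per row, or None for an unrecognized row."""
    labels: List[Optional[str]] = []
    for row in grid:                          # top to bottom
        # Fetch only the cells of identified columns, keyed by column name.
        cells = {name: (row[c] if c < len(row) else "") for c, name in columns.items()}
        label = None
        for name, rule in row_rules.items():  # every rule is available for every row
            if rule(cells):
                label = name                  # first matching rule wins
                break
        labels.append(label)
    return labels


# Hypothetical configuration: the keyword rules are assumptions for illustration.
column_rules = {
    "name": lambda t: "name" in t.lower(),
    "unit": lambda t: "unit" in t.lower(),
}
row_rules = {
    "list": lambda cells: bool(cells.get("name", "").strip())
    and bool(cells.get("unit", "").strip()),
}
grid = [
    ["Bill of quantities", "", ""],
    ["No.", "Item name", "Unit"],
    ["1", "earth mover", "m3"],
    ["", "", ""],
]
columns = identify_columns(grid, column_rules)
labels = identify_rows(grid, columns, row_rules)
```

With this sample grid, the header row also satisfies the simple "list" rule, because its "name" and "unit" cells are not empty. A practical configuration would presumably need a stricter rule, or the user would correct that row in the preview.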
S24, based on the column name and/or the row name, column text data and/or row text data corresponding to the column name and/or the row name is determined. For a detailed description, refer to the related description of step S14 corresponding to the above embodiment, and the detailed description is omitted here.
In the identification method of the Excel file provided in this embodiment, each cell data is matched through a preset identifier, whether the cell data meets a matching condition is determined, and when the cell meets the matching condition, it is determined that the identification of the target Excel file is successful, and a column name and/or a row name corresponding to the preset identifier is obtained. The preset identifier is an identifier corresponding to a column name or a row name. Therefore, automatic identification of the target Excel file in any format is realized, the problem of file import error caused by inconsistency between the format of the target Excel file and the template format is avoided, and the target Excel file in any format can be ensured to be successfully imported.
In this embodiment, a method for identifying an Excel file is provided, which can be used in electronic devices such as a mobile phone, a computer, or a tablet computer. Fig. 3 is a flowchart of a method for identifying an Excel file according to an embodiment of the present invention. As shown in fig. 3, the flow includes the following steps:
S31, acquiring the target Excel file.
S32, analyzing the target Excel file to obtain the cell data of the target Excel file. For a detailed description, refer to the related description of step S22 corresponding to the above embodiment, and the detailed description is omitted here.
S33, identifying the cell data, and determining the column name and/or the row name corresponding to the target Excel file. For a detailed description, refer to the related description of step S23 corresponding to the above embodiment, and the detailed description is omitted here.
S34, based on the column name and/or the row name, column text data and/or row text data corresponding to the column name and/or the row name is determined. For a detailed description, refer to the related description of step S24 corresponding to the above embodiment, and the detailed description is omitted here.
S35, displaying the column names and/or row names corresponding to the identified target Excel file, and the column text data and/or row text data corresponding to the column names and/or row names.
The electronic device displays all data of the target Excel file (column text data and row text data), together with the row identification result and the column identification result, through a preview interface. Specifically, the electronic device can display the column identification result corresponding to the target Excel file at the top of the preview interface: if a column is recognized, the defined column name is displayed, and if it is not recognized, "unrecognized" is displayed. The electronic device can display the row identification result corresponding to the target Excel file on the leftmost side of the preview interface: if a row is recognized, the defined row name is displayed, and if it is not recognized, "unrecognized" is displayed. Each row is provided with a check box, and recognized rows are checked by default.
S36, responding to the adjustment instruction of the column name and/or the row name.
By previewing the recognition result of the target Excel file, the user can determine whether the recognition result is correct and can manually adjust any result that was recognized incorrectly. The electronic device may then respond to the adjustment instruction for the column name and/or row name input by the user.
S37, adjusting the column name and/or the row name based on the adjustment instruction.
The electronic device adjusts the column name or row name identified for the target Excel file in response to an adjustment instruction input by the user. Specifically, for a column, the user may click the identification result at the top, which pops up a right-click menu displaying all column names defined by the electronic device; when the user clicks the desired column name, the electronic device redefines the column as the selected column name in response to the click operation. Since duplicate column names are not allowed, any other column that already has the same column name is automatically reset to unrecognized. For a row, the user may click the identification result on the left side, which pops up a right-click menu displaying all row names defined by the electronic device; when the user clicks the desired row name, the electronic device redefines the row as the selected row name in response to the click operation.
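The uniqueness rule for manually adjusted column names can be illustrated with a short Python sketch. The helper names `rename_column` and `rename_row` and the dictionary and list representations are hypothetical, chosen only for this illustration; the embodiment does not specify data structures.

```python
from typing import Dict, List, Optional


def rename_column(columns: Dict[int, str], col: int, new_name: str) -> Dict[int, str]:
    """Assign new_name to column col.

    Column names must be unique, so any other column currently holding
    new_name is dropped from the mapping, i.e. becomes unrecognized.
    """
    updated = {c: n for c, n in columns.items() if n != new_name}
    updated[col] = new_name
    return updated


def rename_row(labels: List[Optional[str]], row: int, new_name: str) -> List[Optional[str]]:
    """Assign new_name to one row; row names may repeat, so nothing else changes."""
    updated = list(labels)
    updated[row] = new_name
    return updated


# Column 1 was auto-identified as "name"; the user reassigns "name" to column 3.
adjusted = rename_column({1: "name", 2: "unit"}, 3, "name")
```

Both helpers return a new object rather than mutating the input, which would make it simple to discard an adjustment in a preview; that design choice is likewise an assumption.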
S38, in response to a selection instruction for a tab of the target Excel file, determining the tab to be imported corresponding to the target Excel file.
The selection instruction is a selection operation of a target Excel file tab input by a user, and the electronic equipment can respond to the selection instruction input by the user. For example, the selection instruction may be a tab selection operation, and the electronic device may respond to the tab selection operation of the user. Of course, the selection instruction may also be other selection operations, which are not specifically limited herein and can be determined by those skilled in the art according to actual needs. The electronic equipment can determine the tab to be imported corresponding to the target Excel file according to the selection instruction.
Specifically, the preview interface can display the cell data and the recognition result of only one tab at a time, and different tabs can be switched through the tab drop-down box of the target Excel file provided by the preview interface. On import, only the cell data of the currently selected tab is imported; if only part of the data of the currently selected tab needs to be imported, the check boxes of the rows that do not need to be imported are unchecked.
Through automatic recognition, manual recognition, and data screening, the preview interface presents the actual cell data to be imported. During import, the target Excel file is traversed from top to bottom, and unchecked rows are skipped. For each checked row, a new record is added to the corresponding table of the database according to the identified row name, and the data of each field is read from the corresponding column of the row, until the whole target Excel file has been traversed.
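The import traversal can be sketched as below. This is a minimal illustration under stated assumptions: an in-memory dictionary stands in for the database tables, the function name `build_records` is hypothetical, and skipping checked rows that have no row name is a choice made here because such a row has no target table; the description does not address that case.

```python
from typing import Dict, List, Optional

Grid = List[List[str]]


def build_records(grid: Grid,
                  columns: Dict[int, str],
                  labels: List[Optional[str]],
                  checked: List[bool]) -> Dict[str, List[Dict[str, str]]]:
    """Collect one record per checked row, grouped by row name (the assumed table name)."""
    tables: Dict[str, List[Dict[str, str]]] = {}
    for r, row in enumerate(grid):            # top to bottom
        if not checked[r]:
            continue                          # unchecked rows are skipped
        if labels[r] is None:
            continue                          # assumption: no row name, no target table
        # Read each field from the column identified for it.
        record = {name: (row[c] if c < len(row) else "") for c, name in columns.items()}
        tables.setdefault(labels[r], []).append(record)
    return tables


grid = [
    ["Bill of quantities", "", ""],
    ["No.", "Item name", "Unit"],
    ["1", "earth mover", "m3"],
    ["", "", ""],
]
columns = {1: "name", 2: "unit"}              # result of column identification
labels = [None, "list", "list", None]         # result of row identification
checked = [False, False, True, False]         # the user unchecked the header row
tables = build_records(grid, columns, labels, checked)
```

In a real import, each record would be written to the database table that corresponds to its row name instead of being appended to a list.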
According to the identification method of the Excel file provided by the embodiment of the invention, the tab to be imported and its corresponding cell data are determined in response to the selection instruction for the target Excel file tab, so that the difficulty of importing only part of the data of the target Excel file is overcome and flexible import of the target Excel file is realized. By displaying the column name and/or the row name corresponding to the identified target Excel file and the column text data and/or row text data corresponding to the column name and/or the row name, the user can determine whether the identification result of the target Excel file is correct, which avoids damaging an existing engineering file through blind import. When the identification result is not reasonable, the user can adjust it manually, and the electronic device can respond to the adjustment instruction for the column name and/or the row name and adjust the column name and/or the row name based on the adjustment instruction. Therefore, secondary adjustment of the recognition result can be realized without modifying the original target Excel file, and the import efficiency of the Excel file is improved.
The embodiment also provides a device for identifying an Excel file, which is used for implementing the above embodiments and preferred embodiments; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
The embodiment provides an identification apparatus for an Excel file, as shown in fig. 4, including:
An obtaining module 41, configured to obtain the target Excel file. For a detailed description, reference is made to the corresponding related description of the above method embodiments, which is not repeated herein.
An analysis module 42, configured to analyze the target Excel file to obtain cell data of the target Excel file. For a detailed description, reference is made to the corresponding related description of the above method embodiments, which is not repeated herein.
An identifying module 43, configured to identify the cell data, and determine a column name and/or a row name corresponding to the target Excel file. For a detailed description, reference is made to the corresponding related description of the above method embodiments, which is not repeated herein.
A determining module 44, configured to determine column text data and/or row text data corresponding to the column name and/or row name based on the column name and/or row name. For a detailed description, reference is made to the corresponding related description of the above method embodiments, which is not repeated herein.
The identification device for the Excel file provided in this embodiment obtains each cell data in the target Excel file by analyzing the obtained target Excel file, identifies each cell data, and determines a column name and/or a row name corresponding to the target Excel file, and column text data and/or row text data corresponding to the column name and/or the row name. The device can perform identification without requiring the target Excel file to be imported in a particular template format. By identifying the column name and/or the row name, the column text data and row text data contained in a target Excel file in any format can be determined, so that automatic identification of the target Excel file is achieved, file import errors caused by inconsistency between the format of the target Excel file and a template format are avoided, and an Excel file in any format can be imported successfully.
The Excel file recognition apparatus in the present embodiment is represented in the form of a functional unit, where the unit refers to an ASIC circuit, a processor and a memory executing one or more software or fixed programs, and/or other devices capable of providing the above functions.
Further functional descriptions of the modules are the same as those of the corresponding embodiments, and are not repeated herein.
An embodiment of the present invention further provides an electronic device, which has the device for identifying an Excel file shown in fig. 4.
Referring to fig. 5, which is a schematic structural diagram of an electronic device according to an alternative embodiment of the present invention, the electronic device may include: at least one processor 501, such as a CPU (Central Processing Unit), at least one communication interface 503, a memory 504, and at least one communication bus 502. The communication bus 502 is used to enable connection and communication between these components. The communication interface 503 may include a display and a keyboard; optionally, the communication interface 503 may also include a standard wired interface and a standard wireless interface. The memory 504 may be a random access memory (RAM) or a non-volatile memory, such as at least one disk memory. The memory 504 may optionally be at least one storage device located remotely from the processor 501. The processor 501 may be combined with the device described in fig. 4; an application program is stored in the memory 504, and the processor 501 calls the program code stored in the memory 504 to perform any of the above-mentioned method steps.
The communication bus 502 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The communication bus 502 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
The memory 504 may include a volatile memory, such as a random-access memory (RAM); the memory may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 504 may also comprise a combination of the above-described types of memory.
The processor 501 may be a Central Processing Unit (CPU), a Network Processor (NP), or a combination of CPU and NP.
The processor 501 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.
Optionally, the memory 504 is also used to store program instructions. The processor 501 may call a program instruction to implement the method for identifying an Excel file as shown in the embodiments of fig. 1 to fig. 3 in the present application.
The embodiment of the invention also provides a non-transitory computer storage medium, wherein the computer storage medium stores computer executable instructions, and the computer executable instructions can execute the identification method of the Excel file in any of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kinds described above.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (10)

1. A recognition method of an Excel file is characterized by comprising the following steps:
acquiring a target Excel file;
analyzing the target Excel file to obtain cell data of the target Excel file;
identifying the cell data, and determining a column name and/or a row name corresponding to the target Excel file;
determining column text data and/or row text data corresponding to the column name and/or the row name based on the column name and/or the row name.
2. The method according to claim 1, wherein the identifying the cell data and determining a column name and/or a row name corresponding to the target Excel file comprises:
matching the cell data based on a preset identifier, and judging whether the cell data meets a matching condition;
and when the cell data meet the matching condition, judging that the target Excel file is successfully identified, and obtaining a column name and/or a row name corresponding to the preset identifier.
3. The method according to claim 2, wherein the cell data includes column cell data and row cell data, and when the preset identifier is a column identifier, the matching the cell data based on the preset identifier and judging whether the cell data meets a matching condition comprises:
acquiring an identification keyword and an exclusion keyword corresponding to the column identifier;
judging whether the column cell data matches the identification keyword;
when the column cell data matches the identification keyword, judging whether the column cell data matches the exclusion keyword;
when the column cell data does not match the exclusion keyword, determining that the column cell data meets the matching condition.
4. The method according to claim 3, wherein when the preset identifier is a row identifier, the matching the cell data based on the preset identifier and judging whether the cell data meets a matching condition comprises:
determining current row cell data corresponding to the current row identifier based on the column cell data meeting the matching condition;
acquiring a preset condition corresponding to the current row identifier;
judging whether the current row cell data meets the preset condition;
and when the current row cell data meets the preset condition, judging that the current row cell data meets the matching condition.
5. The method according to claim 4, wherein when the preset identifier is a row identifier, the matching the cell data based on the preset identifier and judging whether the cell data meets a matching condition further comprises:
and when the current row cell data does not meet the preset condition, judging that the current row cell data fails to match, and jumping to the next row to continue identification.
6. The method of claim 1, further comprising:
and responding to a selection instruction for a tab of the target Excel file, and determining the tab to be imported corresponding to the target Excel file.
7. The method of claim 1, further comprising:
displaying the column name and/or the row name corresponding to the identified target Excel file, and column text data and/or row text data corresponding to the column name and/or the row name;
responding to an adjustment instruction for the column name and/or the row name;
adjusting the column name and/or the row name based on the adjustment instruction.
8. An identification device for Excel files, comprising:
the acquisition module is used for acquiring a target Excel file;
the analysis module is used for analyzing the target Excel file to obtain cell data of the target Excel file;
the identification module is used for identifying the cell data and determining a column name and/or a row name corresponding to the target Excel file;
a determining module, configured to determine, based on the column name and/or the row name, column text data and/or row text data corresponding to the column name and/or the row name.
9. An electronic device, comprising:
a memory and a processor, wherein the memory and the processor are communicatively connected with each other, the memory stores computer instructions, and the processor executes the computer instructions to execute the Excel file identification method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing computer instructions for causing a computer to execute the method for identifying an Excel file according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110231358.5A CN112818937A (en) 2021-03-02 2021-03-02 Excel file identification method and device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN112818937A true CN112818937A (en) 2021-05-18

Family

ID=75862616

Country Status (1)

Country Link
CN (1) CN112818937A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination