CN111813849A - Data extraction method, device and equipment and storage medium - Google Patents
- Publication number
- CN111813849A (application CN202010957895.3A)
- Authority
- CN
- China
- Prior art keywords
- form template
- target
- template
- data
- field information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Document Processing Apparatus (AREA)
Abstract
The invention provides a data extraction method, apparatus and device, and a storage medium. The method comprises: acquiring, from configured form templates, a target form template that matches a form to be processed, wherein each form template has a corresponding database table and comprises field information of at least one field of the header of the corresponding database table; determining, according to the field information in the target form template, the data in the form that corresponds to the field information, to obtain target data; and extracting the target data into the database table corresponding to the target form template. Because no separate extraction mode needs to be configured for each form, the configuration workload can be reduced.
Description
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data extraction method, apparatus and device, and a storage medium.
Background
With the continued advancement of informatization in China, the business systems of government organs, enterprise groups and various industries have reached a considerable level of maturity. Data is also stored in many ways: it may reside in different types of databases or in files. Some files, such as Excel files, contain at least one form whose data is generally filled in manually. Even within the same business system, form formats may differ between regions or departments, so a large number of Excel files with different structures exist.
To meet certain business requirements, the massive data in Excel files with different structures must be stored by type in corresponding database tables. For example, weather data is extracted from Excel files with different structures and stored in one database table, while vegetable data is extracted from Excel files with different structures and stored in another. This can be achieved with ETL technology. ETL is a standard technology in the field of data integration. Unlike conventional data exchange, ETL not only completes basic data exchange (extraction, transmission and loading) but also provides easier and stronger support for data conversion (that is, on-demand processing of data), so that data flows between different services and the data each service obtains is accurate, timely and fit for its requirements.
In existing ETL technology, most extraction schemes for massive Excel files with different structures require each form format in the Excel files to be identified manually; for each form, a corresponding extraction mode is configured according to its format before the data in the form is extracted, which is time-consuming and labor-intensive.
Disclosure of Invention
In view of this, the present invention provides a data extraction method, apparatus and device, and a storage medium, which do not need to configure a corresponding extraction manner for each form, and can reduce the workload required by configuration.
A first aspect of the present invention provides a data extraction method, including:
acquiring a target form template matched with a form to be processed from the configured form templates, wherein each form template is provided with a corresponding database table and comprises field information of at least one field of a form head in the corresponding database table;
determining data corresponding to the field information from the form according to the field information in the target form template to obtain target data;
and extracting the target data to a database table corresponding to the target form template.
According to one embodiment of the invention, obtaining a target form template matching a form to be processed from configured form templates comprises:
traversing each row of cells in the form:
searching a form template containing field information matched with the content of the row of cells in the configured form template;
if such a form template is found, determining that row of cells to be the form header of the form, and determining the found form template to be the target form template.
According to one embodiment of the invention, the number of form headers of the form is 1; the method further comprises the following steps:
and when the line cell is determined to be the form head of the form, ending the traversal of the form.
According to an embodiment of the invention, the method further comprises:
if no such form template is found, checking whether the current number of traversals of the form has reached the maximum number of traversals; if so, ending the traversal of the form; otherwise, continuing the traversal of the form.
According to an embodiment of the present invention, the finding of the form template containing field information matching the content of the row of cells in the configured form template further comprises:
and when the data type of each cell in the line is a text type, searching a form template containing field information matched with the content of the cell in the line in the configured form template.
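The traversal described in the preceding embodiments can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `find_header`, `is_text_row` and `all_names_present`, the list-of-lists representation of a form, and the rule that empty cells are ignored when checking for text are all assumptions made for the example.

```python
from typing import Callable, Optional, Sequence, Tuple

Row = Sequence[object]


def is_text_row(row: Row) -> bool:
    """Header candidate: every non-empty cell is text (ignoring empties is an assumption)."""
    cells = [c for c in row if c is not None]
    return bool(cells) and all(isinstance(c, str) for c in cells)


def find_header(
    rows: Sequence[Row],
    templates: Sequence[dict],
    row_matches: Callable[[Row, dict], bool],
    max_traverse_times: int,
) -> Optional[Tuple[int, dict]]:
    """Return (header row index, target template), or None if no row matches."""
    for traversed, row in enumerate(rows, start=1):
        if is_text_row(row):
            for template in templates:
                if row_matches(row, template):
                    # A form has one header, so the traversal ends here.
                    return traversed - 1, template
        if traversed >= max_traverse_times:
            break  # stop once the maximum number of traversals is reached
    return None


def all_names_present(row: Row, template: dict) -> bool:
    """Hypothetical matcher: every field name appears verbatim in the row."""
    return set(template["names"]) <= set(row)


weather = {"table": "weather_data", "names": ["weather", "temperature"]}
form = [
    ["Daily report", None],        # title row, not a header
    ["weather", "temperature"],    # header row
    ["sunny", 21.5],
]
print(find_header(form, [weather], all_names_present, max_traverse_times=10))
```

Passing the matcher in as a parameter keeps the traversal independent of how a row is compared with a template, which the later embodiments refine.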
In accordance with one embodiment of the present invention,
the field information at least comprises a field name;
searching the configured form template for the form template with the field information matched with the content of the row of cells, wherein the method comprises the following steps:
for each of the configured form templates:
determining a reference cell from the row of cells, wherein the content of the reference cell is matched with any field name in the form template;
and determining whether the form template is a form template with the contained field information matched with the content of the row of cells or not according to the number of the reference cells.
According to an embodiment of the present invention, the matching of the content of the reference cell and any field name in the form template refers to:
the content of the reference cell is the same as the name of any field in the form template;
or the content of the reference cell and any field name in the form template are similar words or synonyms, and the similar words or synonyms are determined through a set matching algorithm.
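A possible reading of this matching rule is sketched below. The text does not specify the "set matching algorithm", so the synonym table `SYNONYMS`, the use of `difflib.SequenceMatcher` as a similarity measure, and the 0.8 threshold are all stand-in assumptions chosen only to make the idea concrete.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table; a real deployment would supply its own.
SYNONYMS = {
    "temperature": {"temp", "air temperature"},
    "statistical time": {"date", "time"},
}


def normalize(text: str) -> str:
    """Trim whitespace and ignore case before comparing."""
    return text.strip().casefold()


def cell_matches(cell: str, field_name: str, threshold: float = 0.8) -> bool:
    """True if the cell equals the field name, is a listed synonym, or is a near match."""
    a, b = normalize(cell), normalize(field_name)
    if a == b:
        return True
    if a in SYNONYMS.get(b, set()):
        return True
    # Similar words: a character-level similarity ratio stands in for the
    # unspecified matching algorithm.
    return SequenceMatcher(None, a, b).ratio() >= threshold


print(cell_matches("temp", "temperature"))
print(cell_matches("price", "weather"))
```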
In accordance with one embodiment of the present invention,
the form template also comprises a set matching degree;
determining whether the form template is a form template with the contained field information matched with the content of the row of cells according to the number of the reference cells, wherein the method comprises the following steps:
calculating the ratio of the number of the reference cells to the number of the fields corresponding to the field information in the form template;
and if the ratio is greater than the set matching degree in the form template, determining that the form template is the form template with the contained field information matched with the content of the row of cells.
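The ratio test can be illustrated with a short sketch. The function names and the dictionary layout of the template are assumptions; exact string equality is used as the default cell comparison only to keep the example self-contained, and a synonym-aware comparison could be passed in instead.

```python
def matching_ratio(row, field_names, cell_matches=lambda cell, name: cell == name):
    """Number of reference cells divided by the number of fields in the template."""
    reference_cells = [
        cell for cell in row
        if any(cell_matches(cell, name) for name in field_names)
    ]
    return len(reference_cells) / len(field_names)


def template_matches_row(row, template):
    """The template matches when the ratio exceeds its set matching degree."""
    names = [column["Name"] for column in template["Columns"]]
    return matching_ratio(row, names) > template["probability"]


columns = [{"Name": "weather"}, {"Name": "temperature"}, {"Name": "statistical time"}]
lenient = {"probability": 0.6, "Columns": columns}
strict = {"probability": 0.7, "Columns": columns}
row = ["a", "weather", "temperature", "b"]  # 2 of the 3 field names are present

print(template_matches_row(row, lenient), template_matches_row(row, strict))
```

Here two reference cells out of three fields give a ratio of 2/3, which exceeds a matching degree of 0.6 but not 0.7, so the same row matches the lenient template only.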
In accordance with one embodiment of the present invention,
the field information at least comprises a field name;
the form template further comprises a field sequence corresponding to each field information and at least one conversion rule corresponding to the field information, wherein the field sequence is determined according to a form head of a database table corresponding to the target form template;
determining data corresponding to the field information from the form according to the field information in the target form template to obtain target data, wherein the method comprises the following steps:
ordering the columns in which the target cells of the form are located according to the field sequence in the target form template, wherein the target cells belong to the form header of the form and the content of each target cell matches a field name in the target form template;
and, for each row of the form located after the form header, determining the contents of the cells of that row that lie in the columns of the target cells as one piece of target data.
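As a rough sketch of this step, the column of each target cell can be looked up in the header and the lookups arranged in the template's field order, after which every later row is read through those columns. The function names, the vegetable sample data and the use of exact name equality are assumptions for illustration.

```python
def target_columns(header, field_order):
    """Column indexes of the target cells, arranged in the template's field order."""
    return [header.index(name) for name in field_order if name in header]


def extract_target_data(rows, header_index, field_order):
    """One tuple of cell contents per row located after the header."""
    columns = target_columns(rows[header_index], field_order)
    return [tuple(row[c] for c in columns) for row in rows[header_index + 1:]]


form = [
    ["price", "origin", "name"],   # header; "origin" is not in the template
    [3.5, "north", "cabbage"],
    [6.0, "south", "tomato"],
]
records = extract_target_data(form, header_index=0, field_order=["name", "price"])
print(records)
```

Columns whose header cell matches no field name are simply never read, so forms with extra or differently ordered columns yield records of the same shape.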
In accordance with one embodiment of the present invention,
before determining the cell contents in the column of the target cells in the row, the method further includes: if the merged cell exists in the row, splitting the merged cell, and filling the content of the merged cell into all split cells;
after determining the cell contents in the column of the target cells in the row, the method further includes: and converting the determined at least one cell content according to the conversion rule in the target form template, and taking the obtained cell content as a piece of target data.
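The two additional steps, filling split merged cells and applying conversion rules, might look like the sketch below. The representation of merged ranges as `(first_row, first_col, last_row, last_col)` tuples, the rule table `CONVERSION_RULES` and the sample values are hypothetical; the text does not define the form of a conversion rule.

```python
from datetime import datetime


def fill_merged_cells(grid, merged_ranges):
    """Copy each merged cell's content into every cell it covered.

    merged_ranges holds (first_row, first_col, last_row, last_col) tuples,
    a hypothetical representation; the content sits in the top-left cell.
    """
    filled = [list(row) for row in grid]
    for r0, c0, r1, c1 in merged_ranges:
        value = grid[r0][c0]
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                filled[r][c] = value
    return filled


# Hypothetical conversion rules keyed by field name.
CONVERSION_RULES = {
    "temperature": float,
    "statistical time": lambda s: datetime.strptime(s, "%Y/%m/%d").date().isoformat(),
}


def convert(record, field_names, rules=CONVERSION_RULES):
    """Apply the rule for each field, leaving fields without a rule unchanged."""
    return tuple(
        rules.get(name, lambda v: v)(value)
        for name, value in zip(field_names, record)
    )


fields = ["weather", "temperature", "statistical time"]
grid = [["rain", "18", "2020/05/24"], [None, "19.5", "2020/05/25"]]  # invented sample
rows = fill_merged_cells(grid, [(0, 0, 1, 0)])  # "rain" spans two rows
converted = [convert(row, fields) for row in rows]
print(converted)
```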
According to an embodiment of the invention, the method further comprises:
when a form template needs to be configured for any database table, checking whether a correspondence between the database table and the form template to be configured already exists in a specified database; if it does not exist, continuing the configuration of the form template and storing the correspondence between the database table and the form template in the specified database.
According to an embodiment of the present invention, the obtaining of the target form template matching the form to be processed from the configured form templates further includes:
when the category of the data stored in the form is known, obtaining the target form template matching the form to be processed from among the form templates configured for the database table used to store data of that category.
According to one embodiment of the invention, the form is any one form in the files to be processed, the format of the files to be processed is a specified file format, and the files to be processed contain at least one form.
A second aspect of the present invention provides a data extraction apparatus, including:
the target form template determining module is used for acquiring a target form template matched with a form to be processed from the configured form templates, each form template is provided with a corresponding database table, and the form template comprises field information of at least one field of a form head in the corresponding database table;
the target data determining module is used for determining data corresponding to the field information from the form according to the field information in the target form template to obtain target data;
and the data extraction module is used for extracting the target data into a database table corresponding to the target form template.
According to an embodiment of the present invention, when the target form template determining module obtains the target form template matching the to-be-processed form from the configured form template, the target form template determining module is specifically configured to:
traversing each row of cells in the form:
searching a form template containing field information matched with the content of the row of cells in the configured form template;
if such a form template is found, determining that row of cells to be the form header of the form, and determining the found form template to be the target form template.
According to one embodiment of the invention, the number of form headers of the form is 1; the target form template determination module is further to:
and when the line cell is determined to be the form head of the form, ending the traversal of the form.
According to an embodiment of the invention, the target form template determination module is further configured to:
if no such form template is found, checking whether the current number of traversals of the form has reached the maximum number of traversals; if so, ending the traversal of the form; otherwise, continuing the traversal of the form.
According to an embodiment of the present invention, when the target form template determining module finds a form template in which the field information included in the configured form template matches the content of the row of cells, the target form template determining module is further configured to:
and when the data type of each cell in the line is a text type, searching a form template containing field information matched with the content of the cell in the line in the configured form template.
In accordance with one embodiment of the present invention,
the field information at least comprises a field name;
when the target form template determining module searches for a form template in which the included field information matches with the content of the row of cells in the configured form template, the target form template determining module is specifically configured to:
for each of the configured form templates:
determining a reference cell from the row of cells, wherein the content of the reference cell is matched with any field name in the form template;
and determining whether the form template is a form template with the contained field information matched with the content of the row of cells or not according to the number of the reference cells.
According to an embodiment of the present invention, the matching of the content of the reference cell and any field name in the form template refers to:
the content of the reference cell is the same as the name of any field in the form template;
or the content of the reference cell and any field name in the form template are similar words or synonyms, and the similar words or synonyms are determined through a set matching algorithm.
In accordance with one embodiment of the present invention,
the form template also comprises a set matching degree;
and when the target form template determining module determines whether the form template is a form template with the field information matched with the content of the row of cells according to the number of the reference cells, the target form template determining module is specifically configured to:
calculating the ratio of the number of the reference cells to the number of the fields corresponding to the field information in the form template;
and if the ratio is greater than the set matching degree in the form template, determining that the form template is the form template with the contained field information matched with the content of the row of cells.
In accordance with one embodiment of the present invention,
the field information at least comprises a field name;
the form template further comprises a field sequence corresponding to each field information and at least one conversion rule corresponding to the field information, wherein the field sequence is determined according to a form head of a database table corresponding to the target form template;
the target data determining module determines data corresponding to the field information from the form according to the field information in the target form template, and when obtaining the target data, the target data determining module is specifically configured to:
ordering the columns in which the target cells of the form are located according to the field sequence in the target form template, wherein the target cells belong to the form header of the form and the content of each target cell matches a field name in the target form template;
and, for each row of the form located after the form header, determining the contents of the cells of that row that lie in the columns of the target cells as one piece of target data.
In accordance with one embodiment of the present invention,
the target data determination module is further configured to, before determining the cell content in the column of the target cell in the row, determine: if the merged cell exists in the row, splitting the merged cell, and filling the content of the merged cell into all split cells;
after the target data determination module determines the cell contents in the column of each target cell in the row, the target data determination module is further configured to: and converting the determined at least one cell content according to the conversion rule in the target form template, and taking the obtained cell content as a piece of target data.
According to an embodiment of the invention, the apparatus further comprises:
a configuration module, used for checking, when a form template needs to be configured for any database table, whether a correspondence between the database table and the form template to be configured already exists in a specified database; if it does not exist, continuing the configuration of the form template and storing the correspondence between the database table and the form template in the specified database.
According to an embodiment of the present invention, when the target form template determining module obtains the target form template matching the to-be-processed form from the configured form templates, the target form template determining module is further configured to:
when the category of the data stored in the form is known, obtaining the target form template matching the form to be processed from among the form templates configured for the database table used to store data of that category.
According to one embodiment of the invention, the form is any one form in the files to be processed, the format of the files to be processed is a specified file format, and the files to be processed contain at least one form.
A third aspect of the invention provides an electronic device comprising a processor and a memory; the memory stores a program that can be called by the processor; wherein, when the processor executes the program, the data extraction method as described in the foregoing embodiments is implemented.
A fourth aspect of the present invention provides a machine-readable storage medium on which a program is stored, the program, when executed by a processor, implementing the data extraction method as described in the foregoing embodiments.
The embodiment of the invention has the following beneficial effects:
In the embodiment of the invention, form templates corresponding to a plurality of database tables are configured in advance, and each form template comprises field information corresponding to at least one field of the header of its database table. When data in a form needs to be extracted, a target form template matching the form is obtained from the configured form templates; the data corresponding to the field information is determined from the form based on the field information in the target form template to obtain target data; and the target data is extracted into the database table corresponding to the target form template. Forms with different structures that match the same form template can therefore be extracted with that one template. The form format does not need to be identified manually and no separate extraction mode needs to be configured for each form, which greatly reduces the workload and time required for user configuration.
Drawings
FIG. 1 is a flow chart of a data extraction method according to an embodiment of the invention;
FIG. 2 is a flow chart of a data extraction method according to another embodiment of the present invention;
FIG. 3 is a block diagram of a data extraction device according to an embodiment of the present invention;
FIG. 4 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various objects, these objects should not be limited by these terms. These terms are only used to distinguish objects of the same type from each other. For example, a first object may also be referred to as a second object, and similarly, a second object may also be referred to as a first object, without departing from the scope of the present invention. The word "if" as used herein may be interpreted as "at the time of", "when" or "in response to a determination", depending on the context.
The data extraction method according to the embodiment of the present invention is described in more detail below, but should not be limited thereto. In one embodiment, referring to fig. 1, a data extraction method, applied to an electronic device, may include the steps of:
S100: acquiring a target form template matched with a form to be processed from the configured form templates, wherein each form template is provided with a corresponding database table and comprises field information of at least one field of a form head in the corresponding database table;
S200: determining data corresponding to the field information from the form according to the field information in the target form template to obtain target data;
S300: extracting the target data to a database table corresponding to the target form template.
In the embodiment of the invention, the data extraction method is executed by an electronic device. The electronic device may be, for example, a computer device or a server composed of a plurality of computer devices. The specific type of the electronic device is not limited, as long as it has a certain data processing capability.
In the embodiment of the present invention, an ETL application program may be modified, or a new ETL application program may be developed, so that when the electronic device runs the ETL application program, the above-mentioned data extraction method may be implemented.
In step S100, a target form template matching the to-be-processed form is obtained from the configured form templates, each form template has a corresponding database table, and the form template includes field information of at least one field of a form header in the corresponding database table.
Different database tables can be used for storing different types of data, for example, one database table is used for storing meteorological data, another database table is used for storing vegetable data, and the like, and certainly, other database tables can be used for storing other types of data, such as personnel data, and the like. Each type of data may exist in a large number of files with different structures, such as Excel files, and therefore corresponding data needs to be extracted from Excel files with different structures.
A corresponding form template may be configured in advance for each database table. The form template includes field information corresponding to at least one field of the header of the database table, where the header of the database table indicates the fields, and the field order, of the data to be stored in the database table. Each database table may correspond to one or more form templates. As an example of multiple form templates: if the header of a database table contains 5 fields and the table can store data containing either 5 fields or 4 fields, the table may be configured with one form template containing field information of 5 fields and another containing field information of 4 fields.
According to data storage requirements (such as fields of data to be stored), required field information can be automatically acquired from a file storing certain type of data, such as an Excel file storing vegetable data, and a corresponding form template is generated according to the acquired field information. Alternatively, the required form template may be generated by the user in a manually configured manner.
The form template may mainly include several important attributes, such as the set matching degree (probability), the maximum number of traversals (maxtravereseTimes) and the field information (Columns). The field information Columns may in turn include a field type (Type), a field name (Name), a time format (Format), a field length (Length), a field precision (Precision) and the like; see table (1) below. Of course, the form template is not limited thereto.
In one example, a user can sort out, for a given database table, the fields of the data to be stored in it, configure the corresponding field information and fill it into an ETL program page; a form template in JSON format is then generated automatically as the form template corresponding to that database table. Taking the field information Columns in table (1) as an example, Columns serves in the form template as the parent parameter of the field Type, field Name, time Format, field Length and field Precision.
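To make the shape of such a template concrete, the sketch below builds one as a Python dictionary and serializes it to JSON. Table (1) is not reproduced here, so the key spelling `maxTraverseTimes`, the type names, the format string and every value are assumptions; only the attribute names probability, Columns, Type, Name, Format, Length and Precision come from the description.

```python
import json

# A hypothetical form template for a weather table; all values are illustrative.
form_template = {
    "probability": 0.8,        # set matching degree
    "maxTraverseTimes": 10,    # maximum number of traversals (key spelling assumed)
    "Columns": [
        {"Type": "string", "Name": "weather", "Length": 16},
        {"Type": "number", "Name": "temperature", "Length": 5, "Precision": 1},
        {"Type": "date", "Name": "statistical time", "Format": "yyyy-MM-dd"},
    ],
}

template_json = json.dumps(form_template, indent=2)
print(template_json)
```

Columns is the parent of the per-field attributes, and the order of its entries can double as the field sequence used when arranging extracted data.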
When a target form template matching the form to be processed is obtained from the configured form templates, the form header of the form may first be determined (it may be specified in advance or determined according to the form templates), and the matching target form template is then determined from the configured form templates according to the fields in the form header; for example, the form header of the form contains fields corresponding to all or most of the field information in the target form template. The procedure is of course not limited to this and is described in more detail below.
In one embodiment, the method further comprises: when any database table needs to be configured with the form template, whether the corresponding relation between the database table and the form template needing to be configured exists in a specified database is checked, if the corresponding relation does not exist, the configuration of the form template is continued, and the corresponding relation between the database table and the form template is stored in the specified database.
The designated database may be any designated database, and is not limited specifically.
The configured correspondence between a database table and its form template is stored in the designated database. When a form template later needs to be configured for a database table, it suffices to check whether a correspondence between that database table and the form template exists in the designated database. If not, no such form template has been configured for the database table before; the form template is then configured and the correspondence between the database table and the form template is stored in the designated database.
Optionally, if the specified database has a corresponding relationship between the database table and the form template to be configured, repeated configuration is not required, and the form template may be directly called subsequently. In this manner, duplicate configurations of the same form template may be avoided.
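The check-then-store behaviour can be sketched with an in-memory SQLite database standing in for the designated database. The choice of SQLite, the table name `template_registry`, its columns and the function names are all assumptions made for the example.

```python
import sqlite3


def open_registry(path=":memory:"):
    """Open the designated database (SQLite here, purely as an assumption)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS template_registry ("
        "db_table TEXT NOT NULL, template_name TEXT NOT NULL, "
        "PRIMARY KEY (db_table, template_name))"
    )
    return conn


def configure_template(conn, db_table, template_name):
    """Store the correspondence if it is new; return False if it already existed."""
    exists = conn.execute(
        "SELECT 1 FROM template_registry WHERE db_table = ? AND template_name = ?",
        (db_table, template_name),
    ).fetchone()
    if exists:
        return False  # already configured: the existing template is reused
    # The form template itself would be configured at this point.
    conn.execute(
        "INSERT INTO template_registry (db_table, template_name) VALUES (?, ?)",
        (db_table, template_name),
    )
    conn.commit()
    return True


registry = open_registry()
first = configure_template(registry, "weather_data", "weather_5_fields")
second = configure_template(registry, "weather_data", "weather_5_fields")
print(first, second)
```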
In one embodiment, obtaining the target form template matching the form to be processed from the configured form templates may further be: when the category of the data stored in the form is known, obtaining the target form template matching the form to be processed from among the form templates configured for the database table used to store data of that category. In other words, the configured form templates referred to below are those configured for the database table used to store that type of data.
For example, if the form stores vegetable data, a target form template matching the form to be processed is obtained from the form templates configured in the database table for storing vegetable data.
In one embodiment, the form is any one of the to-be-processed files, the format of the to-be-processed file is a specified file format, and the to-be-processed file contains at least one form.
The designated file format may be, for example, the xls or xlsx format; accordingly, the file to be processed is an Excel file. One Excel file may include one or more Sheets, and the form may be any one Sheet of the Excel file.
Optionally, the electronic device may use a specified regular expression to find, in a folder storing various types of files, the Excel files whose file names match the regular expression, read one or more Excel files at a time (the number read each time may be specified, and the reading order is not limited), and traverse the Sheets of each Excel file read, taking each traversed Sheet as a form to be processed. In this example, the regular expression may be, for example, "*.xlsx", meaning that all Excel files with the suffix xlsx are read; this is only an example and is not limiting.
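File selection of this kind can be sketched with the standard library alone. The pattern `.*\.xlsx` is one way to express "all files with the suffix xlsx" in Python's regular-expression syntax; the function names and the sample file names are hypothetical, and reading the Sheets themselves would need a spreadsheet library that is not shown.

```python
import re
from pathlib import Path


def matching_names(names, pattern=r".*\.xlsx"):
    """File names that match the regular expression in full."""
    return [name for name in names if re.fullmatch(pattern, name)]


def find_excel_files(folder, pattern=r".*\.xlsx"):
    """Paths of the files in `folder` whose names match, in sorted order."""
    names = [p.name for p in Path(folder).iterdir() if p.is_file()]
    return [Path(folder) / name for name in sorted(matching_names(names, pattern))]


def batches(items, size):
    """Yield `size` items at a time, mirroring reading one or more files each time."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


names = ["weather_north.xlsx", "notes.txt", "veg.xls", "weather_south.xlsx", "a.xlsx.bak"]
selected = matching_names(names)
print(selected)
print(list(batches(selected, 1)))
```

Using `re.fullmatch` rather than `re.search` keeps names such as `a.xlsx.bak` out of the result, since the whole file name must match the pattern.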
In step S200, data corresponding to the field information is determined from the form according to the field information in the target form template, so as to obtain target data.
The field information may include a field name, and the data corresponding to each field name may be determined from the form to obtain the target data. For example, each piece of determined target data contains the data corresponding to the field names in the target form template.
For example, the target form template may include three field names: weather, temperature, and statistical time. The form header of the form also contains the three fields weather, temperature, and statistical time, together with two other irrelevant fields a and b; accordingly, each piece of data in the form contains these fields. Specifically, the form may be as shown in the following table (2):
For example, the following three pieces of target data may be determined:
sunny, 21.5, 2020-05-23;
rain, 18, 2020-05-24;
fine, 28, 2020-05-25.
In step S300, the target data is extracted into a database table corresponding to the target form template.
After all the target data are determined, all the target data are extracted into a database table corresponding to the target form template; or, each time one piece of target data is determined, the target data may be extracted into a database table corresponding to the target form template, which is not limited specifically.
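The two loading options (all target data at once, or one piece at a time) might look like the sketch below. It uses Python's built-in `sqlite3` module with an in-memory database; the table name `weather_stats` and its column names are assumptions made for illustration, not part of the described method.

```python
import sqlite3

INSERT_SQL = (
    "INSERT INTO weather_stats (weather, temperature, stat_time) VALUES (?, ?, ?)"
)

def load_rows(conn, rows, batch=True):
    """Insert target data either in one batch or one piece at a time."""
    if batch:
        conn.executemany(INSERT_SQL, rows)   # after all target data are determined
    else:
        for row in rows:                      # each time one piece is determined
            conn.execute(INSERT_SQL, row)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weather_stats (weather TEXT, temperature REAL, stat_time TEXT)"
)
rows = [
    ("sunny", 21.5, "2020-05-23"),
    ("rain", 18, "2020-05-24"),
    ("fine", 28, "2020-05-25"),
]
load_rows(conn, rows)
count = conn.execute("SELECT COUNT(*) FROM weather_stats").fetchone()[0]
```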
After the data is determined, the data can be converted to a certain extent according to a set conversion rule and then stored in a database table corresponding to the target form template. For example, the fields in the data may be sorted according to the field sequence of the form header in the database table, and the sorted data may be extracted into the database table, which is only an example here, and the conversion rule is not specifically limited to this. The conversion rules may be pre-set in the form template.
In the embodiment of the invention, form templates corresponding to a plurality of database tables are configured in advance, and each form template contains field information corresponding to at least one field of the form header in the corresponding database table. When data in a form needs to be extracted, a target form template matching the form can be obtained from the configured form templates; data corresponding to the field information is determined from the form based on the field information in the target form template to obtain target data; and the target data is extracted into the database table corresponding to the target form template. For forms with different structures that match the same form template, data extraction can be completed based on that same form template. The form format does not need to be identified manually, and a corresponding extraction mode does not need to be configured for each form, which can greatly reduce the workload and time required for user configuration.
For example, another table is shown in table (3) below:
Comparing table (2) with table (3), the structures of the two are different; specifically, the positions of 'weather' and 'irrelevant field a' are swapped. However, the fields to be extracted, 'weather', 'temperature', and 'statistical time', are the same in both, so both forms can be matched to the same form template and data extraction can be completed based on that template. The matching process is also automatic: the formats of table (2) and table (3) do not need to be recognized separately, and no separate extraction mode needs to be set for each of them.
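As a rough illustration of why column order does not matter, the sketch below extracts the same template fields from two layouts. The exact column layouts and the placeholder values "a1" and "b1" for the irrelevant fields are assumptions, since the tables themselves are not reproduced here.

```python
TEMPLATE_FIELDS = ["weather", "temperature", "statistical time"]

def extract(rows, fields=TEMPLATE_FIELDS):
    """Treat rows[0] as the form header and pull the template fields from later rows."""
    header = rows[0]
    columns = [header.index(name) for name in fields]  # locate each field by name
    return [[row[c] for c in columns] for row in rows[1:]]

# Assumed layout in the spirit of table (2).
table_2 = [
    ["a", "weather", "b", "temperature", "statistical time"],
    ["a1", "sunny", "b1", 21.5, "2020-05-23"],
]
# Assumed layout in the spirit of table (3): "weather" and "a" swapped.
table_3 = [
    ["weather", "a", "b", "temperature", "statistical time"],
    ["sunny", "a1", "b1", 21.5, "2020-05-23"],
]
print(extract(table_2) == extract(table_3))
```

Because columns are located by field name rather than by position, both layouts yield the same record.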
In one embodiment, in step S100, obtaining a target form template matching the to-be-processed form from the configured form templates may include the following steps:
traversing each row of cells in the form:
searching a form template containing field information matched with the content of the row of cells in the configured form template;
if the form template is found, determining the cell of the line as the form head of the form, and determining the found form template as the target form template.
Most often, the first row of cells in a form is the form header, but there are other situations, such as a row other than the first row being the form header.
In this embodiment, it is determined in a row unit whether field information included in the form template matches the content of the row of cells, and in the case of matching, on one hand, it may be determined that the form template is a target form template, and on the other hand, it may also be determined that the row of cells is a form header, so that an additional step for determining the form header may be omitted.
In this embodiment, each row of cells in the form may be read by traversal to implement the above steps. That is, for each form template, each row of cells in the form may be traversed, and whether the field information in the form template matches the content of the traversed row of cells is checked. The matching rule is not limited; for example, the rule may be that the field names contained in all the field information in the form template exist in the row of cells. If they match, the traversed row of cells may be determined to be the form header of the form, and the found form template may be determined to be the target form template.
Optionally, the number of form headers of the form is 1, and the method further includes: and when the traversed row of cells is determined to be the form head of the form, ending the traversal of the form. Since the target form template has been found and the form header is also determined, traversal of subsequent lines is not required, and traversal can be terminated to improve processing efficiency.
Of course, if the number of form headers of the form is greater than 1, then upon determining that the traversed row of cells is a form header of the form: the form can continue to be traversed; or checking whether the current traversal times of the form reach the maximum traversal times, if so, ending the traversal of the form, otherwise, continuing the traversal of the form.
Optionally, the method further comprises: if no form template containing field information matching the content of the row of cells is found in the configured form templates, checking whether the current number of traversals of the form has reached the maximum number of traversals, and if so, ending the traversal of the form. Generally, if no match has been determined after many traversals, the form very probably matches no form template. Limiting the number of traversals of the form by defining a maximum number of traversals, and ending the traversal in time, avoids the resource waste caused by excessive traversals.
The maximum traversal times may be defined in the form template, the maximum traversal times in different form templates may be different, and when it is checked whether the current traversal times for the form reaches the maximum traversal times, it may be specifically checked whether the current traversal times for the form reaches the maximum traversal times in the form template. Of course, the maximum number of traversal times may also be defined in the electronic device, and is not limited herein.
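A minimal sketch of the row traversal with early termination and a maximum number of traversals is given below. It assumes a single form header, uses the "all field names exist in the row" rule mentioned above as the matching rule, and takes the maximum from a function argument; the names `find_header` and `max_traversals` are hypothetical.

```python
def find_header(rows, templates, max_traversals=10):
    """Return (row_index, template_name) for the first matching row, else None."""
    for index, row in enumerate(rows):
        if index >= max_traversals:
            break  # stop early: the form very probably matches no template
        cells = {str(cell) for cell in row}
        for name, fields in templates.items():
            if set(fields) <= cells:   # every field name appears in this row
                return index, name     # header found, so traversal ends here
    return None

templates = {"weather": ["weather", "temperature", "statistical time"]}
rows = [
    ["Monthly report"],  # a title row before the form header
    ["a", "weather", "b", "temperature", "statistical time"],
    ["a1", "sunny", "b1", 21.5, "2020-05-23"],
]
print(find_header(rows, templates))
```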
In one embodiment, the step of searching the configured form template for a form template containing field information matching the content of the row of cells further comprises:
and when the data type of each cell in the line is a text type, searching a form template containing field information matched with the content of the cell in the line in the configured form template.
Generally, the type of each unit in the form header in the form is a text type, so based on this characteristic, when the data type of each unit in the line is a text type, a form template containing field information matched with the content of the unit in the line is searched in the configured form template; and when the data type of at least one cell in the line is a non-text type, the processing of searching the matched form template for the cell in the line can be directly skipped, so that the processing efficiency is improved.
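The text-type pre-check can be sketched as a simple filter applied before any template lookup. Here a Python `str` stands in for a "text type" cell, which is an assumption about how cell values are represented after reading the form.

```python
def is_all_text(row):
    """Return True when the row is non-empty and every cell holds a string."""
    return bool(row) and all(isinstance(cell, str) for cell in row)

def candidate_header_rows(rows):
    """Indices of rows worth checking against the form templates."""
    return [index for index, row in enumerate(rows) if is_all_text(row)]

rows = [
    ["a", "weather", "b", "temperature", "statistical time"],
    ["a1", "sunny", "b1", 21.5, "2020-05-23"],  # contains a numeric cell
]
print(candidate_header_rows(rows))
```

Rows containing at least one non-text cell are skipped without searching for a matching form template.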
In one embodiment, the field information includes at least a field name;
searching the configured form template for the form template with the field information matched with the content of the row of cells, wherein the method comprises the following steps:
for each of the configured form templates:
determining a reference cell from the row of cells, wherein the content of the reference cell is matched with any field name in the form template;
and determining whether the form template is a form template with the contained field information matched with the content of the row of cells or not according to the number of the reference cells.
There may be multiple cells in a row of the form, and the greater the number of cells (i.e., reference cells) whose contents match the field names in the form template in a row, the greater the probability that the cell in the row is the form header, so that it can be determined whether the form template is a form template whose included field information matches the contents of the cell in the row according to the number of reference cells.
For example, when the number of the reference cells reaches the set number, the form template may be determined to be a form template in which the included field information matches the content of the line of cells. The set number here may be defined in the form template, or may be preset in the electronic device, and a value of the set number may be determined as needed, for example, may be greater than 1.
Optionally, for each row of cells, whenever a reference cell is determined, the content and the position index of the reference cell may be recorded in a cache. Optionally, the position index may be used to indicate the column in which the reference cell is located. Taking table (2) as an example, the following information may be stored in the cache:
"weather": C2
"temperature": C4
"statistical time": C5
Here, C2 indicates that the cell containing "weather" is located in column 2, C4 indicates that the cell containing "temperature" is located in column 4, and C5 indicates that the cell containing "statistical time" is located in column 5. C2, C4, and C5 may be represented by numerical values; the specific values are not limited as long as they identify the corresponding columns.
Further, if it is determined that the form template is not a form template containing field information matching the content of the line of cells according to the number of the reference cells, the cache may be emptied.
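The reference-cell counting and caching described above might be sketched as follows. The cache is modelled as a dictionary from cell content to a 1-based column number, exact string equality is used as the matching rule, and the threshold name `set_number` is a hypothetical parameter.

```python
def collect_reference_cells(row, field_names, set_number=2):
    """Cache {content: column} for matching cells; empty the cache on failure."""
    cache = {}
    for column, content in enumerate(row, start=1):  # 1-based column index
        if content in field_names:
            cache[content] = column
    if len(cache) < set_number:
        cache.clear()  # this row is not the form header for this template
    return cache

fields = {"weather", "temperature", "statistical time"}
header = ["a", "weather", "b", "temperature", "statistical time"]
data_row = ["a1", "sunny", "b1", 21.5, "2020-05-23"]
print(collect_reference_cells(header, fields))
```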
In one embodiment, the matching of the content of the reference cell and any field name in the form template refers to:
the content of the reference cell is the same as the name of any field in the form template;
or the content of the reference cell and any field name in the form template are similar words or synonyms, and the similar words or synonyms are determined through a set matching algorithm.
In other words, the matching rule under which the content of the reference cell matches any field name in the form template may be, for example, exact matching of character strings (that is, the strings are the same), or near-synonym or synonym matching of character strings determined by a set matching algorithm. Of course, the specific matching rule is not limited thereto and may also be determined, for example, based on text similarity.
The set matching algorithm may include, for example, NLP (natural language processing) word matching, but is not limited thereto.
In this embodiment, near-synonym or synonym matching is not an exact matching method, so the same form template can be used to extract data from more forms of the same type, the number of templates that need to be configured can be further reduced, and the form header can be found more accurately.
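The two matching rules can be illustrated with the sketch below. A small hand-written synonym table stands in for the set matching algorithm; it is purely a placeholder assumption and not an NLP implementation, and the synonym entries are invented for the example.

```python
# Hypothetical synonym table standing in for a real matching algorithm.
SYNONYMS = {
    "temperature": {"temp", "air temperature"},
    "statistical time": {"date", "stat time"},
}

def matches(cell_content, field_name, synonyms=SYNONYMS):
    """True if the cell equals the field name or is a listed synonym of it."""
    text = str(cell_content).strip()
    if text == field_name:                        # exact string match
        return True
    return text in synonyms.get(field_name, set())  # synonym match

print(matches("temp", "temperature"))
```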
In one embodiment, the form template further comprises a set matching degree;
determining whether the form template is a form template with the contained field information matched with the content of the row of cells according to the number of the reference cells, wherein the method comprises the following steps:
calculating the ratio of the number of the reference cells to the number of the fields corresponding to the field information in the form template;
and if the ratio is greater than the set matching degree in the form template, determining that the form template is the form template with the contained field information matched with the content of the row of cells.
Take table (2) as the form, and assume that the fields corresponding to the field information in the form template are weather, temperature, and statistical time, that is, the number of fields is 3. When the first row of cells is traversed, 3 reference cells are found (and their contents are cached); their contents are weather, temperature, and statistical time, respectively, corresponding to the 3 pieces of field information in the form template. The number of reference cells is therefore also 3, and the ratio of the number of reference cells to the number of fields corresponding to the field information in the form template is 3/3 = 1. Assuming that the set matching degree in the form template is 0.8, since 1 is greater than 0.8, the first row of cells may be determined to be the form header, and the form template is a form template whose field information matches the content of that row of cells.
In this embodiment, as long as the ratio is greater than the set matching degree, complete matching is not required, and thus one form template can match more forms, which can further reduce the number of templates required to be configured.
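The ratio test reduces to a one-line comparison, sketched below. The function name is hypothetical, and the guard for a template with no fields is an added assumption.

```python
def template_matches(reference_count, field_count, set_matching_degree=0.8):
    """True when reference_count / field_count exceeds the set matching degree."""
    if field_count == 0:
        return False  # assumed behaviour: a template with no fields never matches
    return reference_count / field_count > set_matching_degree

print(template_matches(3, 3))  # ratio 3/3 = 1, compared against 0.8
```

With a set matching degree of 0.8, a template with 10 fields would still match a row in which 9 of them are found, whereas 2 out of 3 would not.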
In one embodiment, in the event that no target form template matching the form exists in the configured form templates, relevant information for the form may be output to a specified file (such as, but not limited to, a txt file) or a specified database table for subsequent auditing.
The related information may include, for example, the file name of the file in which the form is located and the error reason why no target form template was obtained. The error reason may include, for example, an error in the field information of the form template (for example, "price" written as "value") or a file format error (for example, the file should be in html format but is presented in xlsx format).
Based on the error reason recorded in the specified file or the specified database table, auditors can manually review the record and decide whether the content of the form needs to be synchronized into the database table, whether the form template needs to be modified, whether the file format needs to be modified, and so on.
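Recording an unmatched form for later auditing could be as simple as the sketch below. It writes one tab-separated line per form to a text stream; the line layout, the function name, and the sample values are assumptions, and an in-memory `io.StringIO` stands in for the specified txt file.

```python
import io

def report_unmatched(stream, file_name, sheet_name, reason):
    """Append one audit record describing a form that matched no template."""
    stream.write(f"{file_name}\t{sheet_name}\t{reason}\n")

log = io.StringIO()  # stand-in for the specified txt file
report_unmatched(log, "weather.xlsx", "Sheet1", "no matching form template")
print(log.getvalue(), end="")
```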
In one embodiment, the field information includes at least a field name;
the form template further comprises a field sequence corresponding to each field information and at least one conversion rule corresponding to the field information, wherein the field sequence is determined according to a form head of a database table corresponding to the target form template;
in step S200, determining data corresponding to the field information from the form according to the field information in the target form template to obtain target data, which may include the following steps:
S201: sorting the columns in which the target cells in the form are located according to the field order in the target form template, wherein the target cells belong to the form header of the form, and the content of each target cell matches a field name in the target form template;
S202: for each row located after the form header in the form, determining the cell contents in the row that are located in the columns of the target cells, to obtain a piece of target data.
When the form header is determined, the content of the cell recorded in the cache is the content of the target cell, and the matching rule for matching the content of the target cell with any field name in the target form template may refer to the matching rule for the content of the reference cell, which is not described herein again.
The columns of the target cells in the form can be determined according to the cell contents and the position indexes recorded in the cache, the sequence of fields in the target form template can be the same as the sequence of corresponding fields in the header of the form of the corresponding database table, the mapping relation between the fields in the target form template and the fields in the corresponding database table can be embodied, and after the columns of the target cells in the form are sorted based on the sequence of the fields, the corresponding fields in the form can be arranged according to the sequence of the fields in the database table.
Continuing to take the form as an example of table (2), assuming that the field ordering in the target form template is that the statistical time, weather and temperature are sequentially ordered from front to back, and the field order in the corresponding database table is also the same, after ordering the columns of the target cells in the form according to the field order, the following table (4) is obtained:
After sorting, for each row located after the form header in the form, the cell contents in the row that are located in the columns of the target cells are determined, to obtain a piece of target data.
For example, in table (4), the form header is the first row, and the rows located after the form header are the second to fourth rows, in which the data to be extracted are recorded. Taking the second row as an example, the cell contents located in the columns of the target cells in that row are determined to be 2020-05-23, sunny, and 21.5, respectively, which form one piece of target data.
Through the sequencing, the determined cell contents can be correctly recorded in the corresponding field positions in the database table, and the sequence consistency of the fields contained in each data in the database table is ensured.
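Steps S201 and S202 can be sketched together: the cached column positions are reordered by the template's field order, and each row after the form header is then read in that order. The column layout and the placeholder values for the irrelevant fields ("a1", "b1", and so on) are assumptions.

```python
def extract_in_field_order(rows, header_index, cache, field_order):
    """cache maps field name -> 1-based column; records follow field_order."""
    columns = [cache[name] - 1 for name in field_order]          # S201
    return [[row[c] for c in columns] for row in rows[header_index + 1:]]  # S202

rows = [
    ["a", "weather", "b", "temperature", "statistical time"],
    ["a1", "sunny", "b1", 21.5, "2020-05-23"],
    ["a2", "rain", "b2", 18, "2020-05-24"],
    ["a3", "fine", "b3", 28, "2020-05-25"],
]
cache = {"weather": 2, "temperature": 4, "statistical time": 5}
field_order = ["statistical time", "weather", "temperature"]
records = extract_in_field_order(rows, 0, cache, field_order)
```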
Optionally, if the name of a certain field in the target form template (referred to as a missing field for short) does not exist in the form header of the form, that is, the number of extracted cell contents is less than the required number of fields, the position corresponding to the missing field may be filled with a null value; in other words, the position corresponding to the missing field in the target data may be a null value. For example, if the field order in the target form template is statistical time, weather, temperature, and PM2.5 value from front to back, then after 2020-05-23, sunny, and 21.5 are determined, a null value, such as 0, is appended after 21.5, and the obtained target data is 2020-05-23, sunny, 21.5, 0.
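Filling a field that is absent from the form header with a null value might look like the sketch below. The function name and the configurable `null_value` parameter are assumptions; 0 is used in the example to mirror the text.

```python
def build_record(row, cache, field_order, null_value=None):
    """Read each field from its cached 1-based column, or use null_value if absent."""
    return [
        row[cache[name] - 1] if name in cache else null_value
        for name in field_order
    ]

row = ["a1", "sunny", "b1", 21.5, "2020-05-23"]
cache = {"weather": 2, "temperature": 4, "statistical time": 5}
field_order = ["statistical time", "weather", "temperature", "PM2.5"]
record = build_record(row, cache, field_order, null_value=0)
print(record)
```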
In one embodiment, before determining the cell contents in the column of the target cell in the row, the method further includes:
and if the merged cell exists in the row, splitting the merged cell, and filling the content of the merged cell into all split cells.
Each cell has corresponding attribute information, which may indicate whether the cell is a merged cell, and if a merged cell exists in the row, the merged cell is split, for example, split into the smallest cell, and the content of the merged cell is filled into all split cells.
For example, the table is as follows (5):
The first row is the form header. If the merged cell is not split, the determined data are the following two pieces of data:
1. seven with 8 shifts, Zhang San, 202001
2. Li Si, 202002
Obviously, the team information is lost in Li Si's data, resulting in corrupted data.
In this embodiment, when the traversal reaches the second row, the cell containing "seven with 8 shifts" is detected to be a merged cell; the merged cell is split, and "seven with 8 shifts" is filled into all the split cells, so as to obtain the following table (6):
After the splitting, the determined data are the following two pieces of data:
1. seven with 8 shifts, Zhang San, 202001
2. seven with 8 shifts, Li Si, 202002
Namely, the problem of data corruption is solved.
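The split-and-fill step can be sketched on a plain grid of values. Here a merged cell is described by an inclusive, 0-based range (first_row, last_row, first_col, last_col) whose value is stored in its top-left cell; this representation, and the header names "team", "name", and "number", are assumptions for illustration.

```python
def split_merged_cells(grid, merged_ranges):
    """Copy each merged range's top-left value into every cell the range covers."""
    filled = [list(row) for row in grid]  # leave the input grid untouched
    for first_row, last_row, first_col, last_col in merged_ranges:
        value = filled[first_row][first_col]
        for r in range(first_row, last_row + 1):
            for c in range(first_col, last_col + 1):
                filled[r][c] = value
    return filled

grid = [
    ["team", "name", "number"],
    ["seven with 8 shifts", "Zhang San", 202001],
    [None, "Li Si", 202002],  # covered by the merged cell above
]
result = split_merged_cells(grid, [(1, 2, 0, 0)])
print(result[2])
```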
In one embodiment, after determining the cell contents in the column of the target cells in the row, the method further includes:
and converting the determined at least one cell content according to the conversion rule in the target form template, and taking the obtained cell content as a piece of target data.
The corresponding cell contents may be converted based on the field length, field precision, field type, and field format in the target form template. For example, for a cell whose field type is a time type, the cell content is converted into the time format yyyy-MM-dd (year-month-day) predefined in the target form template. Of course, this is merely an example, and other conversion methods are also possible.
Through the conversion mode, the format of the data output to the database table can be more uniform, and the further processing is more convenient.
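A conversion rule for time-type fields might be sketched as below, normalising several input layouts to yyyy-MM-dd. The list of accepted input formats and the choice to return unparseable values unchanged are assumptions.

```python
from datetime import datetime

def to_date_string(value, input_formats=("%Y/%m/%d", "%Y%m%d", "%Y-%m-%d")):
    """Normalise a date-like cell to 'yyyy-MM-dd'; return it unchanged if no format fits."""
    if isinstance(value, datetime):
        return value.strftime("%Y-%m-%d")
    for fmt in input_formats:
        try:
            return datetime.strptime(str(value), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next accepted input format
    return value

print(to_date_string("2020/05/23"))
```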
The data extraction method of the present invention is described in a more specific embodiment with reference to fig. 2.
1) Obtaining an Excel file, searching the Excel file with the file name matched with the regular expression from various files through the specified regular expression, and then executing the step 2);
2) obtaining a form sheet in an Excel file, traversing all the sheet in the Excel file, wherein the obtained sheet is the traversed sheet, and then executing the step 3);
3) traversing the configured form template, and traversing each row of cells in the sheet aiming at the traversed form template; then executing step 4);
4) for each traversed cell in the row of cells, checking whether field information matched with the cell content exists in the form template, and if so, recording the cell content and the position index of the cell into a cache, and then executing step 5); if not, directly executing the step 5);
5) checking whether the ratio of the number of the cell contents in the cache to the number of the fields corresponding to the field information in the form template is greater than a set matching degree, if so, executing a step 7), and if not, executing a step 6);
6) checking whether a next cell exists in the row, and if so, continuing to process the next cell in the row; if not, continuing to traverse the form with the next row. When the traversal of the form is finished, if a form template that has not yet been traversed exists, continuing with the next form template; otherwise, outputting an error result indicating that the sheet does not match any form template, where the error result may include a file name, a sheet name, and an error reason, and returning to step 2) to continue processing the next form;
7) determining that the current line is a form head and the form template is a target form template, and then executing step 8);
8) sequencing columns where target cell contents in the form are located according to the field sequence in the target form template, wherein the target cell contents are cell contents recorded in a cache, determining the positions of the columns according to corresponding position indexes, and then executing step 9);
9) traversing the lines behind the form head in the form, extracting target data from the traversed lines according to the field information in the target form template and outputting the target data to a corresponding database table.
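Steps 2) to 9) above can be condensed into the following sketch. It is a simplification under stated assumptions: sheets are given as in-memory lists of rows rather than read from an Excel file, exact string equality is the matching rule, the ratio is checked once per row rather than after every cell, and fields missing from the header are filled with `None`. All names are hypothetical.

```python
def process_sheet(rows, templates):
    """Return (template_name, records) for the first matching template, or None."""
    for name, template in templates.items():              # step 3
        fields = template["fields"]                        # listed in field order
        for header_index, row in enumerate(rows):
            cache = {}
            for column, content in enumerate(row):         # step 4
                if content in fields:
                    cache[content] = column
            ratio = len(cache) / len(fields)               # step 5
            if ratio > template["matching_degree"]:        # step 7
                columns = [cache.get(f) for f in fields]   # step 8
                records = [                                # step 9
                    [None if c is None else data_row[c] for c in columns]
                    for data_row in rows[header_index + 1:]
                ]
                return name, records
    return None

def process_workbook(sheets, templates):
    """Process every sheet; collect extracted records and error results."""
    results, errors = {}, []
    for sheet_name, rows in sheets.items():                # step 2
        outcome = process_sheet(rows, templates)
        if outcome is None:                                # step 6, error branch
            errors.append((sheet_name, "no matching form template"))
        else:
            results[sheet_name] = outcome
    return results, errors

templates = {
    "weather": {
        "fields": ["statistical time", "weather", "temperature"],
        "matching_degree": 0.8,
    }
}
sheets = {
    "Sheet1": [
        ["a", "weather", "b", "temperature", "statistical time"],
        ["a1", "sunny", "b1", 21.5, "2020-05-23"],
    ],
    "Sheet2": [["x", "y"], [1, 2]],
}
results, errors = process_workbook(sheets, templates)
```

In a fuller implementation the records would be written to the database table corresponding to the matched template, and the error results to the specified file or database table.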
The present invention also provides a data extraction apparatus, and referring to fig. 3, the data extraction apparatus 100 includes:
the target form template determining module 101 is configured to obtain a target form template matched with a to-be-processed form from configured form templates, where each form template has a corresponding database table and includes field information corresponding to at least one field of a form header in the database table;
the target data determining module 102 is configured to determine, according to field information in the target form template, data corresponding to the field information from the form to obtain target data;
and the data extraction module 103 is configured to extract the target data into a database table corresponding to the target form template.
In an embodiment, when the target form template determining module obtains a target form template matching a to-be-processed form from a configured form template, the target form template determining module is specifically configured to:
traversing each row of cells in the form:
searching a form template containing field information matched with the content of the row of cells in the configured form template;
if the form template is found, determining the cell of the line as the form head of the form, and determining the found form template as the target form template.
In one embodiment, the number of form headers of the form is 1; the target form template determination module is further to:
and when the line cell is determined to be the form head of the form, ending the traversal of the form.
In one embodiment, the target form template determination module is further to:
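if no form template containing field information matching the content of the row of cells is found in the configured form templates, checking whether the current number of traversals of the form has reached the maximum number of traversals; if so, ending the traversal of the form; otherwise, continuing the traversal of the form.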
In one embodiment, when the target form template determining module finds a form template containing field information matching the content of the row of cells in the configured form template, the target form template determining module is further configured to:
and when the data type of each cell in the line is a text type, searching a form template containing field information matched with the content of the cell in the line in the configured form template.
In one embodiment, the field information includes at least a field name;
when the target form template determining module searches for a form template in which the included field information matches with the content of the row of cells in the configured form template, the target form template determining module is specifically configured to:
for each of the configured form templates:
determining a reference cell from the row of cells, wherein the content of the reference cell is matched with any field name in the form template;
and determining whether the form template is a form template with the contained field information matched with the content of the row of cells or not according to the number of the reference cells.
In one embodiment, the matching of the content of the reference cell and any field name in the form template refers to:
the content of the reference cell is the same as the name of any field in the form template;
or the content of the reference cell and any field name in the form template are similar words or synonyms, and the similar words or synonyms are determined through a set matching algorithm.
In one embodiment of the present invention,
the form template also comprises a set matching degree;
and when the target form template determining module determines whether the form template is a form template with the field information matched with the content of the row of cells according to the number of the reference cells, the target form template determining module is specifically configured to:
calculating the ratio of the number of the reference cells to the number of the fields corresponding to the field information in the form template;
and if the ratio is greater than the set matching degree in the form template, determining that the form template is the form template with the contained field information matched with the content of the row of cells.
In one embodiment of the present invention,
the field information at least comprises a field name;
the form template further comprises a field sequence corresponding to each field information and at least one conversion rule corresponding to the field information, wherein the field sequence is determined according to a form head of a database table corresponding to the target form template;
the target data determining module determines data corresponding to the field information from the form according to the field information in the target form template, and when obtaining the target data, the target data determining module is specifically configured to:
sequencing columns of target cells in the form according to the field sequence in the target form template, wherein the target cells belong to the form head of the form, and the content of the target cells is matched with any field name in the target form template;
and determining the cell content of each row positioned in the column of each target cell in the row as a piece of target data for each row positioned behind the head of the form in the form.
In one embodiment of the present invention,
the target data determination module is further configured to, before determining the cell contents in the row that are located in the columns of the target cells: if a merged cell exists in the row, split the merged cell and fill the content of the merged cell into all the split cells;
after the target data determination module determines the cell contents in the column of each target cell in the row, the target data determination module is further configured to: and converting the determined at least one cell content according to the conversion rule in the target form template, and taking the obtained cell content as a piece of target data.
In one embodiment, the apparatus further comprises:
and the configuration module is used for checking whether the corresponding relation between the database table and the form template to be configured exists in the specified database when the form template configuration needs to be carried out on any database table, if the corresponding relation does not exist, continuing the configuration of the form template, and storing the corresponding relation between the database table and the form template into the specified database.
In one embodiment, when the target form template determining module obtains a target form template matching the to-be-processed form from the configured form templates, the target form template determining module is further configured to:
and under the condition of obtaining the data category stored in the form, obtaining a target form template matched with the form to be processed from the form templates configured in the database table for storing the data.
In one embodiment, the form is any one of the to-be-processed files, the format of the to-be-processed file is a specified file format, and the to-be-processed file contains at least one form.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, wherein the units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units.
The invention also provides an electronic device, which comprises a processor and a memory; the memory stores a program that can be called by the processor; wherein, when the processor executes the program, the data extraction method as described in the foregoing embodiments is implemented.
Embodiments of the data extraction apparatus can be applied to an electronic device. Taking a software implementation as an example, the apparatus, as a logical device, is formed when the processor of the electronic device in which it resides reads the corresponding computer program instructions from the nonvolatile memory into the memory and runs them. In terms of hardware, fig. 4 is a hardware structure diagram of the electronic device in which the data extraction apparatus 100 resides, according to an exemplary embodiment of the present invention. In addition to the processor 510, the memory 530, the network interface 520, and the nonvolatile memory 540 shown in fig. 4, the electronic device may include other hardware according to its actual functions, which is not described further here.
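As a hedged illustration of such a software implementation, the three modules of the apparatus can be written as three functions operating on a form held as a list of rows. The template layout (a dictionary with `db_table` and `fields`), the exact-match rule, and the use of SQLite are assumptions for this sketch only; the embodiments do not prescribe them.

```python
import sqlite3

# Hypothetical template: target table plus its field names in table order.
TEMPLATE = {"db_table": "staff", "fields": ["name", "age"]}


def determine_template(form, templates):
    """Module 1: find the first row that contains every field name of a template."""
    for index, row in enumerate(form):
        for tpl in templates:
            if set(tpl["fields"]) <= set(row):
                return index, tpl
    return None, None


def determine_target_data(form, header_index, tpl):
    """Module 2: for each row after the header, take the cells in the matched
    columns, arranged in the template's field order."""
    header = form[header_index]
    columns = [header.index(name) for name in tpl["fields"]]
    return [tuple(row[c] for c in columns) for row in form[header_index + 1:]]


def extract(conn, tpl, records):
    """Module 3: write the target data into the template's database table.
    Table and column names come from trusted template configuration."""
    columns = ", ".join(tpl["fields"])
    marks = ", ".join("?" for _ in tpl["fields"])
    conn.executemany(
        f"INSERT INTO {tpl['db_table']} ({columns}) VALUES ({marks})", records
    )
    conn.commit()


form = [
    ["Staff list 2020", "", ""],   # title row, not the form head
    ["age", "remark", "name"],     # form head, columns in a different order
    [31, "n/a", "Li"],
    [45, "", "Wang"],
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, age INTEGER)")
idx, tpl = determine_template(form, [TEMPLATE])
records = determine_target_data(form, idx, tpl)
extract(conn, tpl, records)
stored = conn.execute("SELECT name, age FROM staff ORDER BY age").fetchall()
```

The `remark` column has no counterpart in the template and is therefore skipped, while `name` and `age` are reordered to match the database table before insertion.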
The present invention also provides a machine-readable storage medium on which a program is stored, which when executed by a processor, implements the data extraction method as described in the foregoing embodiments.
The present invention may take the form of a computer program product embodied on one or more storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) having program code embodied therein. Machine-readable storage media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of machine-readable storage media include, but are not limited to: phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (16)
1. A data extraction method, comprising:
acquiring a target form template matched with a form to be processed from the configured form templates, wherein each form template is provided with a corresponding database table and comprises field information of at least one field of a form head in the corresponding database table;
determining data corresponding to the field information from the form according to the field information in the target form template to obtain target data;
and extracting the target data to a database table corresponding to the target form template.
2. The data extraction method of claim 1, wherein obtaining a target form template matching a form to be processed from the configured form templates comprises:
traversing each row of cells in the form:
searching a form template containing field information matched with the content of the row of cells in the configured form template;
if such a form template is found, determining the row of cells as the form head of the form, and determining the found form template as the target form template.
3. The data extraction method of claim 2, wherein the number of form headers of the form is 1; the method further comprises the following steps:
when the row of cells is determined to be the form head of the form, ending the traversal of the form.
4. The data extraction method of claim 2, further comprising:
if no such form template is found, checking whether the current number of traversals of the form has reached the maximum number of traversals; if so, ending the traversal of the form; otherwise, continuing the traversal of the form.
5. The data extraction method of claim 2, wherein searching for a form template in the configured form template that contains field information that matches the content of the row of cells further comprises:
when the data type of every cell in the row is a text type, searching the configured form templates for a form template containing field information that matches the content of the row of cells.
6. The data extraction method as claimed in claim 2,
the field information at least comprises a field name;
searching the configured form templates for a form template containing field information that matches the content of the row of cells comprises:
for each of the configured form templates:
determining a reference cell from the row of cells, wherein the content of the reference cell is matched with any field name in the form template;
and determining, according to the number of reference cells, whether the form template is a form template containing field information that matches the content of the row of cells.
7. The data extraction method of claim 6, wherein the content of the reference cell matching any field name in the form template means that:
the content of the reference cell is the same as any field name in the form template;
or the content of the reference cell and any field name in the form template are near-synonyms or synonyms, as determined by a preset matching algorithm.
8. The data extraction method of claim 6,
the form template further comprises a preset matching degree;
determining, according to the number of reference cells, whether the form template is a form template containing field information that matches the content of the row of cells comprises:
calculating the ratio of the number of reference cells to the number of fields corresponding to the field information in the form template;
and if the ratio is greater than the preset matching degree in the form template, determining that the form template is a form template containing field information that matches the content of the row of cells.
9. The data extraction method as claimed in claim 2,
the field information at least comprises a field name;
the form template further comprises a field sequence corresponding to each piece of field information and at least one conversion rule corresponding to the field information, wherein the field sequence is determined according to the form head of the database table corresponding to the target form template;
determining data corresponding to the field information from the form according to the field information in the target form template to obtain target data comprises:
sorting the columns in which target cells are located in the form according to the field sequence in the target form template, wherein the target cells belong to the form head of the form, and the content of each target cell matches any field name in the target form template;
and for each row located after the form head in the form, determining the contents of the cells of that row located in the columns of the target cells as one piece of target data.
10. The data extraction method as claimed in claim 9,
before determining the contents of the cells of the row located in the columns of the target cells, the method further comprises: if a merged cell exists in the row, splitting the merged cell and filling the content of the merged cell into all of the resulting split cells;
after determining the contents of the cells of the row located in the columns of the target cells, the method further comprises: converting at least one of the determined cell contents according to the conversion rule in the target form template, and taking the resulting cell contents as one piece of target data.
11. The data extraction method of claim 1, wherein the method further comprises:
when a form template needs to be configured for any database table, checking whether a correspondence between the database table and the form template to be configured exists in a specified database; if the correspondence does not exist, continuing the configuration of the form template and storing the correspondence between the database table and the form template in the specified database.
12. The data extraction method of any one of claims 1-11, wherein obtaining a target form template from the configured form templates that matches the form to be processed further comprises:
when the category of the data stored in the form has been obtained, obtaining the target form template matching the form to be processed from the form templates configured for the database table used to store that data.
13. The data extraction method according to claim 1, wherein the form is any one form in a file to be processed, the format of the file to be processed is a specified file format, and the file to be processed contains at least one form.
14. A data extraction apparatus, comprising:
the target form template determining module is used for acquiring a target form template matched with a form to be processed from the configured form templates, each form template is provided with a corresponding database table, and the form template comprises field information of at least one field of a form head in the corresponding database table;
the target data determining module is used for determining data corresponding to the field information from the form according to the field information in the target form template to obtain target data;
and the data extraction module is used for extracting the target data into a database table corresponding to the target form template.
15. An electronic device comprising a processor and a memory; the memory stores a program that can be called by the processor; wherein the processor, when executing the program, implements a data extraction method as claimed in any one of claims 1 to 13.
16. A machine-readable storage medium, having stored thereon a program which, when executed by a processor, implements a data extraction method as claimed in any one of claims 1 to 13.
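The form head search recited in claims 2, 4, 5, 6 and 8 can be illustrated by the following minimal sketch. It assumes a form held as a list of rows, templates held as dictionaries with hypothetical keys `fields` and `min_ratio` (the matching degree of the template), exact string equality as the only matching rule, and a row limit `max_rows` standing in for the maximum number of traversals. It is one possible reading of the claims, not the claimed implementation.

```python
def reference_cell_count(row, field_names):
    """Count the cells of the row whose content equals some field name."""
    names = set(field_names)
    return sum(1 for cell in row if cell in names)


def find_target_template(form, templates, max_rows=10):
    """Return (row_index, template) for the first row accepted as the form head,
    or (None, None) if no row within max_rows matches any template."""
    for index, row in enumerate(form):
        if index >= max_rows:
            break  # maximum number of traversals reached
        if not row or not all(isinstance(cell, str) for cell in row):
            continue  # only rows made up entirely of text are candidates
        for tpl in templates:
            ratio = reference_cell_count(row, tpl["fields"]) / len(tpl["fields"])
            if ratio > tpl["min_ratio"]:
                return index, tpl  # one form head per form, so stop here
    return None, None


TEMPLATES = [
    {"name": "orders", "fields": ["id", "customer", "amount", "date"], "min_ratio": 0.7},
    {"name": "staff", "fields": ["name", "age"], "min_ratio": 0.9},
]
form = [
    ["Report"],
    ["id", "customer", "amount", "note"],
    ["1", "ACME", "9.5", "x"],
]
head_index, target = find_target_template(form, TEMPLATES)
```

In the sample form, the second row contains three of the four field names of the `orders` template, giving a ratio of 0.75, which exceeds that template's matching degree of 0.7, so the row is taken as the form head.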
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010957895.3A CN111813849A (en) | 2020-09-14 | 2020-09-14 | Data extraction method, device and equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111813849A (en) | 2020-10-23 |
Family
ID=72859305
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202010957895.3A (CN111813849A) | Pending | 2020-09-14 | 2020-09-14 | Data extraction method, device and equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111813849A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108345682A (en) * | 2018-03-02 | 2018-07-31 | 弘成科技发展有限公司 | Platform and method are imported and exported based on what multi-tenant can configure |
CN109933765A (en) * | 2019-03-12 | 2019-06-25 | 中冶焦耐(大连)工程技术有限公司 | A method of Excel table content is extracted to CAD table |
CN110321410A (en) * | 2019-06-21 | 2019-10-11 | 东软集团股份有限公司 | Method, apparatus, storage medium and the electronic equipment that log is extracted |
CN110399420A (en) * | 2019-07-30 | 2019-11-01 | 广州吉信网络科技开发有限公司 | A kind of deriving method, electronic equipment and the medium of configurableization Excel format |
CN111125221A (en) * | 2019-12-19 | 2020-05-08 | 上海三稻智能科技有限公司 | Excel format-based data extraction system and configuration method |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112596851A (en) * | 2020-12-02 | 2021-04-02 | 中国人民解放军63921部队 | Multi-source heterogeneous data batch extraction method and analysis method of simulation platform |
CN113127359A (en) * | 2021-04-23 | 2021-07-16 | 中国工商银行股份有限公司 | Method and device for obtaining test data |
CN113610396A (en) * | 2021-08-06 | 2021-11-05 | 三峡高科信息技术有限责任公司 | Method and system for structuring matrix designer based on construction quality acceptance table |
CN113610396B (en) * | 2021-08-06 | 2022-02-11 | 三峡高科信息技术有限责任公司 | Method and system for structuring matrix designer based on construction quality acceptance table |
CN114155928A (en) * | 2021-12-14 | 2022-03-08 | 浙江太美医疗科技股份有限公司 | Form generation method and device, computer equipment and storage medium |
CN114385158A (en) * | 2021-12-30 | 2022-04-22 | 杭州数梦工场科技有限公司 | Data interaction system construction method, device and equipment |
CN115344571A (en) * | 2022-05-20 | 2022-11-15 | 药渡经纬信息科技(北京)有限公司 | Universal data acquisition and analysis method, system and storage medium |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN111813849A (en) | Data extraction method, device and equipment and storage medium | |
US11899641B2 (en) | Trie-based indices for databases | |
CN107038207B (en) | Data query method, data processing method and device | |
US20220342875A1 (en) | Data preparation context navigation | |
CN110019218B (en) | Data storage and query method and equipment | |
CN107491487B (en) | Full-text database architecture and bitmap index creation and data query method, server and medium | |
US8171029B2 (en) | Automatic generation of ontologies using word affinities | |
US8862566B2 (en) | Systems and methods for intelligent parallel searching | |
CN107016001A (en) | A kind of data query method and device | |
US20140046899A1 (en) | Method and Apparatus of Implementing Navigation of Product Properties | |
US11995059B2 (en) | Database index and database query processing method, apparatus, and device | |
CN110555035A (en) | Method and device for optimizing query statement | |
CN111125199B (en) | Database access method and device and electronic equipment | |
CN115658680A (en) | Data storage method, data query method and related device | |
CN116561181A (en) | Data query method, device, computer equipment and computer readable storage medium | |
CN116610700A (en) | Query statement detection method and device and storage medium | |
US11868362B1 (en) | Metadata extraction from big data sources | |
CN115712757A (en) | Enterprise name matching method and device based on index tree | |
CN112214494B (en) | Retrieval method and device | |
CN114218347A (en) | Method for quickly searching index of multiple file contents | |
CN108197321B (en) | File memory method and system | |
CN113821691A (en) | Document processing method and device, electronic equipment and readable storage medium | |
CN109241098B (en) | Query optimization method for distributed database | |
CN113779200A (en) | Target industry word stock generation method, processor and device | |
US10387466B1 (en) | Window queries for large unstructured data sets |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |