CN112214554A - Excel data integration analysis system and method - Google Patents

Excel data integration analysis system and method Download PDF

Info

Publication number
CN112214554A
CN112214554A CN201910620677.8A CN201910620677A CN112214554A CN 112214554 A CN112214554 A CN 112214554A CN 201910620677 A CN201910620677 A CN 201910620677A CN 112214554 A CN112214554 A CN 112214554A
Authority
CN
China
Prior art keywords
data
resource
module
information
configuration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910620677.8A
Other languages
Chinese (zh)
Inventor
尚林林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chen Rui Corp
Original Assignee
Chen Rui Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chen Rui Corp filed Critical Chen Rui Corp
Priority to CN201910620677.8A priority Critical patent/CN112214554A/en
Publication of CN112214554A publication Critical patent/CN112214554A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases

Abstract

The invention provides an excel data integration analysis system and a method, wherein in the system, an application system for carrying out data processing on excel data accessed into the system comprises a resource table definition module, a file rule configuration module, an import task monitoring module and a comparison rule configuration module; the display system for user-defined data retrieval and data comparison processing comprises a resource data retrieval module and a resource data comparison module; the data system for storing excel data of the access system, task records of system processing data and rule configuration set by the system comprises a system resource library, a system configuration library and a system rule library. The excel data is flexibly configured and defined through an application system, so that the accessed and processed excel data content is wider, the operation is convenient and fast, and the practicability is strong; the system also has a data flexible collision analysis function, can further discover deep invisible relations among data, and is favorable for flexible data collision analysis and data relation mining.

Description

Excel data integration analysis system and method
Technical Field
The invention relates to an analysis system for data processing, retrieval and comparison, in particular to an analysis system and method for self-defined retrieval and flexible collision comparison after Excel data are integrated.
Background
With the rapid development of information technology and the comprehensive popularization of office automation business, a large amount of Excel data is accumulated in each industry and different business units, and the Excel data is stored dispersedly and independently. How to realize effective integration and processing of the excel data, and efficient retrieval and flexible analysis of the processed data, so as to discover the implicit relationship among the data and the value of the deep level inside the data, is a problem which is urgently needed to be solved at present.
In order to solve the problems, various Excel data analysis systems, such as an Excel data management system (mainly achieving Excel data classification uploading management), an Excel data access system (mainly achieving Excel data classification storage), an Excel data retrieval system (mainly achieving integration and storage of Excel data and searching according to keywords), and the like, appear in the market. The main principle of the system is based on excle data classification management and storage, and data after integration is retrieved according to conditions; the finally acquired data is one or more independent files or independently stored data, and the association relationship among file contents is difficult to discover, and the data association relationship among multiple Excel files is difficult to discover. Therefore, the Excel data application analysis systems cannot well meet the actual requirements of services, cannot acquire the relevance among Excel data, and are difficult to find the hidden deep-level relevance among multiple Excel data.
The prior art also discloses the content of data flexible alignment, such as patent CN 104731814A. However, the data source for comparison in patent CN104731814A is an arbitrary relational database, and it is necessary to first store the relevant data conforming to the set format in the database and then perform comparison; the comparison result can be used as the input of the next comparison, the deep-level cyclic comparison of data is realized, and the comparison result can be checked, and early warning prompt and specific gravity result tracing can also be performed. The data comparison method disclosed in the patent requires a large amount of occupied database storage space and a large amount of manual data storage work, and cannot compare original data files, particularly excel data, so that flexible comparison of the excel data cannot be realized.
Due to the problems, the inventor carries out deep research and analysis on the existing Excel data processing and data comparison peer-to-peer related technologies, so as to develop an Excel data integration analysis system and method which can simply, conveniently and quickly access batch Excel file data, carry out custom data retrieval according to requirements and flexibly set a data comparison model to carry out collision analysis on processed data, and further discover a deep-level association relationship of the data.
Disclosure of Invention
In order to solve the above problems, the present inventors have conducted intensive studies and, as a result, have found that: in an analysis system comprising an application system, a presentation system and a data system, a resource table definition module in the application system is used for configuring a resource data table and a data field of an excel data file, and a file rule configuration module is used for flexibly defining data items of the excel data file, so that flexible access of different content data files and flexible configuration of the data items are facilitated; carrying out relational mapping on the data fields and the data items through the import task configuration module; the import task monitoring module monitors and manages the configured import task execution process; the comparison rule configuration module flexibly configures collision fields of any two resource tables; the data resource retrieval module realizes the self-defined retrieval of the resource table and accurately acquires data resources; the data resource comparison module can realize flexible collision analysis of data, further discover the association relation between the data and provide more valuable data information for business analysis; flexible comparison of data and checking of comparison results are achieved through excel data based on the imported resource data sheet, and therefore the method is completed.
The object of the present invention is to provide the following:
in a first aspect, the present invention provides an excel data integration and analysis system, including:
the application system 2 is used for processing data of excel data of the access system and comprises a resource table definition module 21, a file rule configuration module 22, an import task configuration module 23, an import task monitoring module 24 and a comparison rule configuration module 25;
the display system 1 is used for user-defined data retrieval and data comparison processing and comprises a resource data retrieval module 11 and a resource data comparison module 12;
the data system 3 is used for storing excel data of the access system, task records of system processing data and rule configuration set by the system, and comprises a system resource library 31, a system configuration library 32 and a system rule library 33.
In the application system 2, the first and second application systems,
the resource table definition module 21 is used for configuring a resource data table and a data field for the excel data accessed to the integrated analysis system, and setting the use attribute of the data field;
the file rule configuration module 22 is used for configuring data items for excel data accessed to the integrated analysis system; the data items comprise data content and position information under data header or data classification in excel data;
the import task configuration module 23 is configured to perform relational mapping on the data field configuration information transmitted by the resource table definition module 21 and the data item information transmitted by the file rule configuration module 22, and transmit the mapping relation as an import data task to the import task monitoring module 24;
the import task monitoring module 24 is configured to extract data item information including content information and location information in the excel data according to the import data task transmitted by the import task configuration module 23, and associate the data item with a data field in the resource data table according to a mapping relationship in the import data task;
and a comparison rule configuration module 25, configured to receive the resource data table and the data field information transmitted by the resource table definition module 21, and set data field comparison information between different resource data tables and data fields based on the data content and the service requirement.
In a second aspect, the present invention provides a universal excel data integration analysis method, preferably performed by using the above integration analysis system, the method comprising the steps of:
s1, configuring a resource table, configuring a resource data table and a data field for excel data needing to be accessed into the system, setting the use attribute of the data field, and storing configuration information;
s2, configuring file rules, configuring data items for excel data needing to be accessed to the system, and storing the data items;
s3, importing task configuration, and performing relational mapping and storage on configured data fields and data items;
s4, importing task monitoring, extracting and associating data items and data fields according to the mapping relation between the data fields and the data items, and storing association results;
s5, configuring comparison rules, randomly selecting two resource tables configured in S1, customizing data field comparison information between the resource tables according to business requirements, and storing the comparison information;
s6, the result data analysis application, according to the resource data generated in S4, can perform data custom search based on the resource table configured in S1 and/or perform data collision analysis based on the data field comparison information configured in S5.
The Excel data integration analysis system and method provided by the invention have the following beneficial effects:
(1) according to the excel data integration analysis system and method provided by the invention, by setting the excel file access rule, the access of various content item processing rules can be flexibly defined, so that the accessed and processed excel data content is wider, the operation is more convenient and faster, and the practicability of the system is improved;
(2) according to the Excel data integration analysis system and method provided by the invention, the entity table is configured based on the accessed Excel data, and the entity table is taken as a medium, so that the data mapping relation can be flexibly configured, the data mapping modes are various in the Excel data processing process, and the flexibility of the system and the subjective initiative of people are improved;
(3) according to the excel data integration analysis system and method provided by the invention, by arranging the task monitoring module, the execution mode of the data processing task can be defined in multiple modes, and the execution state and the execution progress of the data processing task can be monitored in real time;
(4) according to the excel data integration analysis system and method provided by the invention, by setting the use attribute of the data field, the data content can be directly and flexibly displayed and the user-defined data condition retrieval can be carried out;
(5) the excel data integration analysis system and method provided by the invention have a flexible data collision analysis function, can further discover deep invisible relations among data, and are convenient for a user to carry out flexible data collision analysis and data relation mining.
(6) The excel data integration and analysis system and method provided by the invention can realize flexible collection and task monitoring of excel files with different contents according to requirements, and flexible query and collision comparison of collected results.
Drawings
FIG. 1 is a schematic structural diagram of an excel data integration and analysis system according to a preferred embodiment of the invention;
FIG. 2 is a schematic diagram illustrating data flow between modules in an application architecture and a presentation architecture in accordance with a preferred embodiment of the present invention;
FIG. 3 illustrates an application architecture respective module data representation intention according to a preferred embodiment of the present invention;
FIG. 4 is a flow diagram illustrating the functional architecture of an import task management module in accordance with a preferred embodiment of the present invention;
FIG. 5 is a schematic flow chart diagram of an excel data integration analysis method according to a preferred embodiment of the invention.
The reference numbers illustrate:
1-exhibition system
11-resource data retrieval module
12-resource data comparison module
2-application System
21-resource table definition module
22-File rule configuration Module
23-importing task configuration Module
24-import task monitoring module
25-alignment rule configuration module
3-data system
31-System resource library
32-system configuration library
33-system rule base
Detailed Description
The features and advantages of the present invention will become more apparent and appreciated from the following detailed description of the invention, as illustrated in the accompanying drawings.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The present invention is described in detail below.
In order to effectively develop large-scale excel data from the perspective of data storage, management and data analysis, obtain effective integration among the data, discover the implicit relationship among the data and the value of the deep level inside the data, and show the data in a clear and quantitative manner, the invention provides an excel data integration and analysis system, as shown in fig. 1, the system comprises an application system 2, a showing system 1 and a data system 3; wherein the content of the first and second substances,
the application system 2 is used for carrying out data processing on excel data accessed to the system;
the display system 1 is used for user-defined data retrieval and data comparison processing;
the data system 3 is used for storing excel data of the access system, task records of system processing data and rule configuration set by the system.
Preferably, the presentation system 1 includes a resource data retrieval module 11 and a resource data comparison module 12;
the application system 2 comprises a resource table definition module 21, a file rule configuration module 22, an import task configuration module 23, an import task monitoring module 24 and a comparison rule configuration module 25;
the data architecture 3 includes a system resource library 31, a system configuration library 32 and a system rule library 33.
Further, in the application system 2,
the resource table definition module 21 is configured to configure a resource data table and data fields for the excel data accessed to the integrated analysis system, and set the usage attributes of the data fields. The resource table definition module 21 is linked with the system configuration library 32, stores the configuration information of the resource data table and the data field in the system configuration library 32, and transmits the configuration information to the resource data retrieval module 11, the import task configuration module 23, and the comparison rule configuration module 25.
Preferably, the resource data retrieval conditions and the retrieval results are flexibly set by reading the configuration information of the resource data table and the data fields in the system configuration library 32, the required excel file and the resource data table mapping rule are acquired in the execution process of the imported data task, and the required resource data table information is acquired when the comparison rule is configured.
In a preferred embodiment, the resource table defining module 21 includes:
the data table configuration submodule is used for configuring a resource data table for excel data accessed to the integrated analysis system;
preferably, the name and description information of the resource data table are configured according to the excel data content, and the configured information is stored in the system configuration library 32.
And the data field configuration sub-module is used for configuring data field information for the excel data accessed to the integrated analysis system and defining the use attribute of the data field when the data field is configured.
Preferably, data field information of the data resource table is configured according to the content item of the excel data needing to be imported, and attribute definition is carried out on the configured data field, wherein the field attribute comprises whether the data field is used as a retrieval condition or not and whether the data field is used as an output result or not. The content item refers to the header or the content in the category of the excel file.
Based on the difference of excel data files, especially for different batches of excel data files, the category of the excel data content may be different, and the category of the configuration resource data table, specifically, the name and the content (data field) may be different. Preferably, each resource data table is provided with a table identifier, which is a unique identifier or has a unique ID address in the system, so that confusion between different resource data tables does not occur.
The resource data table is named through a table name, and further, is described through table description, so that the category of the resource data table can be rapidly acquired.
The excel data is generally provided with a data header or data classification, corresponding data fields are set in the resource data sheet according to analysis requirements, and each data field contains data which is the same as the corresponding header or classification in the excel file.
The number of the data fields is set arbitrarily according to requirements, the data fields obtained by the data table header can be summarized, and the data fields obtained by the data table header can also be subdivided.
Preferably, the data field is provided with a field identifier which is a unique identifier or has a unique ID address in the system, and the data field in the resource data table is not confused.
And the data field is provided with a field name so as to carry out data field retrieval and data collision (comparison). Further, the data fields can be provided with field types including characters, dates and numbers, and the field types can be set to check during retrieval and data collision, so that the accuracy of the selected data fields is ensured.
And the data field is also provided with a use category so as to set the use attribute of the data field. The use attributes comprise retrieval condition display and result list display which are respectively represented and used for retrieval result display and data collision result display. The use attribute of the data field is set, so that the data content can be directly and flexibly displayed and the user-defined data field retrieval is facilitated.
And the data field is provided with a table identifier which is the same as the resource data table, so that the data field table and the resource data table have a corresponding relation. The data fields and resource data tables with correspondence should be derived from the same excel data.
In a preferred embodiment, as shown in fig. 3, the resource data table and the data field information generated by the resource table definition module 21 are stored in the resource data table and the resource field table, respectively. The resource data table comprises a table identifier, a table name and table description information; the table identifier is a primary key, namely, has a unique identifier; the resource field table comprises field identification, field name, field type, use category and table identification information; the field identification is a primary key, i.e. has a unique identification.
In the application system 2, the file rule configuration module 22 is used for configuring data items for excel data accessed to the integrated analysis system. The data items comprise data content and position information under data header or data classification in excel data. The file rule configuration module 22 is linked with the system rule base 33, stores the data item information in the system rule base 33, and transmits the configured data item information to the import task configuration module 23.
The configuration information of the system configuration library 32 is read, the mapping rule of the required excel file and the resource data table is acquired in the execution process of the imported data task, and the required resource table information is acquired by configuring and comparing the rule.
Preferably, the file rule configuration module 22 is configured to configure a file rule, wherein a unique identifier or an access rule identifier with a unique ID address is set, so that different excel data have different access rule identifiers, so that a plurality of excel data in the system are not mixed up.
In a preferred embodiment, the file rule configuration module 22 includes:
a rule name configuration submodule for configuring file rule names of the data items for system storage; further, rule descriptions are also configured for the data items, so that an operator can quickly identify and search relevant data item information in the system.
Preferably, similar to the data field, the configured data item is provided with a data item name and a data item type, so that verification is performed when retrieval and data collision occur, and the accuracy of the selected data item is ensured.
The file rule configuration module 22 further includes a data item configuration submodule, which is configured to configure data item content information and location information of excel data of the access system.
The header or classification of data is regularly input in the excel data and is used as a basic column of a data item, and the values in the excel cells comprise characters, dates and numbers and are used as content information of the basic column; and the position of the value in the excel cell comprises a row number and a column number in the excel file as the position information of the basic column content.
Preferably, a header or classification set separately in the excel data but applicable to the selected data for the elementary column is taken as the specified column of data items. Similarly, the designated column contains content information and position information corresponding to the corresponding data in the excel file.
For example, the table content rules of row numbers 3-7 in the following table take "name", "gender", "certificate number" and "seat number" as the basic columns of data items; the content and location information of the base column "name", e.g., lie four, is lie four, a 5.
And flight number and flight date are applicable to all persons in the basic column of the excel table, and the flight number and flight date are used as the designated columns of the data item, for example, the flight number designated column is B1.
Figure BDA0002125436240000101
In the excel data integration analysis system provided by the invention, the selection of the data items is based on the self-carried line number and column number of the excel data in the file, so that the system provided by the invention is suitable for excel data files with various data formats, and the universality of the system is improved.
Preferably, the file rule configuration module 22 is further configured with a file path and a use state, so as to search a storage location of an excel file accessed to the integrated analysis system and search a use state of the file.
In a preferred embodiment, as shown in fig. 3, the information generated by the file rule configuration module 22 is stored in the access file rule table, which includes access rule identifier, rule name, rule description, data item name, data item location, file path and usage status information; the access rule is identified as a primary key with a unique identification.
The file rule configuration module 22 can flexibly configure data item processing rules, so that the excel data contents accessed and processed are wider, the operation is more convenient and faster, and the practicability of the system is improved.
In the application system 2, the import task configuration module 23 is configured to perform relational mapping between the data field configuration information transmitted by the resource table definition module 21 and the data item information transmitted by the file rule configuration module 22. The import task configuration module 23 is linked with the system configuration library 32, stores the mapping relationship as an import data task in the system configuration library 32, and transmits the import data task to the import task monitoring module 24.
The relational mapping refers to searching information with corresponding relation in data field configuration information and data item information; the mapping relation refers to the corresponding relation between the data field configuration information and the data item information.
For example, a field name and a data item name are mapped, and a field certificate number and a data item certificate number are mapped.
Preferably, when the relational mapping is performed, the selection configuration is performed based on whether the content of the excel data item is consistent with the field description of the resource data table, so that the imported data item is ensured to be stored in the corresponding data field in the corresponding resource data table.
Preferably, the import task configuration module 23 is provided with a table identifier that is the same as the resource data table, and a one-to-one correspondence relationship is established between the table identifier with uniqueness and the resource data table that is transmitted by the resource table definition module 21 and matches the table identifier, so as to ensure the accuracy of the resource data table in the import data task.
Preferably, the import task configuration module 23 is further provided with a field identifier that is the same as the data field, and similarly, a one-to-one correspondence relationship is established between the unique field identifier and the field identifier of the matched data field in the resource data table, so as to ensure the accuracy of the data field in the import data task.
Preferably, the same data item name as that in the file rule configuration module 22 is also set in the import task configuration module 23, and similarly, a one-to-one correspondence relationship is established between the data item name and the data item name matched in the file rule configuration module 22, so as to ensure the accuracy of the data item in the import data task.
The import data task has different working states including start, stop, and end. Wherein the start indicates that relational mapping is started; the suspension represents the suspension of the relational mapping; ending means ending the relationship mapping process. And the working stage of the imported data task can be rapidly known according to the working state.
Preferably, a unique identifier or a task identifier with a unique ID address is set in the import task configuration module 23 to ensure uniqueness of the import data task in the system.
Further, an access rule identifier is set in the import task configuration module 23 for the import data task. And the access rule identification is used for determining a file rule for file analysis when the excel data access task is executed by the imported data task.
In a preferred embodiment, as shown in fig. 3, the information generated by the import task configuration module 23 is stored in an import task configuration table, which includes task identifiers, task names, field identifiers, data item names, table identifiers, access rule identifiers, and usage status information; the task identifier is a primary key of the unique identifier.
In the application system 2, the import task monitoring module 24 extracts data item information including content information and location information in excel data according to the import data task transmitted by the import task configuration module 23, and associates the data item with a data field in the resource data table according to a mapping relationship in the import data task. The import task monitoring module 24 is linked with the system resource library 31, and the associated data items and data fields are stored as import result data information in the system resource library 31 and transferred to the resource data retrieval module 11.
Preferably, the excel data item content of the data item and the data field is stored in the system resource library 31 by accessing the database.
Preferably, a unique identifier or a monitoring task identifier with a unique ID address is provided in the import task monitoring module 24 to ensure the uniqueness of the monitoring task monitoring the import data task in the system.
Preferably, the import task monitoring module 24 is provided with task identifiers identical to the import data tasks, and a one-to-one correspondence relationship is established between the task identifiers and the task identifiers matched in the import task configuration module 23, so as to ensure the accuracy of monitoring the monitoring tasks of the import data tasks.
Preferably, a task state is set in the import task monitoring module 24 to monitor a task state of the import data task, where the task state refers to whether the task is in a start or stop state, and includes task start, task execution, and task stop.
Preferably, the import task monitoring module 24 is provided with execution modes, including timing execution and instant execution; an execution state can be set, which means whether the task is in the processing process; an execution progress may also be set, which refers to the progress of the current data processing completion, such as completion of execution, in progress (data processing).
Preferably, a task processing start time and a task processing end time are set in the import task monitor module 24 to facilitate different task processing time requirements.
Through the above setting, the import task monitoring module 24 can flexibly set the execution mode of the processing task and can monitor the execution state and the execution progress of the processing task in real time.
In a preferred embodiment, as shown in fig. 3, the information generated by the import task monitoring module 24 is stored in an import task monitoring table, which includes a monitoring task identifier, a task status, an execution mode, an execution status, an execution progress, a task identifier, a task processing start time, and a task processing end time. And the monitoring task identifier is a primary key of the unique identifier.
In a preferred embodiment of the present invention, as shown in fig. 3, the import task monitoring module 24 completes monitoring of the excel data import data task by performing addition, modification and deletion processing on the "import task monitoring table".
In a preferred embodiment, the task monitoring flow of the import task monitoring module 24 is shown in fig. 4, and after a task is started, an operation is performed according to whether a command is executed; and starting data processing after the execution, and performing according to a timing processing command or an instant processing command after the execution is finished, wherein the timing processing command enables the task to be started at a set time, the instant processing enables the task to be immediately processed, and the task is stopped after the processing is finished.
And a comparison rule configuration module 25, which receives the resource data table and the data field information transmitted by the resource table definition module 21, and sets data field comparison information between different resource data tables and data fields based on the data content and the service requirement. The comparison rule configuration module 25 is linked with the system rule base 33, stores the data field comparison information in the system rule base 33, and transmits the data field comparison information to the resource data comparison module 12.
The data field comparison information comprises data filtering, data collision and data output. The data filtering refers to field retrieval of data fields of the resource data table; the data collision refers to associating data fields in different resource data tables with the same retrieval field; data output refers to outputting or transmitting the data collision result.
And based on the comparison target or requirement, customizing the data fields of the two or more different resource data tables to be compared, wherein the data fields comprise data filtering fields, data collision fields and data output fields.
For example, for an alignment between resource data tables a and B, the filter fields may be set as table a fields for name and certificate number, and table B for flight number and flight date; the data collision field can be set to be the same as the certificate number field in the table A and the table B; the data output field may be set to name, certificate number, flight number and flight date.
When data is collided, the set collision fields including field names and field types can be judged to be the same when the two collision fields participating in collision have the same or equal relation.
Through comparison of the comparison rule configuration module 25 to different data resource tables, the implicit association relation in different excel data and the deep-level association value of the data can be quickly searched.
Preferably, a comparison rule identifier is set in the data field comparison information, and is a unique identifier or has a unique ID address, so that a plurality of data comparison rules in the system are clear.
In a preferred embodiment, the comparison rule configuring module 25 includes a comparison rule name sub-module for configuring and storing the name of the data comparison rule;
the data filtering submodule is used for configuring retrieval fields when different resource data tables are aligned; preferably, when data is filtered, a filtering condition or a retrieval field is set in the compared resource data table respectively.
The data collision submodule is used for correlating data fields in the compared resource data table filtered by the data filtering submodule; preferably, when data collision occurs, collision conditions are respectively set in the compared resource data tables.
The data output submodule is used for configuring the output condition of the data collision result; preferably, output conditions are set in the compared resource data tables respectively, and more preferably, the data collision result is output or transmitted to the resource data comparison module 12.
In order to improve the accuracy of comparing the resource data tables, the comparison rule configuration module 25 is further provided with a table identifier corresponding to the resource data tables to be compared, and the required resource data tables can be accurately found in the system through the table identifier.
In a preferred embodiment, in the present invention, as shown in fig. 3, the information generated by the comparison rule configuration module 25 is stored in the comparison rule configuration table, which includes the comparison rule identifier, the comparison rule name, the table identifier, the filtering condition, the collision condition, and the output condition; and the comparison rule identifier is a primary key of the unique identifier.
In the application system 2, in a preferred embodiment, as shown by the arrows in fig. 3, the arrow connection indicates that there is an association relationship between two modules: the table identification in the resource data table is associated with the table identification in the import task configuration table and the table identification in the comparison rule configuration table; the task identification in the import task configuration table is associated with the task identification in the import task monitoring table; the field identification in the imported task configuration table is associated with the field identification in the resource field table; and the data item name and the access rule identifier in the imported task configuration table are respectively associated with the data item name and the access rule identifier in the access file rule table. The accuracy of calling the same resource data table, field or access rule in different modules is ensured through mutual correlation.
In a preferred embodiment of the excel data integration and analysis system provided by the present invention, the presentation system 1 further includes a resource data retrieval module 11, which is configured to receive the configuration information of the resource data table and the data field transmitted by the resource table definition module 21 and the import result data information transmitted by the import task monitoring module 24, and present the retrieval result based on a user-defined search term.
Preferably, the resource data retrieval module 11 visually displays the retrieval results in a list manner.
The display system 1 further includes a resource data comparison module 12, which is used for receiving and displaying the data field comparison information transmitted by the comparison rule configuration module 25.
Preferably, the resource data comparison module 12 visually displays the data comparison result in a list manner.
In a preferred embodiment, as shown in fig. 2, after the resource table defining module 21 performs resource table defining configuration and the file rule configuration module 22 performs excel file access rule configuration, the import task configuration module 23 performs import task mapping configuration, the mapping relationship is monitored by the import task monitoring module 24 to implement data processing, and the data processing result is transmitted to the resource data retrieval module 11 for resource data retrieval; the processing result transmitted to the resource data comparison module can also be compared with different resource tables under the configuration of the comparison rule set by the comparison rule configuration module 25. The resource table definition configuration file performed by the resource table definition module 21 can also be used for resource data retrieval alone.
In the excel data integration analysis system provided by the present invention, in a preferred embodiment, the data system 3 includes:
the system resource library 31 is used for storing excel data of all access systems;
the system configuration library 32 is used for storing data information generated by the configuration of each module of the system, including configuration information generated by the resource table definition module 21, the import task configuration module 23, and the import task monitoring module 24.
The system rule base 33 is used for storing the configuration information generated by the file rule configuration module 22 and the comparison rule configuration module 25.
The excel data integration analysis system provided by the invention configures the resource data sheet based on the accessed excel data, takes the resource data sheet as a medium, and can flexibly configure the data mapping relation, so that the excel data processing process has various data mapping modes, and the flexibility of the system and the subjective initiative of people are improved.
Another aspect of the present invention is to provide a universal excel data integration and analysis method, preferably implemented by the excel data integration and analysis system described above, where a flow of the universal excel data integration and analysis method is shown in fig. 5, and the method includes the following steps:
s1, configuring a resource table, configuring a resource data table and a data field for excel data needing to access the system, setting the use attribute of the data field, and storing configuration information.
Preferably, the resource table definition module 21 configures the resource data table and the data field, and sets the use attribute of the data field; more preferably, the configuration information is stored in a system configuration repository 32.
According to the set use attribute of the data field, the configuration resource data table and the data field can be used for resource data retrieval or resource data comparison.
S2, configuring file rules, configuring data items for excel data needing to be accessed to the system, and storing the data items;
preferably, the data item information is configured by the file rule configuration module 22; more preferably, the configured data item information is stored in the system rule base 33.
S3, importing task configuration, and performing relational mapping and storage on configured data fields and data items;
preferably, the mapping relationship between the data fields and the data items is configured by the import task configuration module 23; more preferably, the mapping relationship is stored in the system configuration library 32;
s4, importing task monitoring, extracting and associating data items and data fields according to the mapping relation between the data fields and the data items, and storing association results;
preferably, the mapping relationship between the data fields and the data items is extracted and associated by importing the task monitoring module 24; more preferably, the correlation result is stored in the system resource library 31 by the import task monitoring module 24;
s5, configuring comparison rules, randomly selecting two resource tables configured in S1, customizing data field comparison information between the resource tables according to business requirements, and storing the comparison information;
preferably, the comparison rule configuration module 25 sets the data field comparison information including data filtering, data collision and data output, and stores the comparison information into the system rule base 33;
s6, the result data analysis application, according to the resource data generated in S4, can perform data custom search based on the resource table configured in S1 and/or perform data collision analysis based on the data field comparison information configured in S5.
Preferably, the resource data retrieval module 11 is used for retrieving data, and the resource data comparison module 12 is used for performing data collision analysis.
According to the general excel data integration analysis system and method provided by the invention, flexible comparison of data and comparison result checking are realized through the excel data imported into the resource data sheet, the system has a flexible data collision analysis function, deep invisible relations among the data can be further found, and a user can conveniently carry out flexible data collision analysis and data relation mining work.
Examples
Example 1
1. Preparing an integrated excel data file as personnel information excel file content (file 1); flight information excel file content (file 2);
(a) personnel information excel file content (File 1)
Figure BDA0002125436240000191
(b) Flight information excel file content (File 2)
Figure BDA0002125436240000192
2. Configuring a resource data table and a data field for the file 1 and the file 2 to obtain a personnel information resource table (table 1) and a flight information resource table (table 2);
(c) personnel information resource table (Table 1) and data field content
Serial number Name (I) Sex Certificate number Mobile phone number Address
(d) Flight information resource table (Table 2) and data field content
Serial number Name (I) Sex Certificate number Seat number Flight number Date of flight
3. Configuring data items or performing access rule configuration on the files 1 and 2, and storing; specifically, the method comprises the following steps:
file 1 access rule configuration:
(a) rule name: personnel information access rules
(b) The rules describe: for accessing personal information
(c) Data item configuration: an initial behavior 2, a data item name and excel position A, a data item gender and excel position B, a data item certificate number and excel position C, a data item mobile phone number and excel position D, and a data item address and excel position E;
(d) file path: actual path of server storage for uploading files
(e) The file state: use of
File 2 access rule configuration:
(a) rule name: flight information access rules
(b) The rules describe: for accessing flight information
(c) Data item configuration: an initial action 4, a data item name and excel position A, a data item gender and excel position B, a data item certificate number and excel position C, a data item seat number and excel position D, a data item flight number and excel position B1, and a data item flight date and excel position B2;
(d) file path: actual path of server storage for uploading files
(e) The file state: use of
4. Performing relational mapping or import task configuration on the configured data fields and data items, and storing, specifically:
(1) and (3) configuration of a personnel information import task:
(a) task name: people information import
(b) And (3) mapping relation configuration: carrying out relational mapping on data fields in the personnel information resource table and data items of personnel information access rules, namely field names and data item names, field sexuality and data item sexuality, field certificate numbers and data item certificate numbers, field mobile phone numbers and data item mobile phone numbers, field addresses and data item addresses;
(c) the use state is as follows: use of
(2) And (3) configuration of the flight information import task:
(a) task name: flight information import
(b) And (3) mapping relation configuration: carrying out relational mapping on data fields of the flight information resource table and data items of the flight information access rules, namely field names and data item names, field sexes and data item sexes, field certificate numbers and data item certificate numbers, field seat numbers and data item seat numbers, field flight numbers and data item flight numbers, field flight dates and data item flight dates;
(c) the use state is as follows: use of
5. Importing task monitoring, importing task configuration according to personnel information and flight information, extracting excel data item content, associating and storing data items and data fields through a mapping relation of the imported task configuration; the generated processing result data information is stored in a resource table personnel information table (table 1) and a resource table flight information table (table 2);
(e) the result of importing the resource table personnel information is as follows
Figure BDA0002125436240000211
(f) The result of importing the flight information of the resource table is as follows
Figure BDA0002125436240000212
Figure BDA0002125436240000221
6. Comparing rule configuration, selecting configured personnel information (table 1) and flight information (table 2) of the resource table, and setting a comparison rule:
rule name: comparing personnel information with flight information
Data filtering rules: the fields of the table 1 are names and certificate numbers, and the fields of the table 2 are flight numbers and flight dates;
data collision rules: the certificate number of table 1 is equal to the certificate number of table 2;
data output rules: the name, gender, certificate number, mobile phone number, address of table 1, seat number, flight date of table 2;
7. comparing and analyzing the personnel information and the flight information, customizing and inputting filtering conditions, outputting the information that the male personnel take the flight as an output result, and generating a processing result that the information is the comparison result information (table 3) of the personnel and the flight, wherein the processing result comprises the following steps:
(g) personnel and flight comparison result information (Table 3)
Figure BDA0002125436240000222
Figure BDA0002125436240000231
The invention has been described in detail with reference to specific embodiments and illustrative examples, but the description is not intended to be construed in a limiting sense. Those skilled in the art will appreciate that various equivalent substitutions, modifications or improvements may be made to the technical solution of the present invention and its embodiments without departing from the spirit and scope of the present invention, which fall within the scope of the present invention. The scope of the invention is defined by the appended claims.

Claims (10)

1. An excel data integration analysis system, characterized in that, the system comprises:
the application system (2) is used for carrying out data processing on excel data of the access system and comprises a resource table definition module (21), a file rule configuration module (22), an import task configuration module (23), an import task monitoring module (24) and a comparison rule configuration module (25);
the display system (1) is used for user-defined data retrieval and data comparison processing and comprises a resource data retrieval module (11) and a resource data comparison module (12);
the data system (3) is used for storing excel data of the access system, task records of system processing data and rule configuration set by the system, and comprises a system resource library (31), a system configuration library (32) and a system rule library (33).
2. The data integration and analysis system of claim 1, wherein, in the application system (2),
the resource table definition module (21) is used for configuring a resource data table and a data field for excel data accessed to the integrated analysis system and setting the use attribute of the data field;
the file rule configuration module (22) is used for configuring data items for excel data accessed to the integration analysis system; the data items comprise data content and position information under data header or data classification in excel data;
the import task configuration module (23) is used for carrying out relational mapping on the data field configuration information transmitted by the resource table definition module (21) and the data item information transmitted by the file rule configuration module (22), and the mapping relation is transmitted to the import task monitoring module (24) as an import data task;
the import task monitoring module (24) is used for extracting data item information including content information and position information in excel data according to the import data task transmitted by the import task configuration module (23), and associating the data item with a data field in the resource data table according to a mapping relation in the import data task;
and the comparison rule configuration module (25) is used for receiving the resource data table and the data field information transmitted by the resource table definition module (21) and setting data field comparison information among different resource data tables and data fields based on data content and service requirements.
3. The data integration analysis system according to claim 1 or 2,
the resource table definition module (21) comprises:
the data table configuration submodule is used for configuring a resource data table for excel data accessed to the integrated analysis system;
the data field configuration submodule is used for configuring data field information for excel data accessed to the integrated analysis system and defining the use attribute of the data field when the data field is configured;
preferably, the file rule configuration module (22) comprises:
the rule name configuration submodule is used for configuring the file rule name and rule description of the data item so as to facilitate system storage and search;
the data item configuration submodule is used for configuring data item content information and position information of excel data of the access system;
preferably, the alignment rule configuration module (25) comprises,
a comparison rule name submodule for configuring and storing the name of the data comparison rule;
the data filtering submodule is used for configuring retrieval fields when different resource data tables are aligned; preferably, when data is filtered, filtering conditions or retrieval fields are respectively set in the compared resource data tables;
the data collision submodule is used for correlating data fields in the compared resource data table filtered by the data filtering submodule;
and the data output sub-module is used for configuring the output condition of the data collision result, and preferably outputting or transmitting the data collision result to the resource data comparison module 12.
4. The data integration analysis system according to claim 1 or 2, wherein the resource data table and the data field information generated by the resource table definition module (21) are stored in the resource data table and the resource field table, respectively;
preferably, the resource data table comprises a primary key table identifier, a table name and table description information; the resource field table comprises a primary key field identifier, a field name, a field type, a use category and table identifier information;
the information generated by the file rule configuration module (22) is stored in an access file rule table, and comprises a primary key access rule identification, a rule name, rule description, a data item name, a data item position, a file path and use state information;
the information generated by the import task configuration module (23) is stored in an import task configuration table, and comprises a main key task identifier, a task name, a field identifier, a data item name, a table identifier, an access rule identifier and use state information;
the information generated by the import task monitoring module (24) is stored in an import task monitoring table, and comprises a main key monitoring task identifier, a task state, an execution mode, an execution state, an execution progress, a task identifier, task processing starting time and task processing ending time;
the information generated by the comparison rule configuration module (25) is stored in a comparison rule configuration table, and comprises a primary key comparison rule identification, a comparison rule name, a table identification, a filtering condition, a collision condition and an output condition.
5. The data integration and analysis system of claim 1, wherein in the presentation system (1),
the resource data retrieval module (11) is used for receiving the configuration information of the resource data table and the data field transmitted by the resource table definition module (21) and the import result data information transmitted by the import task monitoring module 24, and displaying a retrieval result based on a self-defined retrieval word;
the resource data comparison module (12) is used for receiving and displaying the data field comparison information transmitted by the comparison rule configuration module (25).
6. The data integration analysis system of claim 1,
in the data system (3):
the system resource library (31) is used for storing excel data of all the access systems;
the system configuration library (32) is used for storing data information generated by the configuration of each module of the system, and comprises configuration information generated by a resource table definition module (21), an import task configuration module (23) and an import task monitoring module (24);
a system rule base (33) for storing the configuration information generated by the file rule configuration module (22) and the comparison rule configuration module (25).
7. The data integration analysis system of claim 1,
the resource table definition module (21) is linked with the system configuration library (32), stores the configuration information of the resource data table and the data field in the system configuration library (32), and transmits the configuration information to the resource data retrieval module (11), the import task configuration module (23) and the comparison rule configuration module (25);
the file rule configuration module (22) is linked with the system rule base (33), stores data item information in the system rule base (33), and transmits the configured data item information to the import task configuration module (23);
the import task configuration module (23) is linked with the system configuration library (32), stores the mapping relation as an import data task in the system configuration library (32), and transmits the import data task to the import task monitoring module (24);
the import task monitoring module (24) is linked with the system resource library (31), and the associated data items and data fields are stored in the system resource library (31) as import result data information and are transmitted to the resource data retrieval module (11);
the comparison rule configuration module (25) is linked with the system rule base (33), stores the data field comparison information in the system rule base (33), and transmits the data field comparison information to the resource data comparison module (12).
8. A universal excel data integration analysis method, preferably performed using the integration analysis system according to one of claims 1 to 7, comprising the steps of:
s1, configuring a resource table, configuring a resource data table and a data field for excel data needing to be accessed into the system, setting the use attribute of the data field, and storing configuration information;
s2, configuring file rules, configuring data items for excel data needing to be accessed to the system, and storing the data items;
s3, importing task configuration, and performing relational mapping and storage on configured data fields and data items;
s4, importing task monitoring, extracting and associating data items and data fields according to the mapping relation between the data fields and the data items, and storing association results;
s5, configuring comparison rules, randomly selecting two resource tables configured in S1, customizing data field comparison information between the resource tables according to business requirements, and storing the comparison information;
s6, the result data analysis application, according to the resource data generated in S4, can perform data custom search based on the resource table configured in S1 and/or perform data collision analysis based on the data field comparison information configured in S5.
9. The data integration analysis method according to claim 8,
in S1, configuring a resource data table and a data field through a resource table definition module (21), and setting the use attribute of the data field;
in S2, configuring data item information through a file rule configuration module (22);
in S3, a mapping relation between the data field and the data item is configured through the import task configuration module (23);
in S4, a mapping relation between the data fields and the data items is extracted and associated through the import task monitoring module (24);
in S5, the comparison rule configuration module (25) sets comparison information.
10. The data integration analysis method according to claim 8,
in S1, the resource table definition module (21) stores the resource data table and the configuration information of the data field into the system configuration library (32);
in S2, the file rule configuration module (22) stores the configured data item information into the system rule base (33);
in S3, the mapping relation between the data field and the data item is stored in a system configuration library (32) through an import task configuration module (23);
in S4, the mapping relation and the correlation result between the data field and the data item are stored in a system resource library (31) through an import task monitoring module (24);
in S5, data field comparison information including data filtering, data collision and data output is set through a comparison rule configuration module (25), and the comparison information is stored in a system rule base (33).
CN201910620677.8A 2019-07-10 2019-07-10 Excel data integration analysis system and method Pending CN112214554A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910620677.8A CN112214554A (en) 2019-07-10 2019-07-10 Excel data integration analysis system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910620677.8A CN112214554A (en) 2019-07-10 2019-07-10 Excel data integration analysis system and method

Publications (1)

Publication Number Publication Date
CN112214554A true CN112214554A (en) 2021-01-12

Family

ID=74047291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910620677.8A Pending CN112214554A (en) 2019-07-10 2019-07-10 Excel data integration analysis system and method

Country Status (1)

Country Link
CN (1) CN112214554A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112988790A (en) * 2021-02-07 2021-06-18 潍坊北大青鸟华光照排有限公司 EXECL-based data intercommunication and data batch retrieval query method
CN115455035A (en) * 2022-11-10 2022-12-09 广州思迈特软件有限公司 Data query model construction method and computer-readable storage medium
CN116383669A (en) * 2023-03-18 2023-07-04 宝钢工程技术集团有限公司 Method and system for generating factory object position number identification through data
TWI811179B (en) * 2023-02-09 2023-08-01 國立中山大學 Method and system for providing editing of text mining workflow

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020184210A1 (en) * 1999-01-26 2002-12-05 Joseph M. Khan Universal information warehouse system and method
CN101923549A (en) * 2009-07-29 2010-12-22 北京航天理想科技有限公司 User-defined visual intelligent track clue analytical system and establishing method
CN104731814A (en) * 2013-12-23 2015-06-24 北京宸瑞科技有限公司 System and method for flexibly comparing and analyzing data
CN108694179A (en) * 2017-04-06 2018-10-23 北京宸瑞科技股份有限公司 Personage's view analysis system based on attribute extraction and method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020184210A1 (en) * 1999-01-26 2002-12-05 Joseph M. Khan Universal information warehouse system and method
CN101923549A (en) * 2009-07-29 2010-12-22 北京航天理想科技有限公司 User-defined visual intelligent track clue analytical system and establishing method
CN104731814A (en) * 2013-12-23 2015-06-24 北京宸瑞科技有限公司 System and method for flexibly comparing and analyzing data
CN108694179A (en) * 2017-04-06 2018-10-23 北京宸瑞科技股份有限公司 Personage's view analysis system based on attribute extraction and method

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112988790A (en) * 2021-02-07 2021-06-18 潍坊北大青鸟华光照排有限公司 EXECL-based data intercommunication and data batch retrieval query method
CN115455035A (en) * 2022-11-10 2022-12-09 广州思迈特软件有限公司 Data query model construction method and computer-readable storage medium
TWI811179B (en) * 2023-02-09 2023-08-01 國立中山大學 Method and system for providing editing of text mining workflow
CN116383669A (en) * 2023-03-18 2023-07-04 宝钢工程技术集团有限公司 Method and system for generating factory object position number identification through data
CN116383669B (en) * 2023-03-18 2024-04-16 宝钢工程技术集团有限公司 Method and system for generating factory object position number identification through data

Similar Documents

Publication Publication Date Title
CN112214554A (en) Excel data integration analysis system and method
US9256686B2 (en) Using a bloom filter in a web analytics application
CN106897285B (en) Data element extraction and analysis system and data element extraction and analysis method
CN107783950A (en) Package insert processing method and processing device
CN111414740B (en) Electronic experiment record book report generation system
Sabri et al. Semantic pattern-based retrieval of architectural floor plans with case-based and graph-based searching techniques and their evaluation and visualization
US20080147631A1 (en) Method and system for collecting and retrieving information from web sites
US8463763B2 (en) Method and tool for searching in several data sources for a selected community of users
Tešendić et al. Business intelligence in the service of libraries
US11954110B2 (en) Methods and apparatus for visualizing entity instance relationships in a database
CN107451280B (en) Data communication method and device and electronic equipment
CN107729330B (en) Method and apparatus for acquiring data set
CN111190965A (en) Text data-based ad hoc relationship analysis system and method
CN112508119A (en) Feature mining combination method, device, equipment and computer readable storage medium
CN114637866B (en) Information management method and device for digitalized new media
CN114860737B (en) Processing method, device, equipment and medium of teaching and research data
CN115510289A (en) Data cube configuration method and device, electronic equipment and storage medium
JP5738943B2 (en) Pivot analysis method using condition group
CN110781213B (en) Multi-source mass data correlation searching method and system with personnel as center
CN113849520A (en) Intelligent identification method and device of abnormal SQL (structured query language), electronic equipment and storage medium
CN113076473A (en) User data processing method and device, computer equipment and storage medium
CN112527813A (en) Data processing method and device of business system, electronic equipment and storage medium
CN111429987B (en) Index file-based searching method and device
CN111063443B (en) Medical entity retrieval method, medical entity retrieval device, computer equipment and storage medium
EP4361840A1 (en) Process mining repository for analyzing process data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination