CN114880385A - Method and device for accessing geological disaster data through automatic combined flow - Google Patents
- Publication number
- CN114880385A
- Authority
- CN
- China
- Prior art keywords
- modules
- data
- sub
- algorithm sub
- input object
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to the technical field of networks and provides a method and a device for accessing geological disaster data through an automatic combined process. The method comprises: acquiring original data and final data, and importing them into an automatic combination flow system as an input object and a theoretical output object, respectively; analyzing, by the automatic combination flow system, the input object and the theoretical output object to determine the logical relationship between them; and screening out the corresponding algorithm sub-modules according to the logical relationship, arranging and combining them, and feeding the result back to the operation terminal as an ETL flow to be confirmed. The method improves on the traditional approach in which a data access flow is built manually in an ETL tool: by analyzing the logical relationship between the input object and the theoretical output object, it screens the corresponding algorithm sub-modules automatically and spares the user most of the work of assembling complex logical flows.
Description
Technical Field
The invention relates to the technical field of networks, in particular to a method and a device for accessing geological disaster data through an automatic combined process.
Background
ETL (Extract-Transform-Load), a data warehouse technology, describes the process of extracting, transforming, and loading data from a source to a destination. ETL can be used to extract, clean, and transform geological disaster data and then load it into a data warehouse; the aim is to integrate data that is scattered, disordered, and inconsistent in its standards, and to provide a basis of analysis for disaster early warning.
Most ETL tools currently on the market are general-purpose data processing platforms that provide data extraction (Extract), cleaning and transformation (Transform), and loading (Load). They also integrate data synchronization, data exchange, and data integration functions, and can fully support data-integration-based applications and routine data cleaning and transformation work. These tools have built-in support for common data sources such as databases, message servers, text files, XML, Excel files, WebService, and LDAP, and provide a visual, drag-and-drop flow designer, which can greatly improve working efficiency. However, designing a complex flow for accessing geological disaster data with them still consumes excessive effort and time.
In view of the above, overcoming the drawbacks of the prior art is an urgent problem in the art.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: existing methods for accessing geological disaster data often involve overly complex flows; building and designing a flow requires too many operation steps and consumes too much effort and time, which reduces working efficiency.
In order to achieve the purpose, the invention adopts the following technical scheme:
in a first aspect, the present invention provides a method for accessing geological disaster data through an automatic combination process, including:
acquiring, from an operation terminal, the original data to be imported by the target ETL process that the user expects and the final data expected after that ETL process, and importing the original data and the final data into an automatic combined process system as the input object and the theoretical output object, respectively;
the automatic combined flow system analyzes the input object and the theoretical output object and determines the logical relationship between the input object and the theoretical output object;
wherein the logical relationship comprises one or more of: a relationship giving the associated positions, in the input object, of the constituent elements obtained by splitting the theoretical output object; a relationship between a constituent element and associated data that is contained in the input object and belongs to the same type of attribute; or a relationship indicating that the associated position of a constituent element cannot be found directly in the input object;
and screening out corresponding algorithm sub-modules according to the logical relationship, and feeding back the algorithm sub-modules as ETL flows to be confirmed to the operation terminal after the corresponding algorithm sub-modules are arranged and combined.
Preferably, in the process of permutation and combination, if any link of the ETL flow to be confirmed has two or more alternative algorithm sub-modules, the alternative algorithm sub-modules are presented either by selection switching or in a list, specifically including:
when the selection switching mode is presented, when one algorithm sub-module is selected and switched, presenting the performance attribute of the corresponding algorithm sub-module during working;
when the algorithm is presented in a list mode, all the selectable algorithm sub-modules and the performance attributes of each algorithm sub-module during working are presented in the list completely;
the performance attributes include a correspondence between the computing resources required by the respective algorithm sub-modules and the time required to compute the unit data volume.
Preferably, the computing resources include: one or more of the number of physical servers, the number of virtual machines, configuration parameters required by the physical servers, and configuration parameters owned by the virtual machines.
Preferably, the process of generating the ETL procedure to be confirmed includes:
splitting the theoretical output object to obtain constituent elements, extracting a first part of constituent elements of which the associated positions cannot be found in the input object, screening corresponding algorithm sub-modules according to the logical relationship between a second part of constituent elements of which the associated positions can be found in the input object and the input object, and arranging and combining the corresponding algorithm sub-modules according to the position relationship of the second part of constituent elements in the theoretical output object to obtain an initial arrangement combination;
feeding back the first part of the component elements and the original data to an operation terminal, and triggering a user to supplement the advanced logic relationship between the first part of the component elements and the original data; and the automatic combined flow system further screens out corresponding algorithm sub-modules according to the advanced logical relationship, and inserts the screened algorithm sub-modules into the initial arrangement combination according to the position relationship of the first component element in a theoretical output object to generate the ETL flow to be confirmed.
Preferably, the advanced logical relationship includes:
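a relationship in which the first part of the constituent elements is obtained by introducing an encryption algorithm, and/or a relationship in which the first part of the constituent elements is calculated by applying a specified operation rule to the objects at specified positions in the original data.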
Preferably, the splitting the theoretical output object to obtain constituent elements specifically includes:
performing semantic analysis and/or key-value splitting with database characteristics on the theoretical output object to obtain component elements, and matching the component elements in the input object;
if the matching is successful, classifying the corresponding component elements into the second part of component elements;
if the matching fails, further splitting the corresponding component to obtain a minimum component, and if the minimum component split into single bytes still fails to be matched, classifying the corresponding minimum component into the first part of components; and if the matching of the minimum constituent elements obtained by one or at least two times of further splitting is successful, the corresponding minimum constituent elements are classified into the second part of constituent elements.
Preferably, the extracting the first part of the components for which the associated position cannot be found in the input object specifically includes:
and supplementing the relevant data which is obtained by analyzing and is contained in the input object by the corresponding component element and belongs to the same type of attribute into the input object, and extracting the first part of the component element of which the relevant position cannot be found in the supplemented input object.
Preferably, the screening process comprises:
taking the specified constituent element and the associated context content matched in the input object as a theoretical output sub-object and an input sub-object, respectively; processing the input sub-object with each of the algorithm sub-modules to obtain the corresponding actual output sub-objects; and, if the actual output sub-object produced by an algorithm sub-module is consistent with the theoretical output sub-object, selecting that algorithm sub-module; otherwise, discarding that algorithm sub-module.
Preferably, if an ETL flow case is stored in the automatic combination flow system, importing the original data into the automatic combination flow system, and traversing the stored ETL flow case to obtain a case output corresponding to each case; matching the final data with each case output one by one;
if the matching degree is higher than the preset proportion value, the successfully matched component elements in the final data follow the successfully matched algorithm sub-modules in the corresponding process cases, and the unmatched algorithm sub-modules in the corresponding process cases are removed; after further splitting the unmatched composition elements in the final data as theoretical output objects, screening corresponding algorithm sub-modules, and adding the screened corresponding algorithm sub-modules to the adaptive positions in the corresponding process cases to obtain the adjusted ETL process to be confirmed;
wherein the fitting position is determined according to the upstream and downstream positions of the unmatched component elements in the final data.
In a second aspect, the present invention provides an apparatus for accessing geological disaster data through an automatic combination process, which is used to implement the method for accessing geological disaster data through an automatic combination process in the first aspect, and the apparatus for accessing geological disaster data through an automatic combination process includes:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor for performing the method for accessing geological disaster data through an automatic combination process of the first aspect.
Compared with the prior art, the invention has the beneficial effects that:
To fully simplify access to geological disaster data, the invention provides a method for accessing geological disaster data through an automatic combination flow. It improves on the traditional approach in which a geological disaster data access flow is built manually in an ETL tool: by analyzing the logical relationship between the input object and the theoretical output object, it screens the corresponding algorithm sub-modules automatically, sparing the user most of the work of assembling complex logical flows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a schematic flow chart diagram illustrating a method for accessing geological disaster data through an automated combination process according to an embodiment of the present invention;
FIG. 2 is a schematic diagram presented in a manner of selecting switching for a method for accessing geological disaster data through an automatic combination process according to an embodiment of the present invention;
FIG. 3 is a schematic diagram presented in a list form of a method for accessing geological disaster data through an automatic combination process according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a process for generating ETL to be confirmed according to a method for accessing geological disaster data through an automatic combination process, provided by an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an apparatus for accessing geological disaster data through an automatic combination process according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the description of the present invention, the terms "inner", "outer", "longitudinal", "lateral", "upper", "lower", "top", "bottom", and the like indicate orientations or positional relationships based on those shown in the drawings, and are for convenience only to describe the present invention without requiring the present invention to be necessarily constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Example 1:
the embodiment of the invention provides a method for accessing geological disaster data through an automatic combined process, which comprises the following steps of:
In step 201, the original data to be imported by the target ETL process that the user expects, and the final data expected after processing by that ETL process, are obtained from the operation terminal; the original data and the final data are then imported into the automatic combination process system as the input object and the theoretical output object, respectively.
The original data and the final data are imported into an automatic combination process system as input objects and theoretical output objects, so as to obtain corresponding ETL processes, namely the original data are processed by the corresponding ETL processes to obtain the final data; the ETL process is one algorithm sub-module or a combination of a plurality of algorithm sub-modules.
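The idea that an ETL process is one algorithm sub-module or a chain of several can be sketched as plain function composition. This is a minimal illustration only: the `compose_flow` helper and the three sub-modules below are hypothetical names invented for the example and do not come from the patent.

```python
from functools import reduce
from typing import Callable, Sequence

# Hypothetical type: an algorithm sub-module maps one value to another.
SubModule = Callable[[object], object]


def compose_flow(sub_modules: Sequence[SubModule]) -> SubModule:
    """Chain sub-modules so that each one's output feeds the next."""
    def flow(data):
        return reduce(lambda acc, module: module(acc), sub_modules, data)
    return flow


# Illustrative sub-modules (assumptions, not taken from the patent).
def extract_fields(line: str) -> list:
    return line.split(",")


def strip_whitespace(fields: list) -> list:
    return [field.strip() for field in fields]


def load_as_record(fields: list) -> dict:
    return {"site": fields[0], "water_level_m": float(fields[1])}


# The original data goes in; the final data comes out.
etl_flow = compose_flow([extract_fields, strip_whitespace, load_as_record])
record = etl_flow("Site-A , 12.5")
```

With a single-element list, `compose_flow` degenerates to the one-sub-module case mentioned above.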
In step 202, the automated composition flow system analyzes the input object and the theoretical output object and determines a logical relationship between the input object and the theoretical output object.
The logical relationship comprises one of the following: a relationship giving the associated positions, in the input object, of the constituent elements obtained by splitting the theoretical output object; a relationship between a constituent element and associated data that is contained in the input object and belongs to the same type of attribute; or a relationship indicating that the associated position of a constituent element cannot be found directly in the input object.
Wherein the associated data attributed to the same type of attribute may be understood as follows: assuming that the input object is a geological disaster data table and the same type of attribute is the number of different monitoring devices, the total number of the monitoring devices is the associated data belonging to the same type of attribute, and the associated data is not directly reflected in the input object but can be obtained through analysis.
The relationship in which the associated position of a constituent element cannot be found directly in the input object is established only after both the constituent element and the associated data have been analyzed and no associated position has been found in the input object. For such an ambiguous logical relationship, the user must supplement, through the operation terminal, an intermediate relationship between the constituent element and the input object; the intermediate relationship is obtained by introducing an encryption algorithm or by specifying an operation rule.
In step 203, screening out corresponding algorithm sub-modules according to the logical relationship, and feeding back the algorithm sub-modules after permutation and combination as an ETL process to be confirmed to the operation terminal, specifically including: finding the associated context content in the input object matched with the specified component element, taking the associated context content in the input object matched with the specified component element as an input sub-object, processing the input sub-object by different algorithm sub-modules to obtain a corresponding actual output sub-object, finding the actual output sub-object completely matched with the specified component element, wherein the algorithm sub-module corresponding to the actual output sub-object completely matched with the specified component element is the algorithm sub-module corresponding to the specified component element obtained by screening; and continuously screening the algorithm submodules corresponding to other constituent elements, and arranging and combining the screened algorithm submodules according to the positions of the corresponding constituent elements in the final data to obtain the ETL flow to be confirmed and feeding back the ETL flow to the operation terminal.
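The screening-and-arranging step can be sketched as follows. This is a hedged illustration under simplifying assumptions: the candidate library `CANDIDATES`, the tuple layout of `elements`, and the helper names are all hypothetical, and "completely matched" is modelled as plain equality.

```python
from typing import Callable, Dict, List, Tuple


def screen_sub_modules(input_sub_object, theoretical_output,
                       candidates: Dict[str, Callable]) -> List[str]:
    """Return the names of candidates whose actual output equals the target."""
    selected = []
    for name, module in candidates.items():
        try:
            actual = module(input_sub_object)
        except Exception:
            continue  # a candidate that cannot process the input is discarded
        if actual == theoretical_output:
            selected.append(name)
    return selected


# Hypothetical library of algorithm sub-modules.
CANDIDATES = {"upper": str.upper, "strip": str.strip, "to_float": float}

# (position in the final data, constituent element, matched input context)
elements = [
    (1, 12.5, " 12.5 "),
    (0, "SITE-A", "site-a"),
]


def build_flow(items) -> List[Tuple[int, List[str]]]:
    """Screen each element, then arrange results by position in the final data."""
    ordered = sorted(items, key=lambda item: item[0])
    return [(pos, screen_sub_modules(ctx, target, CANDIDATES))
            for pos, target, ctx in ordered]


flow_to_confirm = build_flow(elements)
```

When a position ends up with more than one selected name, those names are the alternatives that would be offered to the user for that link.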
In the embodiment of the present invention, in the process of permutation and combination, if any link of the ETL flow to be confirmed has two or more alternative algorithm sub-modules, the alternative algorithm sub-modules are presented either by selection switching or in a list, which specifically includes:
as shown in fig. 2, when presented in a selective switching manner, when a selection is switched to one of the algorithm sub-modules, the performance attributes of the corresponding algorithm sub-module during operation are presented.
When presented in a list, all the algorithm sub-modules from which selection is made and the performance attributes of each algorithm sub-module when operating are presented in the list in its entirety, as shown in fig. 3.
The performance attributes include a correspondence between the computing resources required by the respective algorithm sub-modules and the time required to compute the unit data volume.
This is semi-automatic presentation: the automatic combined flow system presents all the selectable algorithm sub-modules and their performance attributes to the user, and the user selects a suitable algorithm sub-module for each link of the presented ETL flow according to the available computing resources, the total amount of data to be processed, and the expected computing time, thereby obtaining the final ETL flow.
The correspondence between links and algorithm sub-modules may be such that one algorithm sub-module covers several links, that is, one algorithm sub-module completes the whole process; for example, ETL has at least three links, and a single algorithm may cover all three. Alternatively, each link may have its own alternatives; for example, the E link in ETL may correspond to four selectable algorithm sub-modules, the T link to two selectable algorithm sub-modules, and the L link to three selectable algorithm sub-modules.
To simplify the operation further, semi-automatic presentation can be upgraded to fully automatic presentation, which specifically includes: the user inputs the relevant parameters, and the automatic combined flow system automatically completes the selection of the ETL flow according to the information it has identified and/or the parameters input by the user. The relevant parameters include one or more of: the expected computing time, an assessment of the user's own equipment, and the total amount of data to be processed.
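A fully automatic choice for one link might look like the sketch below. The selection policy (pick the fastest candidate that fits the available resources and the time budget) is an assumption made for illustration; the text does not prescribe a particular policy, and the `Candidate` fields and figures are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    required_cpus: int
    required_memory_gb: int
    seconds_per_gb: float  # time needed to process one unit of data


def choose_sub_module(candidates: List[Candidate], available_cpus: int,
                      available_memory_gb: int, total_gb: float,
                      max_seconds: float) -> Optional[Candidate]:
    """Assumed policy: fastest candidate that fits resources and time budget."""
    feasible = [
        c for c in candidates
        if c.required_cpus <= available_cpus
        and c.required_memory_gb <= available_memory_gb
        and c.seconds_per_gb * total_gb <= max_seconds
    ]
    return min(feasible, key=lambda c: c.seconds_per_gb * total_gb, default=None)


# Hypothetical performance attributes for three alternatives of one link.
alternatives = [
    Candidate("A", required_cpus=16, required_memory_gb=64, seconds_per_gb=2.0),
    Candidate("B", required_cpus=4, required_memory_gb=16, seconds_per_gb=6.0),
    Candidate("C", required_cpus=2, required_memory_gb=8, seconds_per_gb=20.0),
]

chosen = choose_sub_module(alternatives, available_cpus=8,
                           available_memory_gb=32, total_gb=100,
                           max_seconds=900)
```

Here candidate A needs more CPUs than are available and candidate C exceeds the time budget, so B is chosen; if nothing is feasible the function returns `None`, at which point falling back to semi-automatic presentation would be one option.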
In an embodiment of the present invention, the computing resources include: one or more of the number of physical servers, the number of virtual machines, configuration parameters required by the physical servers and configuration parameters owned by the virtual machines; wherein the configuration parameters include a CPU and/or a memory.
In this embodiment of the present invention, as shown in fig. 4, the process of generating the ETL flow to be confirmed includes:
in step 301, splitting the theoretical output object to obtain component elements, where the component elements include a first part of component elements and a second part of component elements; the first part of components can not find the associated position in the input object, and the second part of components can directly or indirectly find the associated position in the input object; the fact that the associated position can be directly found means that the associated position can be found in the input object by the constituent elements obtained after the theoretical output object is split for the first time; the fact that the associated position can be indirectly found means that the associated position can be found in the input object by the constituent elements obtained after the theoretical output object is split at least twice.
In step 302, the first part of the components that can not find the associated position in the input object are extracted.
In step 303, screening out corresponding algorithm sub-modules according to a second part of components of which associated positions can be found in the input object and the logical relationship of the input object, and performing permutation and combination on the corresponding algorithm sub-modules according to the position relationship of the second part of components in the theoretical output object to obtain an initial permutation and combination.
In step 304, the first part of components and the original data are fed back to the operation terminal, and the user is triggered to supplement the advanced logical relationship between the first part of components and the original data.
In step 305, the automatic combined process system further screens out corresponding algorithm sub-modules according to the advanced logical relationship, and inserts the screened algorithm sub-modules into the initial arrangement combination according to the position relationship of the first component in the theoretical output object, so as to generate the ETL process to be confirmed.
In an embodiment of the present invention, the advanced logical relationship includes:
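a relationship in which the first part of the constituent elements is obtained by introducing an encryption algorithm, and/or a relationship in which the first part of the constituent elements is calculated by applying a specified operation rule to the objects at specified positions in the original data.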
The advanced logic relationship is given when a user identifies the introduction route of the first part of the component elements, and if the first part of the component elements are introduced through an encryption algorithm, different algorithm submodules are linked through filling of a script language; and if the first part of component elements are introduced through a specified operation rule, establishing a further logic relation between the first part of component elements and the original data through a logic formula and/or a function.
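The two kinds of user-supplied advanced relationship can be sketched as small rule objects. Everything here is hypothetical: the field names, the `rule_sum` and `rule_digest` helpers, and in particular the use of a SHA-256 digest, which merely stands in for whatever algorithm the user would actually name.

```python
import hashlib
from typing import Callable, Dict, Sequence

# Hypothetical row of original data.
row = {"device_id": "D-17", "rain_mm_am": 12.0, "rain_mm_pm": 30.5}


def rule_sum(fields: Sequence[str]) -> Callable[[dict], float]:
    """Operation rule: add the objects at the specified positions."""
    return lambda r: sum(r[f] for f in fields)


def rule_digest(field: str) -> Callable[[dict], str]:
    """Stand-in for an introduced algorithm (assumption: SHA-256 digest)."""
    return lambda r: hashlib.sha256(str(r[field]).encode("utf-8")).hexdigest()


# Relationships the user would supply for elements with no associated position.
advanced_relationships: Dict[str, Callable[[dict], object]] = {
    "rain_mm_total": rule_sum(["rain_mm_am", "rain_mm_pm"]),
    "device_token": rule_digest("device_id"),
}

derived = {name: rule(row) for name, rule in advanced_relationships.items()}
```

Each rule behaves like an additional algorithm sub-module that can be inserted into the initial arrangement at the position of the element it produces.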
In this embodiment of the present invention, the splitting the theoretical output object to obtain constituent elements specifically includes:
and performing semantic analysis and/or key-value splitting with database characteristics on the theoretical output object to obtain constituent elements, and matching the constituent elements in the input object.
Semantic analysis and/or key-value splitting with database characteristics of the theoretical output object is illustrated by the following example. Assume the theoretical output object is "earthquake". First, "earthquake" itself is matched in the input object. If that fails, "earthquake" is split into "earth" and "quake", both of which are constituent elements of the theoretical output object "earthquake". Matching is then performed in the input object using "earth xxx quake" (a fuzzy match against the original data), "earth or quake", or "earth and quake".
And if the matching is successful, classifying the corresponding component elements into the second part of component elements.
If the matching fails, further splitting the corresponding component to obtain a minimum component, and if the minimum component split into single bytes still fails to be matched, classifying the corresponding minimum component into the first part of components; and if the matching of the minimum constituent elements obtained by one or at least two times of further splitting is successful, the corresponding minimum constituent elements are classified into the second part of constituent elements.
For example, assuming that the theoretical output object is "water level is going to rise", the constituent elements obtained by splitting are "water level", "going to", and "rise". The minimum constituent elements obtained from "water level" are "water" and "level", those obtained from "rise" are "up" and "rise", and "going to" is already a single byte in the source-language text and cannot be split any further.
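The split-match-split-again classification can be sketched recursively. Two loud simplifications are assumed: whitespace splitting and halving stand in for the semantic analysis and key-value splitting described above, and matching is a plain substring test. The function names and sample strings are hypothetical.

```python
from typing import List, Tuple


def classify(element: str, input_text: str,
             first: List[str], second: List[str]) -> None:
    """Put an element in `second` if it matches, else split further."""
    if element in input_text:
        second.append(element)      # associated position found
        return
    if len(element) <= 1:
        first.append(element)       # minimum element still unmatched
        return
    mid = len(element) // 2         # placeholder for semantic splitting
    classify(element[:mid], input_text, first, second)
    classify(element[mid:], input_text, first, second)


def split_and_match(theoretical_output: str,
                    input_text: str) -> Tuple[List[str], List[str]]:
    first: List[str] = []   # no associated position in the input
    second: List[str] = []  # matched directly or after further splitting
    for element in theoretical_output.split():
        classify(element, input_text, first, second)
    return first, second


first_part, second_part = split_and_match("A7 rain30 #", "site=A7 rain=30")
```

In this run "A7" matches directly, "rain30" matches only after further splitting, and "#" cannot be matched even as a single character, so it is the only element that would be sent back to the user.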
In this embodiment of the present invention, the extracting a first part of components for which no associated position can be found in the input object specifically includes:
and supplementing the relevant data which is obtained by analyzing and is contained in the input object by the corresponding component element and belongs to the same type of attribute into the input object, and extracting the first part of the component element of which the relevant position cannot be found in the supplemented input object.
Wherein the associated data attributed to the same type of attribute may be understood as follows: assuming that the input object is a table and the same type attribute is a class number, the total class number is the associated data belonging to the same type attribute, and the associated data is not directly embodied in the input object but can be obtained through analysis.
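Supplementing the input object with associated data can be sketched as deriving totals that are implied by the rows but not stored in them. The row layout and the `supplement_with_totals` helper are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical input object: one row per monitoring device.
rows = [
    {"site": "S01", "device": "rain gauge"},
    {"site": "S01", "device": "crack meter"},
    {"site": "S02", "device": "rain gauge"},
]


def supplement_with_totals(table: List[Dict[str, str]], attribute: str) -> dict:
    """Add per-value counts and the overall total for one attribute."""
    counts = Counter(row[attribute] for row in table)
    return {
        "rows": table,
        "counts": dict(counts),          # derived, not present in the rows
        "total": sum(counts.values()),   # e.g. total number of devices
    }


supplemented = supplement_with_totals(rows, "device")
```

A constituent element such as the total device count would find no associated position in `rows` alone, but it does in `supplemented`, so it no longer has to be referred to the user.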
In an embodiment of the present invention, the screening process includes:
taking the specified constituent element and the associated context content matched in the input object as a theoretical output sub-object and an input sub-object, respectively; processing the input sub-object with each of the algorithm sub-modules to obtain the corresponding actual output sub-objects; and, if the actual output sub-object produced by an algorithm sub-module is consistent with the theoretical output sub-object, selecting that algorithm sub-module; otherwise, discarding that algorithm sub-module.
In this embodiment of the invention, if ETL process cases are stored in the automatic combined process system, the original data is imported into the system and the stored ETL process cases are traversed to obtain the case output corresponding to each case; the final data is then matched against each case output one by one.
If the matching degree is higher than a preset proportion value, the successfully matched constituent elements in the final data reuse the successfully matched algorithm sub-modules of the corresponding process case, and the unmatched algorithm sub-modules of that process case are removed. The unmatched constituent elements in the final data are taken as theoretical output objects and further split, corresponding algorithm sub-modules are screened out for them, and the screened-out algorithm sub-modules are added at the adapted positions in the corresponding process case to obtain the adjusted ETL process to be confirmed.
The adapted position is determined according to the upstream and downstream positions of the unmatched constituent elements in the final data.
If the matching degree is lower than or equal to the preset proportion value, the final data is taken as the theoretical output object, the logical relationship between the original data and the theoretical output object is determined, the corresponding algorithm sub-modules are screened out according to the logical relationship, and, after being arranged and combined, they are fed back to the operation terminal as the ETL process to be confirmed.
An ETL process case is an ETL process that the automatic combined process system screened out in the past according to the logical relationship between an input object and a theoretical output object. Each ETL process stored in the system is an ETL process case; that is, an ETL process case is a single algorithm sub-module or a combination of multiple algorithm sub-modules.
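The case-reuse logic can be sketched in Python under simplifying assumptions that are not stated in the patent: a case is modeled as an ordered list of `(output_field, function)` steps, the final data is a flat dictionary, and the matching degree is the fraction of final-data fields that a case reproduces exactly. The names `run_case` and `adapt_best_case` and the sample data are hypothetical.

```python
def run_case(steps, raw):
    """Run a stored case: each step computes one output field from raw."""
    return {field: func(raw) for field, func in steps}


def adapt_best_case(raw, final_data, cases, threshold=0.5):
    """Return (kept_steps, unmatched_fields).

    kept_steps is None when no case exceeds the threshold, in which
    case the whole final data must be handled from scratch.
    """
    if not final_data:
        return None, []
    best_name, best_degree, best_output = None, 0.0, {}
    for name, steps in cases.items():
        output = run_case(steps, raw)
        matched = sum(1 for k, v in final_data.items() if output.get(k) == v)
        degree = matched / len(final_data)
        if degree > best_degree:
            best_name, best_degree, best_output = name, degree, output
    if best_name is None or best_degree <= threshold:
        return None, list(final_data)
    # Reuse matched sub-modules; drop the unmatched ones from the case.
    kept = [(f, fn) for f, fn in cases[best_name]
            if f in final_data and best_output[f] == final_data[f]]
    kept_fields = {f for f, _ in kept}
    # These elements still need splitting and fresh screening.
    unmatched = [k for k in final_data if k not in kept_fields]
    return kept, unmatched


raw = {"site": " slope-7 ", "rain_mm": "12.5"}
final_data = {"site": "SLOPE-7", "rain_mm": 12.5, "risk": "high"}
cases = {
    "case_a": [
        ("site", lambda r: r["site"].strip().upper()),
        ("rain_mm", lambda r: float(r["rain_mm"])),
        ("depth", lambda r: 0),
    ],
    "case_b": [("site", lambda r: r["site"])],
}

kept, unmatched = adapt_best_case(raw, final_data, cases, threshold=0.5)
```

In this example `case_a` reproduces two of the three fields, so its `site` and `rain_mm` steps are reused, its `depth` step is dropped, and `risk` is left for further splitting and screening. With a stricter threshold no case qualifies and the whole final data is treated as the theoretical output object.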
Embodiment 2:
An embodiment of the present invention provides an apparatus for accessing geological disaster data through an automatic combined process. As shown in fig. 5, the apparatus includes one or more processors 21 and a memory 22; fig. 5 takes one processor 21 as an example.
The processor 21 and the memory 22 may be connected by a bus or other means, and fig. 5 illustrates the connection by a bus as an example.
The memory 22, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs and non-volatile computer-executable programs, such as the program implementing the method for accessing geological disaster data through the automatic combined process in embodiment 1. The processor 21 performs that method by executing the non-volatile software programs and instructions stored in the memory 22.
The memory 22 may include high speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory 22 may optionally include memory located remotely from the processor 21, and these remote memories may be connected to the processor 21 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The program instructions/modules are stored in the memory 22 and, when executed by the one or more processors 21, perform the method for accessing geological disaster data through the automatic combined process of embodiment 1 above, for example the steps illustrated in fig. 1 and fig. 4.
It should be noted that, because the information interaction and execution processes between the modules and units of the apparatus and system are based on the same concept as the method embodiments of the present invention, their details may be found in the description of the method embodiments and are not repeated here.
Those of ordinary skill in the art will appreciate that all or part of the steps of the methods of the embodiments may be implemented by a program instructing the associated hardware. The program may be stored on a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and the like.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (10)
1. A method for accessing geological disaster data through an automated combinatorial process, comprising:
acquiring, from an operation terminal, original data to be imported by a target ETL process expected by a user and final data as it should appear after the ETL process, and importing the original data and the final data into an automatic combined process system as an input object and a theoretical output object, respectively;
analyzing, by the automatic combined process system, the input object and the theoretical output object to determine the logical relationship between the input object and the theoretical output object;
wherein the logical relationship comprises one or more of: a relationship of the associated positions, in the input object, of the constituent elements obtained by splitting the theoretical output object; a relationship of associated data that is implied by the corresponding constituent elements in the input object and belongs to the same type of attribute; or a relationship in which the associated positions of the corresponding constituent elements cannot be found directly in the input object;
and screening out corresponding algorithm sub-modules according to the logical relationship, arranging and combining the corresponding algorithm sub-modules, and feeding the result back to the operation terminal as the ETL process to be confirmed.
2. The method for accessing geological disaster data through an automatic combined process according to claim 1, wherein, during the arranging and combining, if a link of the ETL process to be confirmed has two or more alternative algorithm sub-modules, the alternative algorithm sub-modules are presented either in a selection-switching manner or in a list, specifically comprising:
when presented in the selection-switching manner, each time an algorithm sub-module is switched to, presenting the performance attributes of that algorithm sub-module in operation;
when presented in a list, presenting all alternative algorithm sub-modules and the performance attributes of each algorithm sub-module in operation together in the list;
wherein the performance attributes include the correspondence between the computing resources required by the respective algorithm sub-module and the time required to process a unit volume of data.
3. The method for accessing geologic hazard data via an automated combinatorial process of claim 2, wherein the computing resources comprise: one or more of the number of physical servers, the number of virtual machines, configuration parameters required by the physical servers, and configuration parameters owned by the virtual machines.
4. The method of accessing geological disaster data by automated combination of processes of claim 1, wherein generating said ETL process to be validated comprises:
splitting the theoretical output object to obtain constituent elements, extracting a first part of constituent elements of which the associated positions cannot be found in the input object, screening corresponding algorithm sub-modules according to the logical relationship between a second part of constituent elements of which the associated positions can be found in the input object and the input object, and arranging and combining the corresponding algorithm sub-modules according to the position relationship of the second part of constituent elements in the theoretical output object to obtain an initial arrangement combination;
feeding back the first part of the component elements and the original data to an operation terminal, and triggering a user to supplement the advanced logic relationship between the first part of the component elements and the original data; and the automatic combined flow system further screens out corresponding algorithm sub-modules according to the advanced logical relationship, and inserts the screened algorithm sub-modules into the initial permutation and combination according to the position relationship of the first part of constituent elements in a theoretical output object to generate the ETL flow to be confirmed.
5. The method of accessing geological disaster data through an automated combinatorial process of claim 4, wherein said advanced logical relationship comprises:
and/or calculating the relation of the first part of component elements by using a specified operation rule between the objects at the specified positions in the original data.
6. The method for accessing geological disaster data through an automatic combination process as claimed in claim 4, wherein the splitting of the theoretical output object to obtain constituent elements specifically comprises:
performing semantic analysis and/or key-value splitting with database characteristics on the theoretical output object to obtain component elements, and matching the component elements in the input object;
if the matching is successful, classifying the corresponding component elements into the second part of component elements;
if the matching fails, further splitting the corresponding component to obtain a minimum component, and if the minimum component split into single bytes still fails to be matched, classifying the corresponding minimum component into the first part of components; and if the matching of the minimum constituent elements obtained by one or at least two times of further splitting is successful, the corresponding minimum constituent elements are classified into the second part of constituent elements.
7. The method of claim 4, wherein the extracting the first part of the components that fail to find the associated location in the input object comprises:
supplementing the input object with associated data that is obtained by analysis, is implied by the corresponding constituent elements in the input object, and belongs to the same type of attribute; and extracting the first part of the constituent elements, whose associated positions cannot be found in the supplemented input object.
8. The method for accessing geologic hazard data via an automated combinatorial process of claim 1, wherein the screening process comprises:
taking a specified constituent element and its associated context content in the matched input object as the theoretical output sub-object and the input sub-object, respectively; processing the input sub-object with each of the algorithm sub-modules to obtain the corresponding actual output sub-objects; selecting an algorithm sub-module if the actual output sub-object it produces is consistent with the theoretical output sub-object; and otherwise eliminating that algorithm sub-module.
9. The method according to claim 1, wherein, if ETL process cases are stored in the automatic combined process system, the original data is imported into the automatic combined process system and the stored ETL process cases are traversed to obtain the case output corresponding to each case; the final data is matched against each case output one by one;
if the matching degree is higher than a preset proportion value, the successfully matched constituent elements in the final data reuse the successfully matched algorithm sub-modules of the corresponding process case, and the unmatched algorithm sub-modules of that process case are removed; the unmatched constituent elements in the final data are taken as theoretical output objects and further split, corresponding algorithm sub-modules are screened out, and the screened-out algorithm sub-modules are added at the adapted positions in the corresponding process case to obtain the adjusted ETL process to be confirmed;
wherein the adapted position is determined according to the upstream and downstream positions of the unmatched constituent elements in the final data.
10. An apparatus for accessing geological disaster data through an automated combinatorial process, comprising:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor for performing the method of accessing geological disaster data by automated combinatorial processing of any of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110848943.XA CN114880385B (en) | 2021-07-27 | 2021-07-27 | Method and device for accessing geological disaster data through automatic combination process |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110848943.XA CN114880385B (en) | 2021-07-27 | 2021-07-27 | Method and device for accessing geological disaster data through automatic combination process |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114880385A (en) | 2022-08-09 |
CN114880385B (en) | 2022-11-22 |
Family
ID=82667331
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110848943.XA Active CN114880385B (en) | 2021-07-27 | 2021-07-27 | Method and device for accessing geological disaster data through automatic combination process |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114880385B (en) |
Patent Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101388844A (en) * | 2008-11-07 | 2009-03-18 | 东软集团股份有限公司 | Data flow processing method and system |
CN102033748A (en) * | 2010-12-03 | 2011-04-27 | 中国科学院软件研究所 | Method for generating data processing flow codes |
US20120296862A1 (en) * | 2011-05-19 | 2012-11-22 | Compact Solutions, Llc | Method and apparatus for analyzing and migrating data integration applications |
CN103309904A (en) * | 2012-03-16 | 2013-09-18 | 阿里巴巴集团控股有限公司 | Method and device for generating data warehouse ETL (Extraction, Transformation and Loading) codes |
CN104778236A (en) * | 2015-04-02 | 2015-07-15 | 上海烟草集团有限责任公司 | ETL (Extract-Transform-Load) realization method and system based on metadata |
US20170060969A1 (en) * | 2015-09-02 | 2017-03-02 | International Business Machines Corporation | Automating extract, transform, and load job testing |
CN106874016A (en) * | 2017-03-07 | 2017-06-20 | 长江大学 | A kind of new customizable big data platform architecture method |
CN109947746A (en) * | 2017-10-26 | 2019-06-28 | 亿阳信通股份有限公司 | A kind of quality of data management-control method and system based on ETL process |
CN109492059A (en) * | 2019-01-03 | 2019-03-19 | 北京理工大学 | A kind of multi-source heterogeneous data fusion and Modifying model process management and control method |
CN110765196A (en) * | 2019-10-25 | 2020-02-07 | 四川东方网力科技有限公司 | Method and equipment for generating and executing ETL task |
CN111324647A (en) * | 2020-01-21 | 2020-06-23 | 北京东方金信科技有限公司 | Method and device for generating ETL code |
CN111930357A (en) * | 2020-09-17 | 2020-11-13 | 国网浙江省电力有限公司营销服务中心 | Construction method of visual modeling job flow scheduling engine |
CN112115192A (en) * | 2020-10-09 | 2020-12-22 | 北京东方通软件有限公司 | Efficient flow arrangement method and system for ETL system |
CN113111107A (en) * | 2021-04-06 | 2021-07-13 | 创意信息技术股份有限公司 | Data comprehensive access system and method |
CN113111106A (en) * | 2021-04-06 | 2021-07-13 | 创意信息技术股份有限公司 | ETL design data access method and data access module based on Web |
Non-Patent Citations (2)
Title |
---|
XIAOLIANG LI 等: "The Research and Application of an ETL Model Based on Task", 《2009 FIRST INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND ENGINEERING》 * |
楚静: "基于商务智能的景区决策支持系统研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115936633A (en) * | 2023-01-09 | 2023-04-07 | 广东远景信息科技有限公司 | Emergency flow linking method, electronic equipment and storage medium |
CN115936633B (en) * | 2023-01-09 | 2023-11-03 | 广东远景信息科技有限公司 | Emergency flow connection method, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN114880385B (en) | 2022-11-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11068439B2 (en) | Unsupervised method for enriching RDF data sources from denormalized data | |
US10102039B2 (en) | Converting a hybrid flow | |
CA2973234C (en) | System and method for querying data sources | |
CN105335403B (en) | Database access method and device and database system | |
CN109815283B (en) | Heterogeneous data source visual query method | |
EP2924588A1 (en) | Report creation method, device and system | |
CN106570022B (en) | Cross-data-source query method, device and system | |
CN104933095A (en) | Heterogeneous information universality correlation analysis system and analysis method thereof | |
CN109857803B (en) | Data synchronization method, device, equipment, system and computer readable storage medium | |
CN108197091B (en) | Method, system and related equipment for creating data table | |
CN104111958A (en) | Data query method and device | |
US11461333B2 (en) | Vertical union of feature-based datasets | |
CN108197187B (en) | Query statement optimization method and device, storage medium and computer equipment | |
WO2017096155A1 (en) | Methods and systems for mapping object oriented/functional languages to database languages | |
CN104166701A (en) | Machine learning method and system | |
CN114880385B (en) | Method and device for accessing geological disaster data through automatic combination process | |
CN111125199B (en) | Database access method and device and electronic equipment | |
CN115392501A (en) | Data acquisition method and device, electronic equipment and storage medium | |
CN113468571B (en) | Source tracing method based on block chain | |
CN111475165A (en) | Intelligent compiling method, system, terminal and storage medium for application program | |
CN111159213A (en) | Data query method, device, system and storage medium | |
CN103324640B (en) | A kind of method, device and equipment determining search result document | |
CN114254005A (en) | Grouping aggregation query method and device for partition table, computer equipment and medium | |
CN108089871A (en) | Automatic updating method of software, device, equipment and storage medium | |
CN112395306A (en) | Database system, data processing method, data processing device and computer storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||