CN114880385A - Method and device for accessing geological disaster data through automatic combined flow - Google Patents


Info

Publication number
CN114880385A
Authority
CN
China
Legal status
Granted
Application number
CN202110848943.XA
Other languages
Chinese (zh)
Other versions
CN114880385B (en)
Inventor
杨迎冬
黄成�
晏祥省
魏蕾
Current Assignee
Yunnan Institute Of Geological Environment Monitoring Yunnan Institute Of Environmental Geology
Original Assignee
Yunnan Institute Of Geological Environment Monitoring Yunnan Institute Of Environmental Geology
Application filed by Yunnan Institute Of Geological Environment Monitoring Yunnan Institute Of Environmental Geology
Priority to CN202110848943.XA
Publication of CN114880385A
Application granted
Publication of CN114880385B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of networks and provides a method and a device for accessing geological disaster data through an automatic combined flow. The method comprises: acquiring original data and final data, and importing them into an automatic combined flow system as an input object and a theoretical output object, respectively; analyzing, by the automatic combined flow system, the input object and the theoretical output object to determine the logical relationship between them; and screening out the corresponding algorithm sub-modules according to the logical relationship, arranging and combining them, and feeding the result back to the operation terminal as an ETL flow to be confirmed. The method improves on the traditional approach in which a data access flow is built manually with an ETL tool: by analyzing the logical relationship between the input object and the theoretical output object, it screens the corresponding algorithm sub-modules automatically, sparing the user most of the manual work of assembling complex logical flows.

Description

Method and device for accessing geological disaster data through automatic combined flow
Technical Field
The invention relates to the technical field of networks, and in particular to a method and a device for accessing geological disaster data through an automatic combined flow.
Background
ETL (Extract-Transform-Load) is a data warehousing technique that describes the process of extracting data from a source, transforming it, and loading it into a destination. ETL can be used to extract, clean, and transform geological disaster data and then load it into a data warehouse. The aim is to integrate geological disaster data that is scattered, disordered, and inconsistent in its standards, and so provide a basis for analysis in disaster early warning.
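The three ETL stages can be illustrated with a minimal Python sketch using only the standard library. The CSV export, the column names (`site`, `hazard`, `displacement_mm`), and the `hazard_obs` table are hypothetical examples of geological disaster monitoring data, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw monitoring export: inconsistent casing, stray spaces, a blank record.
RAW_CSV = """site,hazard,displacement_mm
 A01 ,Landslide,12.5
a02,landslide, 7
,,
A03,DEBRIS FLOW,3.25
"""

def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop empty records and normalize site codes, hazard labels, and numbers."""
    cleaned = []
    for row in rows:
        if not row["site"].strip():
            continue  # skip blank records
        cleaned.append((
            row["site"].strip().upper(),
            row["hazard"].strip().lower(),
            float(row["displacement_mm"]),
        ))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hazard_obs (site TEXT, hazard TEXT, displacement_mm REAL)"
    )
    conn.executemany("INSERT INTO hazard_obs VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM hazard_obs ORDER BY site").fetchall())
```

In a real deployment each stage would be far richer, but the separation into extract, transform, and load functions is the part that carries over.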
Most ETL tools currently on the market are general-purpose data processing platforms that provide data extraction, cleaning and transformation, and loading. They also integrate data synchronization, data exchange, and data integration, and can fully support data-integration applications and routine data cleaning and transformation work. Their advantages are built-in support for many common data sources, such as databases, message servers, text files, XML, Excel files, WebService, and LDAP, and a drag-and-drop visual flow designer, which can greatly improve working efficiency. However, designing a complex flow for accessing geological disaster data with such tools still consumes excessive effort and time.
In view of the above, overcoming the drawbacks of the prior art is an urgent problem in the art.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: existing methods of accessing geological disaster data often involve overly complex flows, and building and designing such a flow requires too many operation steps and too much effort and time, which reduces working efficiency.
To solve this problem, the invention adopts the following technical solution:
In a first aspect, the present invention provides a method for accessing geological disaster data through an automatic combined flow, including:
acquiring, from an operation terminal, the original data that the user's desired target ETL flow is to import and the final data that this ETL flow should produce, and importing the original data and the final data into an automatic combined flow system as the input object and the theoretical output object, respectively;
analyzing, by the automatic combined flow system, the input object and the theoretical output object, and determining the logical relationship between them;
wherein the logical relationship comprises one or more of: the associated positions, in the input object, of the constituent elements obtained by splitting the theoretical output object; associated data, belonging to the same type of attribute, that the corresponding constituent elements have in the input object; or the relationship in which the associated position of a corresponding constituent element cannot be found directly in the input object;
and screening out the corresponding algorithm sub-modules according to the logical relationship, arranging and combining them, and feeding them back to the operation terminal as an ETL flow to be confirmed.
Preferably, during arrangement and combination, if a link of the ETL flow to be confirmed has two or more alternative algorithm sub-modules, the alternatives are presented either by selection switching or as a list, specifically:
when presented by selection switching, switching the selection to an algorithm sub-module displays the performance attributes of that sub-module during operation;
when presented as a list, all selectable algorithm sub-modules and the performance attributes of each during operation are listed in full;
the performance attributes include the correspondence between the computing resources required by each algorithm sub-module and the time it requires to process a unit volume of data.
Preferably, the computing resources include: one or more of the number of physical servers, the number of virtual machines, configuration parameters required by the physical servers, and configuration parameters owned by the virtual machines.
Preferably, the process of generating the ETL flow to be confirmed includes:
splitting the theoretical output object into constituent elements; extracting a first part of the constituent elements, whose associated positions cannot be found in the input object; screening the corresponding algorithm sub-modules according to the logical relationship between the input object and a second part of the constituent elements, whose associated positions can be found in the input object; and arranging and combining those algorithm sub-modules according to the positions of the second part of the constituent elements in the theoretical output object to obtain an initial arrangement;
feeding the first part of the constituent elements and the original data back to the operation terminal, prompting the user to supply the advanced logical relationship between the first part of the constituent elements and the original data; the automatic combined flow system then screens out the corresponding algorithm sub-modules according to the advanced logical relationship and inserts them into the initial arrangement according to the positions of the first part of the constituent elements in the theoretical output object, thereby generating the ETL flow to be confirmed.
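Preferably, the advanced logical relationship includes:
the relationship in which the first part of the constituent elements is introduced through an encryption algorithm; and/or the relationship in which the first part of the constituent elements is calculated from objects at specified positions in the original data using a specified operation rule.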
Preferably, splitting the theoretical output object to obtain constituent elements specifically includes:
performing semantic analysis and/or database-style key-value splitting on the theoretical output object to obtain constituent elements, and matching the constituent elements against the input object;
if the matching succeeds, classifying the corresponding constituent elements into the second part of the constituent elements;
if the matching fails, further splitting the corresponding constituent element into smaller constituent elements; if a minimum constituent element that has been split down to a single character still fails to match, classifying it into the first part of the constituent elements; and if a minimum constituent element obtained after one or more further splits matches successfully, classifying it into the second part of the constituent elements.
Preferably, extracting the first part of the constituent elements, whose associated positions cannot be found in the input object, specifically includes:
supplementing the input object with the associated data, obtained by analysis, that the corresponding constituent elements have in the input object and that belong to the same type of attribute, and then extracting the first part of the constituent elements, whose associated positions cannot be found in the supplemented input object.
Preferably, the screening process comprises:
taking a specified constituent element as a theoretical output sub-object and its associated context in the matched input object as an input sub-object; processing the input sub-object with each of the various algorithm sub-modules to obtain the corresponding actual output sub-objects; if the actual output sub-object produced by an algorithm sub-module is consistent with the theoretical output sub-object, selecting that algorithm sub-module; otherwise, discarding it.
Preferably, if ETL flow cases are stored in the automatic combined flow system, the original data are imported into the system and the stored ETL flow cases are traversed to obtain the output of each case; the final data are then matched against each case output in turn;
if the matching degree is higher than a preset proportion, the successfully matched constituent elements in the final data keep the successfully matched algorithm sub-modules of the corresponding flow case, and the unmatched algorithm sub-modules in that case are removed; the unmatched constituent elements in the final data are further split as theoretical output objects, the corresponding algorithm sub-modules are screened, and these are added at the fitting positions in the corresponding flow case to obtain the adjusted ETL flow to be confirmed;
wherein the fitting position is determined by the upstream and downstream positions of the unmatched constituent elements in the final data.
In a second aspect, the present invention provides an apparatus for accessing geological disaster data through an automatic combination process, which is used to implement the method for accessing geological disaster data through an automatic combination process in the first aspect, and the apparatus for accessing geological disaster data through an automatic combination process includes:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor for performing the method of accessing geological disaster data by an automated combinatorial process of the first aspect.
Compared with the prior art, the invention has the beneficial effects that:
To simplify access to geological disaster data, the invention provides a method for accessing geological disaster data through an automatic combined flow. Building on the traditional approach in which an ETL tool is used to build the geological disaster data access flow manually, the method automatically screens the corresponding algorithm sub-modules by analyzing the logical relationship between the input object and the theoretical output object, sparing the user most of the work of assembling complex logical flows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a schematic flow chart diagram illustrating a method for accessing geological disaster data through an automated combination process according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the selection-switching presentation in a method for accessing geological disaster data through an automatic combined flow according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the list presentation in a method for accessing geological disaster data through an automatic combined flow according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of generating the ETL flow to be confirmed in a method for accessing geological disaster data through an automatic combined flow according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an apparatus for accessing geological disaster data through an automatic combined flow according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the description of the present invention, the terms "inner", "outer", "longitudinal", "lateral", "upper", "lower", "top", "bottom", and the like indicate orientations or positional relationships based on those shown in the drawings, and are for convenience only to describe the present invention without requiring the present invention to be necessarily constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Example 1:
An embodiment of the invention provides a method for accessing geological disaster data through an automatic combined flow, which comprises the following steps:
In step 201, the original data that the user's desired target ETL flow is to import and the final data that this ETL flow should produce are obtained from the operation terminal, and are imported into the automatic combined flow system as the input object and the theoretical output object, respectively.
The original data and the final data are imported as the input object and the theoretical output object in order to obtain the corresponding ETL flow, that is, the flow that processes the original data into the final data. The ETL flow consists of one algorithm sub-module or a combination of several algorithm sub-modules.
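Since an ETL flow is either a single algorithm sub-module or a combination of several, one way to picture it is as function composition. The following is a minimal sketch, assuming each sub-module is a plain Python function from data to data; `compose_flow`, `strip_fields`, and `rainfall_to_float` are hypothetical names, not part of the described system.

```python
from functools import reduce
from typing import Any, Callable, Sequence

# An "algorithm sub-module" is modeled here as a plain function: data in, data out.
SubModule = Callable[[Any], Any]

def compose_flow(sub_modules: Sequence[SubModule]) -> SubModule:
    """Chain sub-modules so each one's output feeds the next (one module is also a flow)."""
    def flow(data: Any) -> Any:
        return reduce(lambda acc, step: step(acc), sub_modules, data)
    return flow

# Hypothetical sub-modules for illustration only.
def strip_fields(record):
    return {k: v.strip() for k, v in record.items()}

def rainfall_to_float(record):
    return {**record, "rainfall_mm": float(record["rainfall_mm"])}

flow = compose_flow([strip_fields, rainfall_to_float])
original = {"station": " ST-7 ", "rainfall_mm": " 42.0 "}   # input object
final = {"station": "ST-7", "rainfall_mm": 42.0}            # theoretical output object
print(flow(original) == final)  # True: this flow turns the original data into the final data
```

Under this model, the system's job is to find the sequence of sub-modules for which the composed flow maps the input object onto the theoretical output object.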
In step 202, the automatic combined flow system analyzes the input object and the theoretical output object and determines the logical relationship between them.
The logical relationship comprises: the associated positions, in the input object, of the constituent elements obtained by splitting the theoretical output object; associated data, belonging to the same type of attribute, that the corresponding constituent elements have in the input object; or the relationship in which the associated position of a corresponding constituent element cannot be found directly in the input object.
Associated data belonging to the same type of attribute can be understood as follows: if the input object is a geological disaster data table and the attribute type is the number of monitoring devices of each kind, then the total number of monitoring devices is associated data of that attribute type. It is not shown directly in the input object but can be obtained by analysis.
The relationship in which a constituent element cannot directly find an associated position in the input object is established when, even after the constituent element and the associated data have been analyzed, no associated position is found in the input object. For such an ambiguous logical relationship, the user must supply, through the operation terminal, an intermediate relationship between the constituent element and the input object; this intermediate relationship is established by introducing an encryption algorithm or by specifying an operation rule.
In step 203, the corresponding algorithm sub-modules are screened out according to the logical relationship, arranged and combined, and fed back to the operation terminal as the ETL flow to be confirmed. Specifically: the associated context in the input object that matches a specified constituent element is found and used as an input sub-object; different algorithm sub-modules process the input sub-object to obtain their actual output sub-objects; and the algorithm sub-module whose actual output sub-object exactly matches the specified constituent element is the screened algorithm sub-module for that element. Algorithm sub-modules are screened in the same way for the other constituent elements, and the screened sub-modules are arranged and combined according to the positions of their constituent elements in the final data to obtain the ETL flow to be confirmed, which is fed back to the operation terminal.
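The screening rule in step 203 amounts to trial execution: run every candidate on the input sub-object and keep those whose output equals the theoretical output sub-object. Below is a minimal Python sketch of that rule; the `CANDIDATES` library and its entries are hypothetical stand-ins for the system's algorithm sub-modules, and "exactly matches" is approximated by Python `==`.

```python
from typing import Any, Callable, Dict, List

# Hypothetical candidate library; names and functions are illustrative only.
CANDIDATES: Dict[str, Callable[[str], Any]] = {
    "strip": str.strip,
    "upper": str.upper,
    "to_float": lambda s: float(s),
    "date_ymd": lambda s: "-".join(reversed(s.split("/"))),  # DD/MM/YYYY -> YYYY-MM-DD
}

def screen(input_sub_object: str, theoretical_output_sub_object: Any) -> List[str]:
    """Keep every sub-module whose actual output equals the theoretical output sub-object."""
    selected = []
    for name, module in CANDIDATES.items():
        try:
            actual_output = module(input_sub_object)
        except (ValueError, TypeError):
            continue  # the module cannot process this input, so it is discarded
        if actual_output == theoretical_output_sub_object:
            selected.append(name)
    return selected

print(screen("12/07/2021", "2021-07-12"))  # ['date_ymd']
print(screen("  3.5", 3.5))               # ['to_float']
```

If more than one candidate survives for the same element, those survivors are the alternatives for that link of the flow.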
In the embodiment of the present invention, during arrangement and combination, if a link of the ETL flow to be confirmed has two or more alternative algorithm sub-modules, the alternatives are presented either by selection switching or as a list, specifically:
As shown in FIG. 2, in selection-switching mode, when the selection is switched to an algorithm sub-module, the performance attributes of that sub-module during operation are displayed.
As shown in FIG. 3, in list mode, all selectable algorithm sub-modules and the performance attributes of each during operation are listed in full.
The performance attributes include the correspondence between the computing resources required by each algorithm sub-module and the time it requires to process a unit volume of data.
In this semi-automatic presentation, the automatic combined flow system presents all selectable algorithm sub-modules and their performance attributes to the user, who then selects a suitable algorithm sub-module for each link of the presented ETL flow according to the available computing resources, the total amount of data to be processed, and the expected computation time, thereby obtaining the final ETL flow.
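The list-mode presentation behind this semi-automatic choice can be made concrete with a minimal Python sketch. It assumes, purely for illustration, that a performance attribute is stored as a mapping from the number of virtual machines to seconds per GB processed, so the estimated time is that rate multiplied by the total data volume. The `Candidate` class, the sub-module names, and the profile figures are hypothetical, not measured values from the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Candidate:
    name: str
    # Performance attribute: computing resources (here: number of VMs) -> seconds per GB.
    sec_per_gb: Dict[int, float] = field(default_factory=dict)

def list_view(link: str, candidates: List[Candidate], total_gb: float) -> List[str]:
    """List mode: every alternative with its resource/time profile and estimated total time."""
    rows = []
    for c in candidates:
        for vms, rate in sorted(c.sec_per_gb.items()):
            estimate = rate * total_gb
            rows.append(f"{link} | {c.name:<12} | {vms} VM(s) | {rate:>5.1f} s/GB | ~{estimate:,.0f} s")
    return rows

# Hypothetical profiles for the alternatives of a T link, not measured values.
t_link = [
    Candidate("rule_clean", {1: 40.0, 4: 12.0}),
    Candidate("ml_clean", {1: 90.0, 4: 25.0}),
]
for line in list_view("T", t_link, total_gb=50):
    print(line)
```

The user can compare each estimate against the available resources and the expected computation time when picking a sub-module for the link.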
The correspondence between links and algorithm sub-modules may be such that one algorithm sub-module covers several links, that is, it completes the whole process on its own: for example, an ETL flow has at least three links, and a single algorithm can cover all three. Alternatively, each link may have several candidates: for example, the E link may correspond to four selectable algorithm sub-modules, the T link to two, and the L link to three.
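With 4, 2, and 3 alternatives for the E, T, and L links, there are 4 × 2 × 3 = 24 possible flows. The short Python sketch below enumerates them with `itertools.product`; the sub-module labels `E1` to `L3` are hypothetical placeholders.

```python
from itertools import product

# Hypothetical candidate sub-modules per link, using the counts from the example above.
links = {
    "E": ["E1", "E2", "E3", "E4"],
    "T": ["T1", "T2"],
    "L": ["L1", "L2", "L3"],
}

# Each combination picks one candidate per link, in E -> T -> L order.
flows = list(product(*links.values()))
print(len(flows))  # 24 = 4 * 2 * 3
print(flows[0])    # ('E1', 'T1', 'L1')
```

A single sub-module covering all three links would simply add one extra, one-element flow to this set.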
To simplify operation further, the semi-automatic presentation can be upgraded to a fully automatic one: the user enters the relevant parameters, and the automatic combined flow system selects the ETL flow automatically based on the identified information and/or the parameters entered by the user. The relevant parameters include one or more of the expected computation time, an evaluation of the user's own equipment, and the total amount of data to be processed.
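The source does not specify the selection policy for the fully automatic mode, so the following Python sketch assumes one plausible policy: for each link, pick the fastest sub-module that fits the user's equipment, then accept the flow only if the total estimated time meets the expected computation time. It also assumes the links run one after another, so link times add up. `PROFILES`, `auto_select`, and all figures are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical profiles: link -> list of (sub-module, VMs required, seconds per GB).
PROFILES: Dict[str, List[Tuple[str, int, float]]] = {
    "E": [("jdbc_extract", 1, 5.0), ("parallel_extract", 4, 1.5)],
    "T": [("rule_clean", 1, 40.0), ("rule_clean", 4, 12.0), ("ml_clean", 4, 25.0)],
    "L": [("bulk_load", 2, 3.0)],
}

def auto_select(available_vms: int, total_gb: float, expected_s: float) -> Optional[Dict[str, str]]:
    """Pick the fastest feasible sub-module per link; None if no flow meets the constraints.

    Assumes links run sequentially, so the flow time is the sum of the link times.
    """
    choice, total_s = {}, 0.0
    for link, options in PROFILES.items():
        feasible = [o for o in options if o[1] <= available_vms]
        if not feasible:
            return None  # this link cannot run on the user's equipment
        name, _, rate = min(feasible, key=lambda o: o[2])
        choice[link] = name
        total_s += rate * total_gb
    return choice if total_s <= expected_s else None

print(auto_select(available_vms=4, total_gb=10, expected_s=300))
print(auto_select(available_vms=1, total_gb=10, expected_s=300))  # None: L needs 2 VMs
```

A real system might instead minimize resource use subject to the deadline; the structure of the search would be similar.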
In an embodiment of the present invention, the computing resources include one or more of: the number of physical servers, the number of virtual machines, the configuration parameters required by the physical servers, and the configuration parameters of the virtual machines; the configuration parameters include CPU and/or memory.
In this embodiment of the present invention, as shown in FIG. 4, the process of generating the ETL flow to be confirmed includes:
In step 301, the theoretical output object is split into constituent elements, which comprise a first part and a second part. The associated positions of the first part cannot be found in the input object, whereas those of the second part can be found directly or indirectly. "Directly" means that the constituent elements obtained from the first split of the theoretical output object already have an associated position in the input object; "indirectly" means that an associated position is found only for constituent elements obtained after two or more splits.
In step 302, the first part of the constituent elements, whose associated positions cannot be found in the input object, is extracted.
In step 303, the corresponding algorithm sub-modules are screened out according to the logical relationship between the input object and the second part of the constituent elements, whose associated positions can be found in the input object, and are arranged and combined according to the positions of the second part of the constituent elements in the theoretical output object to obtain an initial arrangement.
In step 304, the first part of the constituent elements and the original data are fed back to the operation terminal, prompting the user to supply the advanced logical relationship between them.
In step 305, the automatic combined flow system further screens out the corresponding algorithm sub-modules according to the advanced logical relationship and inserts them into the initial arrangement according to the positions of the first part of the constituent elements in the theoretical output object, generating the ETL flow to be confirmed.
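Steps 301 to 305 can be sketched in Python as follows. This is a simplified illustration, assuming the theoretical output has already been split into a list of constituent elements, that "associated position found" means plain substring containment, and that screening and the user's input are supplied as callables; `generate_flow`, `screen`, and `ask_user` are hypothetical names.

```python
from bisect import insort
from typing import Callable, List, Tuple

def generate_flow(elements: List[str], input_text: str,
                  screen: Callable[[str], str],
                  ask_user: Callable[[str], str]) -> List[str]:
    # Steps 301-302: separate the first part (no associated position in the input)
    # from the second part (associated position found).
    first_part = [(i, e) for i, e in enumerate(elements) if e not in input_text]
    second_part = [(i, e) for i, e in enumerate(elements) if e in input_text]
    # Step 303: screen a sub-module per second-part element; enumerate() order already
    # follows the positions in the theoretical output, giving the initial arrangement.
    arrangement: List[Tuple[int, str]] = [(i, screen(e)) for i, e in second_part]
    # Steps 304-305: the user supplies the advanced relationship for each first-part
    # element, and the resulting sub-module is inserted at that element's position.
    for i, e in first_part:
        insort(arrangement, (i, ask_user(e)))
    return [name for _, name in arrangement]

flow = generate_flow(
    elements=["A01", "RISK", "42"],           # already-split theoretical output
    input_text="site=A01;rain=42;disp=12.5",  # input object
    screen=lambda e: f"copy:{e}",             # hypothetical screening result
    ask_user=lambda e: f"user_rule:{e}",      # hypothetical user-supplied relationship
)
print(flow)  # ['copy:A01', 'user_rule:RISK', 'copy:42']
```

Inserting by position keeps each user-supplied sub-module between the sub-modules of its neighbouring elements in the theoretical output.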
In an embodiment of the present invention, the advanced logical relationship includes:
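the relationship in which the first part of the constituent elements is introduced through an encryption algorithm; and/or the relationship in which the first part of the constituent elements is calculated from objects at specified positions in the original data using a specified operation rule.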
The advanced logical relationship is supplied once the user has identified how the first part of the constituent elements was introduced. If it was introduced through an encryption algorithm, the different algorithm sub-modules are linked by filling in a script; if it was introduced through a specified operation rule, a further logical relationship between the first part of the constituent elements and the original data is established through a logical formula and/or a function.
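The two kinds of advanced logical relationship can both be represented as functions from a record of the original data to the missing constituent element. The Python sketch below assumes records are lists of strings; SHA-256 from `hashlib` is a hash function used here purely as a stand-in for whatever encryption algorithm the user identifies, and the relationship names and positions are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List

Row = List[str]

def encryption_relation(position: int) -> Callable[[Row], str]:
    """Element introduced through an 'encryption' step (SHA-256 hash as a stand-in)."""
    return lambda row: hashlib.sha256(row[position].encode("utf-8")).hexdigest()

def operation_rule_relation(positions: List[int], rule: Callable[..., float]) -> Callable[[Row], float]:
    """Element computed by a specified operation rule over objects at specified positions."""
    return lambda row: rule(*(float(row[p]) for p in positions))

# Hypothetical advanced relationships supplied by the user for two first-part elements.
relations: Dict[str, Callable[[Row], object]] = {
    "site_token": encryption_relation(0),
    "net_disp_mm": operation_rule_relation([1, 2], lambda prev, cur: cur - prev),
}

row = ["A01", "10.0", "12.5"]  # e.g. site code, previous and current displacement readings
print({name: rel(row) for name, rel in relations.items()})
```

Once each first-part element has such a function, it can be wrapped as an algorithm sub-module and inserted into the flow like any other.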
In this embodiment of the present invention, splitting the theoretical output object to obtain constituent elements specifically includes:
performing semantic analysis and/or database-style key-value splitting on the theoretical output object to obtain constituent elements, and matching the constituent elements against the input object.
Semantic analysis and/or database-style key-value splitting of the theoretical output object is illustrated by the following example. Suppose the theoretical output object is "earthquake" (地震 in the original Chinese). "地震" is first matched against the input object; if this fails, it is split into its two characters "地" (ground) and "震" (quake), both of which are constituent elements of "地震". The input object is then matched using "地xxx震" (a fuzzy match against the original data, with arbitrary content between the two characters), "地 or 震", or "地 and 震".
And if the matching is successful, classifying the corresponding component elements into the second part of component elements.
If the matching fails, the corresponding constituent element is split further into smaller constituent elements. If a minimum constituent element that has been split down to a single character still fails to match, it is classified into the first part of the constituent elements; if a minimum constituent element obtained after one or more further splits matches successfully, it is classified into the second part of the constituent elements.
For example, suppose the theoretical output object is "the water level is going to rise" (水位要上涨). Its constituent elements are "水位" (water level), "要" (going to), and "上涨" (rise). The minimum constituent elements of "水位" are "水" (water) and "位" (level), those of "上涨" are "上" (up) and "涨" (rise), and "要" is a single character that cannot be split further.
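The recursive split-and-match procedure can be sketched in Python as below. Two simplifications are assumptions, not the source's method: a small hypothetical dictionary `WORD_SPLITS` stands in for semantic analysis at the word level (anything not listed falls back to single characters), and matching is plain substring containment rather than the fuzzy "地xxx震", "or", and "and" patterns described above.

```python
from typing import Dict, List, Tuple

# Hypothetical word-level splits standing in for semantic analysis; anything not
# listed falls back to splitting into single characters.
WORD_SPLITS: Dict[str, List[str]] = {"水位要上涨": ["水位", "要", "上涨"]}

def split(element: str) -> List[str]:
    return WORD_SPLITS.get(element, list(element))

def classify(element: str, input_text: str) -> Tuple[List[str], List[str]]:
    """Return (first_part, second_part): elements not found / found in the input object."""
    if element in input_text:
        return [], [element]      # matched: second part
    if len(element) == 1:
        return [element], []      # a single character that still fails: first part
    first, second = [], []
    for part in split(element):   # matching failed: split further and retry
        f, s = classify(part, input_text)
        first += f
        second += s
    return first, second

print(classify("地震", "地表出现明显震动"))    # ([], ['地', '震'])
print(classify("水位要上涨", "水位持续上涨"))  # (['要'], ['水位', '上涨'])
```

Here "水位" and "上涨" are found directly after one split, while "要" bottoms out as a single unmatched character and lands in the first part.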
In this embodiment of the present invention, extracting the first part of the constituent elements, whose associated positions cannot be found in the input object, specifically includes:
supplementing the input object with the associated data, obtained by analysis, that the corresponding constituent elements have in the input object and that belong to the same type of attribute, and then extracting the first part of the constituent elements, whose associated positions cannot be found in the supplemented input object.
Associated data belonging to the same type of attribute can be understood as follows: if the input object is a table and the attribute is a per-class count, the total across all classes is associated data of that attribute. It is not shown directly in the input object but can be obtained by analysis.
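Supplementing the input object before extracting the first part can be sketched in Python as follows. The sketch assumes the input object is a list of row dictionaries, that "analysis" means computing a column total (as in the monitoring-device example), and that an element is found if it equals some cell value as a string; the device table and element list are hypothetical.

```python
from typing import Any, Dict, List, Set

Table = List[Dict[str, Any]]

def cell_values(table: Table) -> Set[str]:
    """Everything directly present in the input object, as strings."""
    return {str(v) for row in table for v in row.values()}

def supplement(table: Table, attribute: str) -> Set[str]:
    """Add associated data derived by analysis (here: the column total) to the input object."""
    values = cell_values(table)
    values.add(str(sum(row[attribute] for row in table)))
    return values

def first_part(elements: List[str], input_values: Set[str]) -> List[str]:
    """Constituent elements with no associated position in the (supplemented) input."""
    return [e for e in elements if e not in input_values]

devices = [{"type": "GNSS", "count": 3}, {"type": "rain gauge", "count": 5}]
elements = ["GNSS", "8", "RISK-HIGH"]  # hypothetical split of a theoretical output
print(first_part(elements, cell_values(devices)))          # ['8', 'RISK-HIGH']
print(first_part(elements, supplement(devices, "count")))  # ['RISK-HIGH']
```

Without supplementation the total "8" would wrongly be sent to the user as an unexplained element; after supplementation only the genuinely unexplained "RISK-HIGH" remains.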
In an embodiment of the present invention, the screening process includes:
taking a specified constituent element and its associated context in the matched input object as a theoretical output sub-object and an input sub-object, respectively; processing the input sub-object with each of the candidate algorithm sub-modules to obtain the corresponding actual output sub-objects; and selecting an algorithm sub-module if the actual output sub-object it produces is consistent with the theoretical output sub-object, otherwise screening that algorithm sub-module out.
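The screening rule can be sketched as trying every candidate algorithm sub-module on the input sub-object and keeping those whose actual output equals the theoretical output sub-object. In this illustrative Python sketch the candidate sub-modules (`total`, `peak`, `mean`) and the rainfall readings are hypothetical, and "consistent" is assumed to mean exact equality.

```python
from typing import Any, Callable


def screen(modules: dict[str, Callable[[Any], Any]],
           input_sub: Any, expected: Any) -> list[str]:
    """Keep the algorithm sub-modules whose actual output equals the theoretical output."""
    selected = []
    for name, module in modules.items():
        try:
            actual = module(input_sub)
        except Exception:
            continue                 # cannot process this input: screened out
        if actual == expected:
            selected.append(name)    # consistent with the theoretical output sub-object
    return selected


# Input sub-object: hourly rainfall readings (mm); theoretical output sub-object: 50.0
rain_mm = [12.0, 30.5, 7.5]
candidates = {
    "total": lambda xs: sum(xs),
    "peak": lambda xs: max(xs),
    "mean": lambda xs: sum(xs) / len(xs),
}
print(screen(candidates, rain_mm, 50.0))   # ['total']
```

Exact equality is the simplest reading of "consistent"; a system handling floating-point data would more likely compare within a tolerance.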
In this embodiment of the invention, if ETL process cases are stored in the automatic combined process system, the original data are imported into the system and the stored ETL process cases are traversed to obtain the case output of each case; the final data are then matched against each case output one by one.
If the matching degree is higher than a preset proportion value, the successfully matched constituent elements of the final data reuse the successfully matched algorithm sub-modules of the corresponding process case, and the unmatched algorithm sub-modules of that case are removed. The unmatched constituent elements of the final data are taken as theoretical output objects and split further, corresponding algorithm sub-modules are screened for them, and the screened sub-modules are added at the fitting positions in the process case, yielding an adjusted ETL process to be confirmed.
The fitting position is determined by the upstream and downstream positions of the unmatched constituent elements in the final data.
If the matching degree is lower than or equal to the preset proportion value, the final data are taken as the theoretical output object, the logical relationship between the original data and the theoretical output object is determined, the corresponding algorithm sub-modules are screened out according to that logical relationship, and, after being arranged and combined, they are fed back to the operation terminal as the ETL process to be confirmed.
An ETL process case is an ETL process that the automatic combined process system has previously screened out according to the logical relationship between an input object and a theoretical output object. Every ETL process stored in the system is an ETL process case; that is, a case consists of a single algorithm sub-module or a combination of several algorithm sub-modules.
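The case-reuse branch can be sketched as follows, under several assumptions not stated in the text: a case is modelled as an ordered list of hypothetical `(sub-module, element, function)` steps, the matching degree is taken to be the fraction of final-data elements whose value the case output reproduces, only the best-scoring case is considered, the preset proportion value is set to an arbitrary 0.6, and the screened sub-module for an unmatched element is supplied directly rather than obtained by splitting and screening. The fitting position is read as "immediately after the last step that produces an upstream element" in the final data's order.

```python
from typing import Any, Callable, Optional

Record = dict[str, Any]
# A case step: (sub-module name, constituent element it produces, function over raw data).
CaseStep = tuple[str, str, Callable[[Record], Any]]


def case_output(case: list[CaseStep], raw: Record) -> Record:
    """Run a case on the original data to get its case output."""
    return {element: fn(raw) for _, element, fn in case}


def matching_degree(case: list[CaseStep], raw: Record, final: Record) -> float:
    """Fraction of final-data elements whose value the case output reproduces."""
    output = case_output(case, raw)
    hits = sum(1 for key, value in final.items() if output.get(key) == value)
    return hits / len(final)


def choose_case(cases: dict[str, list[CaseStep]], raw: Record, final: Record,
                threshold: float = 0.6) -> tuple[str, Optional[str], float]:
    """Traverse stored cases; adjust the best one only if it beats the threshold."""
    scores = {name: matching_degree(case, raw, final) for name, case in cases.items()}
    best = max(scores, key=scores.get)
    if scores[best] > threshold:
        return "adjust", best, scores[best]
    return "build_new", None, scores[best]   # fall back to full logical analysis


def adjust_case(case: list[CaseStep], raw: Record, final: Record,
                screened: dict[str, CaseStep]) -> list[CaseStep]:
    """Reuse matched steps, drop unmatched ones, insert screened steps at fitting positions."""
    output = case_output(case, raw)
    flow = [s for s in case if s[1] in final and output[s[1]] == final[s[1]]]
    produced = {s[1] for s in flow}
    order = list(final)                      # upstream-to-downstream order of the final data
    for idx, element in enumerate(order):
        if element in produced:
            continue
        upstream = set(order[:idx])
        pos = max((i + 1 for i, s in enumerate(flow) if s[1] in upstream), default=0)
        flow.insert(pos, screened[element])
        produced.add(element)
    return flow


raw = {"site": "S1", "depth_cm": 250}
final = {"site": "S1", "alarm": True, "depth_m": 2.5}
cases = {
    "case_a": [("copy_site", "site", lambda r: r["site"]),
               ("to_metres", "depth_m", lambda r: r["depth_cm"] / 100),
               ("legacy_flag", "flag", lambda r: r["depth_cm"] > 300)],
    "case_b": [("copy_site", "site", lambda r: r["site"])],
}
print(choose_case(cases, raw, final))   # ('adjust', 'case_a', 0.6666666666666666)

# Hypothetical sub-module screened for the unmatched element "alarm".
screened = {"alarm": ("alarm_rule", "alarm", lambda r: r["depth_cm"] >= 200)}
adjusted = adjust_case(cases["case_a"], raw, final, screened)
print([s[0] for s in adjusted])          # ['copy_site', 'alarm_rule', 'to_metres']
```

Here `legacy_flag` is removed because its element does not appear in the final data, and `alarm_rule` lands between its upstream neighbour (`site`) and its downstream neighbour (`depth_m`).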
Example 2:
An embodiment of the present invention provides an apparatus for accessing geological disaster data through an automatic combination process. As shown in Fig. 5, the apparatus includes one or more processors 21 and a memory 22; Fig. 5 takes one processor 21 as an example.
The processor 21 and the memory 22 may be connected by a bus or by other means; Fig. 5 takes a bus connection as an example.
As a non-volatile computer-readable storage medium, the memory 22 can store non-volatile software programs and non-volatile computer-executable programs, such as the program implementing the method for accessing geological disaster data through the automatic combination process in Embodiment 1. The processor 21 performs that method by executing the non-volatile software programs and instructions stored in the memory 22.
The memory 22 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some embodiments, the memory 22 optionally includes memory located remotely from the processor 21 and connected to it over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
When executed by the one or more processors 21, the program instructions/modules stored in the memory 22 perform the method for accessing geological disaster data through the automatic combination process of Embodiment 1, for example the steps shown in Figs. 1 and 4.
It should be noted that the information exchange and execution processes between the modules and units of the apparatus and system are based on the same concept as the method embodiment of the present invention; for details, refer to the description of the method embodiment, which is not repeated here.
Those of ordinary skill in the art will appreciate that all or part of the steps of the methods in the embodiments may be carried out by a program instructing the relevant hardware. The program may be stored on a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and the like.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A method for accessing geological disaster data through an automatic combination process, comprising:
acquiring, from an operation terminal, original data to be imported by a target ETL process expected by a user and the final data expected after that ETL process, and importing the original data and the final data into an automatic combined process system as an input object and a theoretical output object, respectively;
analyzing, by the automatic combined process system, the input object and the theoretical output object to determine the logical relationship between them;
wherein the logical relationship comprises one or more of: the associated positions, in the input object, of the constituent elements obtained by splitting the theoretical output object; the associated data that belong to the same type of attribute as a constituent element and are contained in the input object; and the case in which the associated position of a constituent element cannot be found directly in the input object; and
screening out corresponding algorithm sub-modules according to the logical relationship, arranging and combining them, and feeding them back to the operation terminal as an ETL process to be confirmed.
2. The method for accessing geological disaster data through an automatic combination process according to claim 1, wherein, during the arrangement and combination, if two or more alternative algorithm sub-modules are available for a link of the ETL process to be confirmed, the alternatives are presented either by selection switching or as a list, specifically:
in selection-switching mode, when an algorithm sub-module is selected by switching, the performance attributes of that algorithm sub-module during operation are presented;
in list mode, all selectable algorithm sub-modules and the performance attributes of each during operation are presented together in the list;
the performance attributes comprise a correspondence between the computing resources required by each algorithm sub-module and the time it requires to process a unit volume of data.
3. The method for accessing geological disaster data through an automatic combination process according to claim 2, wherein the computing resources comprise one or more of: the number of physical servers, the number of virtual machines, the configuration parameters required of the physical servers, and the configuration parameters of the virtual machines.
4. The method for accessing geological disaster data through an automatic combination process according to claim 1, wherein generating the ETL process to be confirmed comprises:
splitting the theoretical output object to obtain constituent elements; extracting a first part of constituent elements whose associated positions cannot be found in the input object; screening corresponding algorithm sub-modules according to the logical relationship between the input object and a second part of constituent elements whose associated positions can be found in the input object; and arranging and combining those algorithm sub-modules according to the positions of the second part of constituent elements in the theoretical output object to obtain an initial arrangement;
feeding back the first part of constituent elements and the original data to the operation terminal, prompting the user to supply the advanced logical relationship between the first part of constituent elements and the original data; and further screening out, by the automatic combined process system, corresponding algorithm sub-modules according to the advanced logical relationship, and inserting them into the initial arrangement according to the positions of the first part of constituent elements in the theoretical output object, to generate the ETL process to be confirmed.
5. The method for accessing geological disaster data through an automatic combination process according to claim 4, wherein the advanced logical relationship comprises:
a relationship in which the first part of constituent elements is calculated from objects at specified positions in the original data using a specified operation rule.
6. The method for accessing geological disaster data through an automatic combination process according to claim 4, wherein splitting the theoretical output object to obtain constituent elements specifically comprises:
performing semantic analysis and/or key-value splitting with database characteristics on the theoretical output object to obtain constituent elements, and matching the constituent elements in the input object;
if the matching succeeds, classifying the corresponding constituent elements into the second part of constituent elements;
if the matching fails, further splitting the corresponding constituent element to obtain minimum constituent elements; if a minimum constituent element split down to a single character still fails to match, classifying it into the first part of constituent elements; and if a minimum constituent element obtained after one or more further splits matches successfully, classifying it into the second part of constituent elements.
7. The method according to claim 4, wherein extracting the first part of constituent elements whose associated positions cannot be found in the input object comprises:
supplementing the input object with associated data that belong to the same type of attribute as the corresponding constituent element and are obtained by analyzing the input object, and extracting the first part of constituent elements whose associated positions cannot be found in the supplemented input object.
8. The method for accessing geological disaster data through an automatic combination process according to claim 1, wherein the screening process comprises:
taking a specified constituent element and its associated context in the matched input object as a theoretical output sub-object and an input sub-object, respectively; processing the input sub-object with each of the candidate algorithm sub-modules to obtain the corresponding actual output sub-objects; and selecting an algorithm sub-module if the actual output sub-object it produces is consistent with the theoretical output sub-object, otherwise screening that algorithm sub-module out.
9. The method according to claim 1, wherein, if ETL process cases are stored in the automatic combined process system, the original data are imported into the system and the stored ETL process cases are traversed to obtain the case output of each case, and the final data are matched against each case output one by one;
if the matching degree is higher than a preset proportion value, the successfully matched constituent elements of the final data reuse the successfully matched algorithm sub-modules of the corresponding process case, and the unmatched algorithm sub-modules of that case are removed; the unmatched constituent elements of the final data are taken as theoretical output objects and split further, corresponding algorithm sub-modules are screened, and the screened sub-modules are added at the fitting positions in the process case to obtain an adjusted ETL process to be confirmed;
wherein the fitting position is determined by the upstream and downstream positions of the unmatched constituent elements in the final data.
10. An apparatus for accessing geological disaster data through an automatic combination process, comprising:
at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor for performing the method for accessing geological disaster data through an automatic combination process according to any one of claims 1 to 9.
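The performance attributes of claims 2 and 3 amount to a mapping from a resource configuration to the time needed per unit of data volume. The sketch below models that mapping and the list-mode presentation in Python; the `Resources` fields cover only server and virtual-machine counts (configuration parameters are omitted), and the alternative names, timings, and unit size are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Computing resources (claim 3 also lists configuration parameters, omitted here)."""
    physical_servers: int = 0
    virtual_machines: int = 0


@dataclass
class Alternative:
    """One alternative algorithm sub-module for a link of the ETL process."""
    name: str
    # Performance attribute: resources -> seconds needed per unit of data volume.
    seconds_per_unit: dict[Resources, float]


def list_view(alternatives: list[Alternative], units: float) -> list[str]:
    """List-mode presentation: every alternative with every resource/time pairing."""
    lines = []
    for alt in alternatives:
        for res, sec in alt.seconds_per_unit.items():
            lines.append(f"{alt.name}: {res.physical_servers} server(s), "
                         f"{res.virtual_machines} VM(s) -> {sec * units:.1f} s")
    return lines


alternatives = [
    Alternative("hash_join", {Resources(physical_servers=1): 2.0,
                              Resources(virtual_machines=4): 0.8}),
    Alternative("sort_merge_join", {Resources(physical_servers=1): 3.5}),
]
for line in list_view(alternatives, units=10):
    print(line)
```

`Resources` is frozen so that it is hashable and can serve as a dictionary key; selection-switching mode would simply show the `seconds_per_unit` entries of the one alternative currently selected.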
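Claims 4 and 5 can be sketched end to end: the second part of constituent elements yields an initial arrangement ordered by position in the theoretical output object, a user-supplied rule over objects at specified positions of the original data yields a sub-module for a first-part element, and that sub-module is inserted at its position before the arrangement is combined into a runnable process. In this illustrative sketch, "positions" are assumed to be column names, the operation set in `OPERATIONS` and all element and column names are hypothetical, and "combining" is simplified to building each output element from the same input row.

```python
import operator
from functools import reduce
from typing import Any, Callable

Row = dict[str, Any]
# Hypothetical operation rules a user could choose for the advanced logical relationship.
OPERATIONS = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "/": operator.truediv}


def rule_module(positions: list[str], op: str) -> Callable[[Row], Any]:
    """Sub-module computing a first-part element from objects at specified positions."""
    return lambda row: reduce(OPERATIONS[op], [row[p] for p in positions])


def arrange(theoretical: list[str], modules: dict[str, Callable[[Row], Any]]):
    """Order (element, sub-module) pairs by the element's position in the theoretical output."""
    return sorted(modules.items(), key=lambda item: theoretical.index(item[0]))


def combine(arrangement) -> Callable[[Row], Row]:
    """Combine arranged sub-modules into one process building each element in order."""
    return lambda row: {element: module(row) for element, module in arrangement}


theoretical = ["site", "daily_mm", "total_mm"]
# Second part: associated positions found directly in the original data.
second_part = {"total_mm": lambda r: r["disp_today"], "site": lambda r: r["site"]}
initial = arrange(theoretical, second_part)
print([e for e, _ in initial])             # ['site', 'total_mm']

# First part: "daily_mm" has no associated position; the user supplies the rule
# daily_mm = disp_today - disp_yesterday, which is inserted at its position.
first_part = {"daily_mm": rule_module(["disp_today", "disp_yesterday"], "-")}
final_arrangement = arrange(theoretical, {**dict(initial), **first_part})
print([e for e, _ in final_arrangement])   # ['site', 'daily_mm', 'total_mm']

etl = combine(final_arrangement)
print(etl({"site": "S1", "disp_today": 120.0, "disp_yesterday": 95.5}))
# {'site': 'S1', 'daily_mm': 24.5, 'total_mm': 120.0}
```

Re-sorting by position in the theoretical output object is one simple way to realise "inserting according to position"; it keeps the initial arrangement intact and slots the new sub-module between its neighbours.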
CN202110848943.XA 2021-07-27 2021-07-27 Method and device for accessing geological disaster data through automatic combination process Active CN114880385B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110848943.XA CN114880385B (en) 2021-07-27 2021-07-27 Method and device for accessing geological disaster data through automatic combination process

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110848943.XA CN114880385B (en) 2021-07-27 2021-07-27 Method and device for accessing geological disaster data through automatic combination process

Publications (2)

Publication Number Publication Date
CN114880385A 2022-08-09
CN114880385B 2022-11-22

Family

ID=82667331

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110848943.XA Active CN114880385B (en) 2021-07-27 2021-07-27 Method and device for accessing geological disaster data through automatic combination process

Country Status (1)

Country Link
CN (1) CN114880385B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant