CN115210701A - System and method for automatically generating BI models using data introspection and curation - Google Patents


Info

Publication number
CN115210701A
Authority
CN
China
Prior art keywords
data
environment
model
customer
enterprise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202180015587.1A
Other languages
Chinese (zh)
Inventor
S. Verma
B. Krishnan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oracle International Corp
Original Assignee
Oracle International Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 16/868,081 (published as US 2020/0356575 A1)
Application filed by Oracle International Corp filed Critical Oracle International Corp
Publication of CN115210701A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval of structured data, e.g. relational data
    • G06F 16/25: Integrating or interfacing systems involving database management systems
    • G06F 16/254: Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G06F 16/258: Data format conversion from or to a database

Abstract

In accordance with embodiments, described herein are systems and methods for automatically generating Business Intelligence (BI) data models using data introspection and curation, as may be used, for example, with Enterprise Resource Planning (ERP) or other enterprise computing or data analytics environments. The described approach combines manually curated artifacts with models auto-generated through data introspection of the source data environment, to arrive at a target BI data model. For example, a pipeline generator framework can evaluate the dimensions, degenerate attributes, and applicable metrics of each transaction type; and use the output of this process to create an output target model and a pipeline or load plan. The systems and methods described herein provide technical improvements that enable new subject areas or BI data models to be built in a much shorter period of time.

Description

System and method for automatically generating BI models using data introspection and curation
Copyright notice
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the patent and trademark office patent file or records, but otherwise reserves all copyright rights whatsoever.
Priority claim:
This application claims priority to U.S. Provisional Patent Application No. 62/979,269, entitled "SYSTEM AND METHOD FOR AUTOMATIC GENERATION OF BI MODELS USING DATA INTROSPECTION AND CURATION", filed February 20, 2020; and to U.S. Patent Application No. 16/868,081, entitled "SYSTEM AND METHOD FOR CUSTOMIZATION IN AN ANALYTIC APPLICATIONS ENVIRONMENT", filed May 6, 2020; each of which is incorporated herein by reference.
Technical Field
Embodiments described herein relate generally to computer data analytics, Business Intelligence (BI), and Enterprise Resource Planning (ERP) or other enterprise computing environments, and in particular to systems and methods for automatically generating BI data models using data introspection and curation, for use in such environments.
Background
Data analysis enables examination of large volumes of data in order to draw conclusions or other information from the data; while Business Intelligence (BI) tools provide business users with information describing the data in a format that enables them to make strategic business decisions.
There is increasing interest in developing software applications that leverage data analytics, within the context of an organization's Enterprise Resource Planning (ERP) or other enterprise computing environment, or within a software-as-a-service (SaaS) or cloud environment. However, conventional approaches to preparing BI data models have been less successful in addressing the complex schemas used in modern enterprise computing environments.
Disclosure of Invention
In accordance with embodiments, described herein are systems and methods for automatically generating Business Intelligence (BI) data models using data introspection and curation, as may be used, for example, with Enterprise Resource Planning (ERP) or other enterprise computing or data analytics environments. The described approach combines manually curated artifacts with models auto-generated through data introspection of the source data environment, to derive a target BI data model. For example, a pipeline generator framework can evaluate the dimensions, degenerate attributes, and applicable metrics of each transaction type; and use the output of this process to create an output target model and a pipeline or load plan. The systems and methods described herein provide technical improvements that enable new subject areas or BI data models to be built in a much shorter period of time.
Drawings
FIG. 1 illustrates a system for providing an analysis application environment, according to an embodiment.
FIG. 2 further illustrates a system for providing an analysis application environment, according to an embodiment.
FIG. 3 further illustrates a system for providing an analysis application environment, according to an embodiment.
FIG. 4 further illustrates a system for providing an analysis application environment, according to an embodiment.
FIG. 5 further illustrates a system for providing an analysis application environment, according to an embodiment.
FIG. 6 further illustrates a system for providing an analysis application environment, according to an embodiment.
FIG. 7 further illustrates a system for providing an analysis application environment, according to an embodiment.
FIG. 8 further illustrates a system for providing an analysis application environment, according to an embodiment.
FIG. 9 further illustrates a system for providing an analysis application environment, according to an embodiment.
FIG. 10 illustrates a flow diagram of a method for providing an analysis application environment, according to an embodiment.
FIG. 11 illustrates an extensibility and customization enabled analysis application environment, according to an embodiment.
FIG. 12 illustrates a self-service data model, in accordance with an embodiment.
FIG. 13 illustrates a curated data model according to an embodiment.
FIG. 14 illustrates a system for automatically generating a BI data model using data introspection and curation, according to an embodiment.
FIG. 15 further illustrates a system for automatically generating a BI data model using data introspection and curation, according to an embodiment.
FIG. 16 illustrates an example pipeline generator framework for use in automatic generation of BI data models in accordance with embodiments.
FIG. 17 illustrates an example flow diagram for a process used in the automatic generation of a BI data model in accordance with an embodiment.
FIG. 18 further illustrates an example flowchart of a process for use in automatic generation of a BI data model according to embodiments.
FIG. 19 illustrates an example list of transaction types according to an embodiment.
FIG. 20 illustrates an example transaction column list in accordance with an embodiment.
FIG. 21 illustrates an example dimension to logical dimension mapping in accordance with embodiments.
FIG. 22 illustrates an example physical-to-logical attribute mapping, according to an embodiment.
Fig. 23 illustrates an example physical-to-logical measurement mapping, according to an embodiment.
FIG. 24 illustrates an example user interface for use with a system for automatically generating a BI data model in accordance with embodiments.
FIG. 25 illustrates an example user interface for use with a system for automatically generating a BI data model in accordance with embodiments.
FIG. 26 illustrates an example user interface for use with a system for automatically generating a BI data model in accordance with embodiments.
FIG. 27 illustrates an example user interface for use with a system for automatically generating a BI data model in accordance with embodiments.
FIG. 28 illustrates an example user interface for use with a system for automatically generating a BI data model in accordance with embodiments.
FIG. 29 illustrates an example user interface for use with a system for automatically generating a BI data model in accordance with embodiments.
FIG. 30 illustrates an example user interface for use with a system for automatically generating a BI data model in accordance with embodiments.
FIG. 31 illustrates an example user interface for use with a system for automatically generating a BI data model in accordance with embodiments.
FIG. 32 illustrates a flowchart of a method of providing automatic generation of BI data models using data introspection and curation, according to an embodiment.
Detailed Description
As noted above, within an organization, data analysis enables computer-based inspection or analysis of large amounts of data to draw conclusions or other information from the data; business intelligence tools provide business users of an organization with information describing their enterprise data in a format that enables the business users to make strategic business decisions.
There is increasing interest in developing software applications that leverage data analytics within the context of an organization's enterprise software application or data environment, such as, for example, an Oracle Fusion Applications environment or other type of enterprise software application or data environment; or within a software-as-a-service (SaaS) or cloud environment, such as, for example, an Oracle Analytics Cloud or Oracle Cloud Infrastructure environment, or other type of cloud environment.
According to an embodiment, an analysis application environment enables data analysis in the context of an organization's enterprise software application or data environment, or software as a service or other type of cloud environment; and supports the development of computer-executable software analysis applications.
According to embodiments, a data pipeline or process (such as, for example, an extract, transform, load process) can operate in accordance with an analytics application schema adapted to address a particular analytics use case or best practices, to receive data from a customer's (tenant's) enterprise software application or data environment, for loading into a data warehouse instance.
According to embodiments, each customer (tenant) can additionally be associated with a customer tenancy and a customer schema. The data pipeline or process populates their data warehouse instance and database tables with data as received from their enterprise software application or data environment, as defined by a combination of the analytics application schema and their customer schema.
Technical advantages of the described systems and methods include, according to embodiments, the use of a system-wide or shared analytics application schema or data model, maintained within an analytics application environment (cloud) tenancy; and a tenant-specific customer schema, maintained within the customer tenancy; such that each customer's (tenant's) data warehouse instance or database tables can be automatically or periodically (e.g., hourly/daily/weekly, or otherwise) populated or otherwise associated with live data (live tables), as received from their enterprise software application or data environment, reflecting best practices for a particular analytics use case. Examples of such analytics use cases include Enterprise Resource Planning (ERP), Human Capital Management (HCM), Customer Experience (CX), Supply Chain Management (SCM), Enterprise Performance Management (EPM), or other types of analytics use cases. The populated data warehouse instance or database tables can then be used to create computer-executable software analytic applications, or to determine data analytics or other information associated with the data.
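The shared-schema-plus-customer-schema arrangement described above can be sketched roughly as follows; all names here (the `DW_ANALYTICS` schema, the `CUST_` prefix, and the load-plan structure) are illustrative assumptions, not the patent's actual implementation:

```python
# Illustrative sketch of a per-tenant load plan that combines a shared
# analytics schema with a tenant-specific customer schema.

ANALYTICS_SCHEMA = "DW_ANALYTICS"  # shared schema, read-only for tenants

def load_plan(tenant_id: str, use_case: str) -> dict:
    """Build a per-tenant load plan for a given analytics use case."""
    supported = {"ERP", "HCM", "CX", "SCM", "EPM"}
    if use_case not in supported:
        raise ValueError(f"unsupported use case: {use_case}")
    return {
        "source": f"saas/{tenant_id}/{use_case.lower()}",  # tenant's app data
        "target_schema": f"CUST_{tenant_id.upper()}",      # tenant-specific schema
        "shared_schema": ANALYTICS_SCHEMA,                 # shared reference schema
        "refresh": "daily",                                # hourly/daily/weekly
    }
```

Each tenant's plan targets only its own customer schema, while referencing the single shared analytics schema.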
According to embodiments, a computer-executable software analysis application may be associated with a data pipeline or process (such as, for example, an extract, transform, load (ETL) process, or an extract, load, transform (ELT) process) maintained by a data integration component (such as, for example, an Oracle Data Integrator (ODI) environment or other type of data integration component).
According to embodiments, the analytics application environment can operate with a data warehouse environment or component, such as, for example, an Oracle Autonomous Data Warehouse (ADW), an Oracle Autonomous Data Warehouse Cloud (ADWC), or other type of data warehouse environment or component adapted to store large amounts of data; which can be populated via a star schema sourced from an enterprise software application or data environment, such as, for example, an Oracle Fusion Applications environment or other type of enterprise software application or data environment. The data made available to each customer (tenant) of the analytics application environment can be provisioned in, for example, an ADWC tenancy that is associated with, and accessible only to, that customer (tenant); while providing access to other features of a shared infrastructure.
For example, according to an embodiment, the analytics application environment can include a data pipeline or processing layer that enables customers (tenants) to ingest data extracted from their Oracle Fusion Applications environment, to be loaded into a data warehouse instance within their ADWC tenancy, including support for features such as multiple data warehouse schemas, data extraction and target schemas, and monitoring of data pipeline or processing stages; coupled with a shared data pipeline or processing infrastructure that provides a common transformation map or repository.
Introduction
According to an embodiment, a data warehouse environment or component, such as, for example, an Oracle Autonomous Data Warehouse (ADW), an Oracle Autonomous Data Warehouse Cloud (ADWC), or other type of data warehouse environment or component suitable for storing large amounts of data, may provide a central repository for storing data collected by one or more business applications.
For example, in accordance with an embodiment, a data warehouse environment or component may be provided as a multidimensional database that employs online analytical processing (OLAP) or other techniques to generate business-related data from a plurality of different data sources. An organization may extract such business-related data from one or more vertical and/or horizontal business applications, and inject the extracted data into a data warehouse instance associated with the organization.
examples of horizontal business applications may include ERP, HCM, CX, SCM, and EPM as described above, and provide a wide range of functionality across various enterprise organizations.
Vertical business applications are generally narrower in scope than horizontal business applications, but provide access to data further up and down stream in a data chain within a defined scope or industry. Examples of vertical business applications may include medical software or banking software for use within a particular organization.
While software vendors increasingly offer enterprise software products or components as SaaS or cloud-oriented products, such as, for example, Oracle Fusion Applications; and other enterprise software products or components, such as, for example, Oracle ADWC, may be provisioned as one or more of SaaS, platform-as-a-service (PaaS), or hybrid subscriptions; enterprise users of conventional Business Intelligence (BI) applications and processes are often faced with the task of extracting data from their horizontal and vertical business applications, and introducing the extracted data into a data warehouse, a process that can be both time- and resource-intensive.
According to embodiments, the analytics application environment allows a customer (tenant) to develop a computer-executable software analytics application for use with a BI component, such as, for example, an Oracle Business Intelligence Application (OBIA) environment or other type of BI component suitable for inspecting large amounts of data originating from the customer (tenant) itself or from multiple third party entities.
For example, according to an embodiment, when used with a SaaS business productivity software product suite that includes a data warehouse component, the analytics application environment can be used to populate the data warehouse component with data from the business productivity software applications of the suite. Predefined data set flows can automate the ETL processing of data between the business productivity software applications and the data warehouse, which otherwise might be performed manually or in a conventional manner by users of those services.
As another example, according to an embodiment, the analytics application environment can be preconfigured with a database schema for storing consolidated data sourced across various business productivity software applications of the SaaS product suite. This preconfigured database schema can be used to provide uniformity between the productivity software applications and corresponding transactional databases offered in the SaaS product suite; while allowing the user to forgo the process of manually designing, tuning, and modeling the provided data warehouse.
As another example, according to an embodiment, an analytics application environment may be used to pre-populate a reporting interface of a data warehouse instance with relevant metadata describing data objects related to a business in the context of various business productivity software applications, for example, to include predefined dashboards (dashboards), key Performance Indicators (KPIs), or other types of reports.
Analytics application environment
FIG. 1 illustrates a system for providing an analysis application environment, according to an embodiment.
As shown in fig. 1, according to embodiments, the analytics application environment 100 may be provided by, or otherwise operated at, a computer system having computer hardware (e.g., processor, memory) 101 and including one or more software components operating as a control plane 102 and a data plane 104, and providing access to a data warehouse or data warehouse instance 160.
According to an embodiment, the components and processes shown in fig. 1 and described further herein with respect to various other embodiments may be provided as software or program code executable by a computer system or other type of processing device.
For example, according to embodiments, the components and processes described herein may be provided by a cloud computing system or other suitably programmed computer system.
According to embodiments, the control plane operates to provide control for cloud or other software products offered within the context of a SaaS or cloud environment, such as, for example, an Oracle Analytics Cloud or Oracle Cloud Infrastructure environment, or other type of cloud environment.
For example, according to an embodiment, the control plane can include a console interface 110 that enables access by a client computer device 10, having device hardware 12, administrative application 14, and user interface 16, under control of a customer (tenant) 20, to a cloud environment having a provisioning component 111.
According to an embodiment, the console interface may enable access by a customer (tenant) operating a Graphical User Interface (GUI) and/or a Command Line Interface (CLI) or other interface; and/or may include an interface for use by the provider of the SaaS or cloud environment and its customers (tenants).
For example, according to an embodiment, a console interface may provide an interface that allows customers to offer services for use within their SaaS environment and to configure those services that have been offered.
According to embodiments, the provisioning component may include various functionality for provisioning the service specified by the provisioning command.
For example, according to an embodiment, a customer (tenant) can access and utilize a provisioning component via a console interface to purchase one or more of a suite of business productivity software applications and data warehouse instances for use with those software applications.
According to an embodiment, a customer (tenant) can request the provisioning of a customer schema 164 within the data warehouse. The customer can also supply, via the console interface, a number of attributes associated with the data warehouse instance, including required attributes (e.g., login credentials) and optional attributes (e.g., size or speed). The provisioning component can then provision the requested data warehouse instance, including the customer schema of the data warehouse; and populate the data warehouse instance with the appropriate information supplied by the customer.
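A minimal sketch of the provisioning step described above, distinguishing required attributes from optional ones with defaults; the attribute names and the schema-naming convention are assumptions for illustration, not the actual provisioning API:

```python
# Hypothetical provisioning sketch: validate required attributes, apply
# defaults for optional ones, and derive the customer schema name.

REQUIRED = {"login_credentials"}
OPTIONAL_DEFAULTS = {"size": "medium", "speed": "standard"}

def provision_warehouse(tenant_id: str, attributes: dict) -> dict:
    missing = REQUIRED - attributes.keys()
    if missing:
        raise ValueError(f"missing required attributes: {sorted(missing)}")
    config = {**OPTIONAL_DEFAULTS, **attributes}   # caller values win
    config["customer_schema"] = f"CUST_{tenant_id.upper()}"
    return config
```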
According to embodiments, the provisioning component can also be used to update or edit data warehouse instances and/or ETL processes operating at the data plane, for example, by changing or updating the requested frequency of ETL process runs for a particular customer (tenant).
According to an embodiment, the provisioning component may also include a provisioning Application Programming Interface (API) 112, a plurality of workers 115, a metering manager 116, and a data plane API 118, as further described below. When commands, instructions, or other input is received at the console interface, the console interface can communicate with a provisioning API, such as by making API calls, to provision services in a SaaS environment, or make configuration changes to provisioned services.
According to an embodiment, the data plane API may communicate with the data plane.
For example, according to an embodiment, provisioning and configuration changes for services provided by the data plane may be communicated to the data plane via the data plane API.
According to embodiments, the metering manager may include various functions to meter services offered through the control plane and usage of the services.
For example, according to an embodiment, a metering manager may record usage over time of processors provisioned via a control plane for a particular customer (tenant) for billing purposes. Also, for billing purposes, the metering manager may record the amount of storage space of the data warehouse that is partitioned for use by customers of the SaaS environment.
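The metering behavior described above can be illustrated with a small sketch; the class and method names here are hypothetical, not the patent's actual component:

```python
# Illustrative metering sketch: record per-tenant usage samples over time
# and total them by metric for billing purposes.
from collections import defaultdict

class MeteringManager:
    def __init__(self):
        self._usage = defaultdict(list)  # tenant -> [(metric, amount), ...]

    def record(self, tenant: str, metric: str, amount: float) -> None:
        """Record one usage sample (e.g., processor hours, storage GB)."""
        self._usage[tenant].append((metric, amount))

    def total(self, tenant: str, metric: str) -> float:
        """Sum all recorded usage of one metric for a tenant."""
        return sum(a for m, a in self._usage[tenant] if m == metric)
```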
According to an embodiment, the data plane may include a data pipeline or processing layer 120 and a data transformation layer 134 that together process operational or transactional data from an organization's enterprise software application or data environment, such as, for example, a business productivity software application provisioned in a SaaS environment of a customer (tenant). The data pipeline or process may include various functions that may extract transactional data from business applications and databases provisioned in the SaaS environment and then load the transformed data into a data repository.
According to an embodiment, the data transformation layer may include a data model, such as, for example, a Knowledge Model (KM) or other type of data model, that the system uses to transform the transaction data received from the business applications and corresponding transaction databases provisioned in the SaaS environment into a model format understood by the analytics application environment. The model format may be provided in any data format suitable for storage in a data warehouse.
According to an embodiment, the data pipeline or process provided by the data plane can include a monitoring component 122, a data staging component 124, a data quality component 126, and a data protection component 128, as described further below.
According to an embodiment, the data transformation layer may include a dimension generation component 136, a fact generation component 138, and an aggregation generation component 140, as described further below. The data plane may also include a data and configuration user interface 130, and a mapping and configuration database 132.
According to embodiments, the data warehouse may include a default analytics application schema (referred to herein as analytics warehouse schema according to some embodiments) 162 and, for each customer (tenant) of the system, a customer schema, as described above.
According to an embodiment, the data plane is responsible for performing extract, transform and load (ETL) operations, including extracting transactional data from an organization's enterprise software application or data environment (such as, for example, a business productivity software application and corresponding transactional database provisioned in a SaaS environment), transforming the extracted data into a model format, and loading the transformed data into a customer schema of the data warehouse.
For example, according to an embodiment, each customer (tenant) of the environment can be associated with their own customer tenancy within the data warehouse, which is associated with their own customer schema; and can additionally be provided with read-only access to the analytics application schema, which can be updated by the data pipeline or process (e.g., an ETL process) periodically or on some other basis.
In accordance with an embodiment, to support multiple tenants, the system may enable the use of multiple data warehouses or data warehouse instances.
For example, according to an embodiment, a first customer tenancy for a first tenant can comprise a first database instance, a first staging area, and a first data warehouse instance of a plurality of data warehouses; while a second customer tenancy for a second tenant can comprise a second database instance, a second staging area, and a second data warehouse instance of the plurality of data warehouses.
According to embodiments, the data pipeline or process can be scheduled to execute at intervals (e.g., hourly/daily/weekly) to extract transactional data from an enterprise software application or data environment, such as, for example, business productivity software applications and their corresponding transactional databases 106 provisioned in the SaaS environment.
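The interval-based scheduling can be sketched as a simple next-run computation; the frequency names mirror the example intervals above, while the function itself is an illustrative assumption:

```python
# Illustrative scheduling sketch: compute the next extraction run time
# from the last run and a named frequency (hourly/daily/weekly).
from datetime import datetime, timedelta

INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}

def next_run(last_run: datetime, frequency: str) -> datetime:
    if frequency not in INTERVALS:
        raise ValueError(f"unknown frequency: {frequency}")
    return last_run + INTERVALS[frequency]
```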
According to embodiments, an extract process 108 can extract the transactional data, whereupon the data pipeline or process can insert the extracted data into a data staging area, which can act as a temporary staging area for the extracted data. The data quality component and data protection component can be used to ensure the integrity of the extracted data.
For example, according to an embodiment, the data quality component may perform validation on the extracted data while the data is temporarily held in the data staging area.
According to an embodiment, when the extract process has completed its extraction, the data transformation layer can be used to begin the transform process, to transform the extracted data into a model format, for loading into the customer schema of the data warehouse.
As described above, according to embodiments, a data pipeline or process may operate in conjunction with a data transformation layer to transform data into a model format. The mapping and configuration database may store metadata and data maps that define the data model used by the data transformation. A data and configuration User Interface (UI) may facilitate access to and modification of the mapping and configuration database.
According to an embodiment, the monitoring component may determine the dependencies of several different data sets to be transformed based on the data model defined in the mapping and configuration database. Based on the determined dependencies, the monitoring component can determine which of several different data sets should be first transformed into the model format.
For example, according to an embodiment, if a first model dataset does not comprise dependencies on any other model datasets; and the second model data set comprises dependencies on the first model data set; the monitoring component may determine to transform the first data set prior to transforming the second data set to accommodate the dependency of the second data set on the first data set.
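The ordering decision described in this example amounts to a topological sort over dataset dependencies; a minimal sketch, with an invented dependency map:

```python
# Sketch of the monitoring component's ordering decision: datasets whose
# dependencies are already resolved are transformed first.

def transform_order(deps: dict) -> list:
    """deps maps each dataset name to the list of datasets it depends on."""
    order, resolved = [], set()
    pending = dict(deps)
    while pending:
        ready = [d for d, reqs in pending.items() if set(reqs) <= resolved]
        if not ready:
            raise ValueError("circular dependency among datasets")
        for d in sorted(ready):  # sorted for a deterministic order
            order.append(d)
            resolved.add(d)
            del pending[d]
    return order
```

With the example from the text, a first dataset with no dependencies is ordered before a second dataset that depends on it.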
According to an embodiment, the data transformation layer can transform the extracted data into a format suitable for loading into the customer schema of the data warehouse, for example according to a data model as described above. During the transformation, the data transformation can perform dimension generation, fact generation, and aggregate generation, as appropriate. Dimension generation can include the generation of dimensions or fields for loading into the data warehouse instance.
For example, according to an embodiment, dimensions can include categories of data such as, for example, "name," "address," or "age." Fact generation includes the generation of values that the data can take, or "measures." Facts are associated with appropriate dimensions in the data warehouse instance. Aggregate generation includes the creation of data mappings which compute aggregations of the transformed data to existing data in the customer schema 164 of the data warehouse instance.
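The dimension/fact/aggregate relationship described above can be illustrated with a small in-memory star-schema sketch; the table contents and column names are invented for illustration:

```python
# Illustrative star-schema sketch: a dimension table, a fact table keyed
# to it, and an aggregation of a fact measure by a dimension attribute.

employees_dim = [  # dimension: descriptive attributes, keyed by emp_key
    {"emp_key": 1, "name": "A", "age_band": "30-39"},
    {"emp_key": 2, "name": "B", "age_band": "40-49"},
]
salary_facts = [  # facts: measures, each joined to a dimension row
    {"emp_key": 1, "salary": 100.0},
    {"emp_key": 2, "salary": 150.0},
    {"emp_key": 1, "salary": 110.0},
]

def aggregate_by(dim, facts, attr, measure):
    """Aggregate a fact measure grouped by a dimension attribute."""
    lookup = {d["emp_key"]: d[attr] for d in dim}
    totals = {}
    for f in facts:
        key = lookup[f["emp_key"]]  # join fact to dimension
        totals[key] = totals.get(key, 0.0) + f[measure]
    return totals
```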
According to embodiments, once any transformations are in place (as defined by the data model), the data pipeline or process may read the source data, apply the transformations, and then push the data to the data warehouse instance.
According to embodiments, the data transformation can be expressed as rules, and once the transformation takes place, values can be held intermediately at the staging area, where the data quality component and data protection component can verify and check the integrity of the transformed data, before it is uploaded to the customer schema at the data warehouse instance. Monitoring can be provided as the extract, transform, load process runs, for example, at a number of compute instances or virtual machines. Dependencies can also be maintained during the extract, transform, load process, and the data pipeline or process can attend to such ordering decisions.
According to embodiments, after transforming the extracted data, the data pipeline or process may perform a warehouse loading process 150 to load the transformed data into a customer schema of the data warehouse instance. After the transformed data is loaded into the customer schema, the transformed data can be analyzed and used in various additional business intelligence processes.
Horizontally and vertically integrated business software applications are generally directed at capturing data in real time. This is a result of the fact that such applications are typically used for daily workflows and store data in a transactional database, which means that typically only the most current data is stored in such databases.
For example, while an HCM application may update a record associated with an employee when the employee transfers to a new office, such HCM applications typically do not maintain a record of each office at which the employee has worked during their tenure with the company. Thus, a BI-related query that attempts to determine employee mobility within a company will not have sufficient records in the transactional database to complete such a query.
According to embodiments, by storing historical data, in addition to the current data generated by horizontally and vertically integrated business software applications, in a context that is readily understood by BI applications, a data warehouse instance populated using the techniques described above provides a resource with which BI applications can process such queries, using interfaces provided, for example, by a business productivity and analytics product suite, or by an SQL tool of the customer's choosing.
Data pipeline processing
FIG. 2 further illustrates a system for providing an analysis application environment, according to an embodiment.
As shown in FIG. 2, according to an embodiment, using the data pipeline process described above, data may originate from, for example, an enterprise software application or data environment (106) of a customer (tenant), or as custom data 109 originating from one or more customer-specific applications 107; and be loaded into a data warehouse instance, in some examples including the use of an object storage 105 for storage of the data.
According to embodiments, a data pipeline or process maintains, for each customer (tenant), an analytics application schema, e.g., as a star schema, that is updated by the system on a periodic or other basis, based on best practices for a particular analytics use case, e.g., Human Capital Management (HCM) analytics or Enterprise Resource Planning (ERP) analytics.
According to embodiments, for each customer (tenant), the system pre-populates the customer's data warehouse instance, within an analytics application environment (cloud) tenancy 114, using an analytics application schema that is maintained and updated by the system, based on an analysis of the data within the customer's enterprise application environment and within the customer tenancy 117. The system-maintained analytics application schema thus enables data to be retrieved from the customer's environment, by the data pipeline or process, and loaded into the customer's data warehouse instance in a "real-time" manner.
According to embodiments, the analytics application environment also provides, for each customer of the environment, a customer schema that is readily modifiable by the customer and that allows the customer to supplement and utilize the data within its own data warehouse instance. For each customer of the analytics application environment, the resulting data warehouse instance operates as a database whose contents are controlled in part by the customer, and in part by the analytics application environment (system); for example, it may be pre-populated with appropriate data retrieved from the customer's enterprise application environment, to address various analytics use cases, such as HCM analytics or ERP analytics.
For example, according to an embodiment, a data warehouse (e.g., an Oracle Autonomous Data Warehouse Cloud, ADWC) may include an analytics application schema and, for each customer/tenant, a customer schema sourced from its enterprise software application or data environment. The data provisioned in a data warehouse tenancy (e.g., an ADWC tenancy) is accessible only to that tenant, while at the same time allowing access to various features of the shared analytics application environment, such as ETL-related or other features.
According to an embodiment, to support multiple customers/tenants, the system enables the use of multiple data warehouse instances; wherein, for example, a first customer tenancy may include a first database instance, a first staging area, and a first data warehouse instance; and a second customer tenancy may include a second database instance, a second staging area, and a second data warehouse instance.
According to embodiments, for a particular customer/tenant, upon extracting its data, the data pipeline or process may insert the extracted data into the tenant's data staging area, which may serve as a temporary staging area for the extracted data. Data quality components and data protection components can be used to ensure the integrity of the extracted data; for example, by performing verification on the extracted data while the data is temporarily held in the data staging area. When the extraction process completes its extraction, a transformation process may be initiated using the data transformation layer to transform the extracted data into a model format for loading into the client schema of the data warehouse.
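The staging-area integrity check described above can be sketched as follows; the validation rules, row fields, and helper name are illustrative assumptions and not the patent's actual data quality or data protection components:

```python
# Hedged sketch: extracted rows are held in a temporary staging area and
# verified before being loaded into the customer schema.
staging_area = [
    {"order_id": 1, "amount": 250.0},
    {"order_id": 2, "amount": 99.5},
]

def validate_staged(rows):
    """Return (valid_rows, rejected_rows) after basic integrity checks."""
    valid, rejected = [], []
    seen_ids = set()
    for row in rows:
        ok = (
            row.get("order_id") is not None
            and row["order_id"] not in seen_ids          # uniqueness check
            and isinstance(row.get("amount"), float)     # type check
            and row["amount"] >= 0                       # range check
        )
        (valid if ok else rejected).append(row)
        seen_ids.add(row.get("order_id"))
    return valid, rejected

valid_rows, rejected_rows = validate_staged(staging_area)
```

Only rows passing validation would proceed to the transformation and load stages.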
Extract, transform, load/publish
FIG. 3 further illustrates a system for providing an analysis application environment, according to an embodiment.
As shown in FIG. 3, according to an embodiment, the process of extracting data (e.g., from a customer's (tenant's) enterprise software application or data environment, using the data pipeline process described above, or as custom data originating from one or more customer-specific applications) and loading the data into a data warehouse instance, or refreshing the data in a data warehouse, generally involves three broad stages performed by an ETP service 160 or process, including one or more extraction services 163 (performed by one or more compute instances 170), a transformation service 165, and a load/publish service 167.
Extraction: according to an embodiment, a list of view objects for extraction may be submitted, for example, via a REST call to, for example, an Oracle BI Cloud Connector (BICC) component. The extracted files may be uploaded to an object storage component, such as, for example, an Oracle Storage Service (OSS) component, for storage of the data.
And (3) transformation: according to an embodiment, the transformation process takes a data file from an object store component (e.g., an OSS) and applies business logic while loading it into a target data warehouse (e.g., ADWC database) that is internal to the data pipeline or process and not exposed to customers (tenants).
Loading/publishing: according to an embodiment, the load/publish service or process obtains the data from, for example, the ADWC database or warehouse, and publishes it to a data warehouse instance that is accessible by the customer (tenant).
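The three stages above can be sketched as a minimal in-memory pipeline; the view-object name, the tax business rule, and the list standing in for the customer-accessible warehouse are hypothetical, and BICC, OSS, and ADWC are not modeled here:

```python
# Illustrative sketch of the extract -> transform -> load/publish stages.
def extract(view_objects):
    """Pretend each submitted view object yields one extracted set of rows."""
    return {vo: [{"vo": vo, "amount": 100}] for vo in view_objects}

def transform(extracted):
    """Apply a hypothetical business rule while preparing rows for loading."""
    return [
        {**row, "amount_with_tax": row["amount"] * 1.1}
        for rows in extracted.values()
        for row in rows
    ]

def load_publish(transformed, warehouse):
    """Publish transformed rows to the customer-accessible instance."""
    warehouse.extend(transformed)
    return warehouse

customer_warehouse = []
staged = extract(["ExampleModule.ExampleVO"])   # hypothetical view object name
load_publish(transform(staged), customer_warehouse)
```

Each stage consumes only the previous stage's output, mirroring the sequencing of the extraction, transformation, and load/publish services.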
Multiple clients (tenants)
FIG. 4 further illustrates a system for providing an analysis application environment, according to an embodiment.
As shown in FIG. 4, which illustrates the operation of the system with a plurality of tenants (customers) according to an embodiment, using the data pipeline process described above, data may be sourced from each of, for example, a plurality of customer (tenant) enterprise software applications or data environments; and loaded into a data warehouse instance.
According to an embodiment, for each of a plurality of customers (tenants) (e.g., customer A 180, customer B 182), the data pipeline or process maintains an analytics application schema that is updated by the system on a periodic or other basis, according to best practices for a particular analytics use case.
According to an embodiment, for each of a plurality of customers (e.g., customer A, B), based on an analysis of data in that customer's enterprise application environment 106A, 106B and in each customer's tenancy (e.g., customer A tenancy 181, customer B tenancy 183), the system pre-populates the customer's data warehouse instance with analysis application patterns 162A, 162B maintained and updated by the system; such that data is retrieved from the customer's environment through the data pipeline or process and loaded into the customer's data warehouse instance 160A, 160B.
According to embodiments, the analytics application environment also provides each of a plurality of clients of the environment with a client schema (e.g., client a schema 164A, client B schema 164B) that is easily modified by the client and that allows the client to supplement and utilize data in its own data warehouse instance.
As described above, according to an embodiment, for each of the plurality of customers of the analytics application environment, the resulting data warehouse instance operates as a database whose contents are controlled in part by the customer, and in part by the analytics application environment (system); for example, it may be pre-populated with appropriate data retrieved from the customer's enterprise application environment, to address various analytics use cases. After the extraction process 108A, 108B for a particular customer completes its extraction, a transformation process may begin, using the data transformation layer to transform the extracted data into a model format for loading into the customer schema of the data warehouse.
Activation plan
FIG. 5 further illustrates a system for providing an analytics application environment, in accordance with an embodiment.
According to an embodiment, the activation plan 186 may be used to control the operation of a data pipeline or processing service for a customer of a particular functional area to address the particular needs of that customer (tenant).
For example, according to an embodiment, an activation plan may define a number of extract, transform, and load (publish) services or steps that run in a particular order, at a particular time of day, and within a particular time window.
According to an embodiment, each customer may be associated with its own activation plan(s). For example, the activation plan for a first customer A may determine the tables to be retrieved from that customer's enterprise software application environment (e.g., its Fusion Applications environment), or how the services and their processes are to run in sequence; while the activation plan for a second customer B may likewise determine the tables to be retrieved from that customer's enterprise software application environment, or how the services and their processes are to run in sequence.
According to embodiments, the activation plan may be stored in a mapping and configuration database and may be customized by the customer via a data and configuration UI. There may be multiple activation plans per customer. The compute instance/service (virtual machine) performing ETL processing for various customers according to its activation plan may be dedicated to a particular service to use the activation plan, and then released for use by other services and activation plans.
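An activation plan of the kind described above can be sketched as a small data structure; the field names, the daily run window, and the step runner are assumptions for illustration, not the actual mapping and configuration database schema:

```python
# Hedged sketch of an activation plan: a named, ordered list of ETL steps
# associated with one customer and functional area, run within a time window.
from dataclasses import dataclass, field

@dataclass
class ActivationPlan:
    customer: str
    functional_area: str
    start_hour: int        # start of the daily run window (0-23), assumed
    window_hours: int      # length of the run window, assumed
    steps: list = field(default_factory=list)  # ordered ETL step names

    def run(self, executor):
        """Execute each step in order; 'executor' maps step name to action."""
        return [executor[step]() for step in self.steps]

plan = ActivationPlan(
    customer="Customer A",
    functional_area="HCM",
    start_hour=1,
    window_hours=4,
    steps=["extract", "transform", "load"],
)
log = plan.run({s: (lambda s=s: f"{s} done")
                for s in ["extract", "transform", "load"]})
```

Multiple such plans per customer could be stored and customized, with a compute instance dedicated to a plan for the duration of its run and then released.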
According to embodiments, based on a determination of historical performance data recorded over a period of time, the system may optimize execution of an activation plan, for example, for one or more functional areas associated with a particular tenant, or across a series of activation plans associated with multiple tenants, to address utilization of VMs and Service Level Agreements (SLAs) by those tenants. Such historical data may include statistics of load amounts and load times.
For example, according to an embodiment, the historical data may include an extract size, an extract count, an extract time, a warehouse size, a transformation time, a publish (load) time, a view object extract size, a view object extract record count, a view object extract time, a warehouse table count, a record count for table processing, a warehouse table transformation time, a publish table count, and a publish time. Such historical data may be used to estimate and plan current and future activation plans, in order to organize various tasks to, for example, run in sequence or in parallel, so as to arrive at a minimum time to run an activation plan. Further, the collected historical data can be used for optimization across multiple activation plans of tenants. In some embodiments, the optimization of activation plans (i.e., of a particular job sequence, such as an ETL sequence) based on historical data may be automatic.
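One simple way such recorded statistics could inform planning is a runtime estimate that distinguishes independent tasks (which can run in parallel, so their contribution is the maximum of their times) from a dependent sequence (whose contribution is the sum). The task names and times below are hypothetical:

```python
# Illustrative use of historical statistics to estimate an activation plan's
# runtime: independent extracts run in parallel; the transform runs after.
historical_times = {  # minutes observed for past runs (assumed values)
    "extract_orders": 10,
    "extract_invoices": 14,
    "transform_all": 22,  # depends on both extracts completing
}

def estimate_runtime(parallel_tasks, sequential_tasks, times):
    """Estimate total minutes: max over the parallel phase, plus the
    sum of the strictly sequential tasks that follow it."""
    parallel_part = max(times[t] for t in parallel_tasks)
    sequential_part = sum(times[t] for t in sequential_tasks)
    return parallel_part + sequential_part

estimate = estimate_runtime(
    parallel_tasks=["extract_orders", "extract_invoices"],
    sequential_tasks=["transform_all"],
    times=historical_times,
)
```

A scheduler could compare such estimates across candidate orderings to pick the plan with the minimum total time.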
ETL Process flow
FIG. 6 further illustrates a system for providing an analysis application environment, according to an embodiment.
As shown in FIG. 6, according to an embodiment, the system enables a flow of data, controlled by a data configuration/management/ETL/status service 190 within an (e.g., Oracle-)managed tenancy, from each customer's enterprise software application environment (e.g., its Fusion Applications environment), including in this example a BICC component, via a storage cloud service 192 (e.g., OSS), and from there to a data warehouse instance.
As described above, according to embodiments, the data stream may be managed by one or more services, including for example, extraction and transformation services as described above, and referencing the ETL repository 193, obtaining data from the storage cloud services and loading the data into an internal target data warehouse (e.g., ADWC database) 194 that is internal to the data pipeline or process and not exposed to customers.
According to an embodiment, the data is moved in stages into the data warehouse, and thereafter into database table change logs 195, from where the load/publish service can load the customer data into a target data warehouse instance that is associated with, and accessible by, the customer within its customer tenancy.
ETL phase
FIG. 7 further illustrates a system for providing an analytics application environment, in accordance with an embodiment.
According to embodiments, extracting, transforming, and loading data from enterprise applications to data warehouse instances involves multiple phases, and each phase may have several sequential or parallel jobs; and run on different spaces/hardware, including different staging areas 196, 198 for each customer.
Analyzing application environment metrics
FIG. 8 further illustrates a system for providing an analysis application environment, according to an embodiment.
As shown in FIG. 8, according to an embodiment, a metering manager may include functionality to meter the services offered through the control plane and the usage of those services, and to provide the resulting metrics 142.
For example, for billing purposes, the metering manager may record usage of processors provisioned for a particular customer via the control plane over a period of time. Also, for billing purposes, the metering manager may record the amount of storage space of the data warehouse that is partitioned for use by customers of the SaaS environment.
Analyzing application environment customizations
FIG. 9 further illustrates a system for providing an analysis application environment, according to an embodiment.
As shown in FIG. 9, according to an embodiment, in addition to data originating from, for example, a customer's enterprise software application or data environment, which may be processed using a data pipeline as described above, one or more additional custom data 109A, 109B originating from one or more customer-specific applications 107A, 107B may also be extracted, transformed, and loaded into a data warehouse instance, using either the data pipeline process described above, in some examples including the use of an object storage for storage of the data, and/or a custom ETL or other process 144 that is under the customer's control. Once the data is loaded into its data warehouse instance, the customer can create business database views that combine tables from its customer schema and the software analytics application schema; and can query its data warehouse instance using, for example, an interface provided by a business productivity and analytics product suite, or by an SQL tool of the customer's choosing.
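The business database view described above, combining a system-managed analytics table with a customer-schema custom table, can be sketched with an in-memory database; SQLite and all table, column, and view names here stand in for the actual warehouse and schemas, which are not specified in this form by the source:

```python
# Illustrative sketch: a view merging an analytics-schema table with a
# custom customer-schema table, queried with plain SQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE analytics_orders (order_id INTEGER, amount REAL);
    CREATE TABLE custom_order_notes (order_id INTEGER, note TEXT);
    INSERT INTO analytics_orders VALUES (1, 250.0);
    INSERT INTO custom_order_notes VALUES (1, 'rush delivery');
    CREATE VIEW combined_orders AS
        SELECT a.order_id, a.amount, c.note
        FROM analytics_orders a
        JOIN custom_order_notes c ON a.order_id = c.order_id;
""")
rows = conn.execute(
    "SELECT order_id, amount, note FROM combined_orders").fetchall()
```

The customer would query such a view with the SQL tool of their choosing, without modifying the system-managed table itself.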
Method for analyzing application environment
FIG. 10 illustrates a flow diagram of a method for providing an analytic application environment, according to an embodiment.
As shown in fig. 10, according to an embodiment, at step 200, the analytics application environment provides access to a data repository for multiple tenants to store data, wherein the data repository is associated with an analytics application schema.
At step 202, each tenant of the plurality of tenants is associated with a customer lease and a customer schema used by the tenant in populating the data warehouse instance.
At step 204, an instance of the data warehouse is populated with data received from the enterprise software application or the data environment, wherein data associated with a particular tenant of the analytics application environment is provisioned in the instance of the data warehouse associated with and accessible by the particular tenant according to the analytics application schema and a customer schema associated with the particular tenant.
Extensibility and customization
Different customers of a data analysis environment may have different requirements on how to classify, aggregate, or transform their data for the purpose of providing data analysis or business intelligence data or developing software analysis applications.
In accordance with an embodiment, to support such different requirements, the system may include a semantic layer that enables the use of custom semantic extensions to extend the semantic data model (semantic model) and provide custom content at the presentation layer. An extension wizard or development environment may guide a user to extend or customize a semantic model using custom semantic extensions by defining branches and steps and then populating the extended or customized semantic model to a production environment.
Technical advantages of the described approach include, according to various embodiments, support for additional types of data sources. For example, a user can perform data analysis based on a combination of ERP data sourced from a first vendor's product and HCM data sourced from a second, different vendor's product; or based on a combination of data received from multiple data sources having different regulatory requirements. User-defined extensions or customizations can also be preserved across fixes, updates, or other changes to the underlying system.
FIG. 11 illustrates a system for supporting extensibility and customization in an analytics application environment, in accordance with an embodiment.
According to an embodiment, the semantic layer may include data defining a semantic model of the customer data; this is useful in helping users understand and use commonly understood business terms to access the data. The semantic layer may include a physical layer mapped to a physical data model or data plane; a logical layer operating as a mapping or transformation layer in which computations may be defined; and a presentation layer enabling a user to access data as content.
As shown in FIG. 11, according to an embodiment, the semantic layer 230 may include a packaged (out-of-the-box, initial) semantic model 232 that may be used to provide packaged content 234. For example, the system may load data from a customer's enterprise software application or data environment into a data warehouse instance, using an ETL or other data pipeline or process as described above, wherein the packaged semantic model may be used to provide the packaged content to the presentation layer.
According to an embodiment, the semantic layer may also be associated with one or more semantic extensions 236 that may be used to extend the packaged semantic model and provide custom content 238 to the presentation layer 240.
According to an embodiment, the presentation layer may enable access to the data via, for example, software analytic applications, user interfaces, dashboards, or key performance indicators (KPIs) 242; or other types of reports or interfaces as may be provided by products such as, for example, Oracle Analytics Cloud or Oracle Analytics for Applications.
According to embodiments, in addition to data originating from the customer environment using the ETL or other data pipeline or process described above, customer data may be loaded to the data warehouse instances using various data models or scenarios that provide further expansion and customization opportunities.
FIG. 12 illustrates a self-service data model according to an embodiment.
As shown in FIG. 12, according to an embodiment, a self-service data model or scenario enables customers to load external or custom data into custom data sets, using an ETL or other data pipeline or process provided by the system, which provides dimensional conformance. The customer may define one or more "live" data sets that are populated by the system; and combine the "live" data sets with external data sets to create combined data sets that can be queried. In such a scenario, the customer's responsibilities typically include manually refreshing their data sets, and managing the security of those data sets.
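The self-service combination of a system-populated "live" data set with a customer-supplied external data set can be sketched as a merge on a shared (conformed) dimension key; the country keys and field names are illustrative assumptions:

```python
# Minimal sketch of the self-service scenario: a "live" data set refreshed by
# the system is joined with an external data set refreshed by the customer.
live_dataset = {        # populated and refreshed by the system
    "US": {"revenue": 1200},
    "DE": {"revenue": 800},
}
external_dataset = {    # loaded and manually refreshed by the customer
    "US": {"forecast": 1500},
    "DE": {"forecast": 900},
}

def combine(live, external):
    """Merge the two data sets on their shared dimension key."""
    return {key: {**live[key], **external.get(key, {})} for key in live}

combined = combine(live_dataset, external_dataset)
```

The combined data set can then be queried as a single source, while the two inputs remain independently refreshed.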
FIG. 13 illustrates a curated data model according to an embodiment.
As shown in FIG. 13, according to an embodiment, a curated data model or scenario provides a centralized or managed analytics environment, in which an ETL or other data pipeline or process provided by the system publishes the customer data to an immutable analytics application schema; and the customer onboards external or custom data into its customer schema using a custom ETL or other data pipeline or process. Customers can create business database views that combine the system-managed tables and custom tables, and query the combined data using the tools of their choosing. In such a scenario, the customer's responsibilities typically include managing the use of custom ETLs or other data pipelines or processes to load and refresh the data into their customer schema.
The above examples of curated and self-service data models or scenarios are provided by way of example. According to various embodiments, the system may support other types of data models or scenarios.
Automatic generation of BI data models
As noted above, there is increasing interest in developing software applications that take advantage of the use of data analysis, for example, in the context of an organizational Enterprise Resource Planning (ERP) or other enterprise computing environment; however, conventional methods of preparing BI data models have been less successful in dealing with the complex patterns used in modern enterprise computing environments.
For example, different enterprise customers may have specific requirements on how their data should be classified, aggregated, or transformed in order to provide key performance indicators, data analytics, or other types of business intelligence data. For example, a customer may modify a data source model associated with their data, e.g., by adding custom facts or dimension selections.
According to various embodiments, to support such different customer needs, the system may include a semantic layer that enables the use of custom semantic extensions to extend a semantic data model (semantic model) and provide custom content at a presentation layer. The semantic layer may include a physical layer mapped to a physical data model or data plane; a logical layer that operates as a mapping or transformation layer in which calculations can be defined; and a presentation layer that enables a user to access the data as content.
For example, according to embodiments, the semantic model extension process may introspect customer data stored, for example, in a data warehouse instance, and evaluate metadata associated with the customer data, to determine custom facts, custom dimensions, and/or other types of data source model extensions with which to extend or otherwise customize the semantic model according to the customer's requirements.
In some environments, customers may also use BI products or environments such as, for example, Oracle NetSuite, which generally provides an ERP computing environment for large and medium-sized enterprises, supporting front-office and back-office processes such as, for example, financial management, revenue management, fixed assets, order management, billing, and inventory management. Such environments may have additional requirements, and may require further modification of the semantic model to enable, for example, the customer's NetSuite data to be used in their analytics environment.
In accordance with embodiments, described herein are systems and methods for automatically generating Business Intelligence (BI) data models using data introspection and curation, as may be used, for example, with Enterprise Resource Planning (ERP) or other enterprise computing or data analytics environments. The described approach uses a combination of manually curated artifacts and automatic model generation, through data introspection of the source data environment, to derive a target BI data model. For example, a pipeline generator framework may evaluate the dimensions, degenerate attributes, and applicable measures of a transaction type, and use the output of this process to create an output target model and a pipeline or loading plan. The systems and methods described herein provide technical improvements in building new subject areas or BI data models within a much shorter period of time.
In general, according to embodiments, the system includes a pipeline or snapshot (ETL) generator, component, or process for automatically generating one or more mappings by referencing or looking at a source model; wherein, for example, the automatic generation process may include the use of manually curated artifacts, and automatically determined or inferred variables.
A semantic model (RPD) generator, component or process generates a data model for a transaction type; wherein, for example, the RPD generation process may use the determined dimensions and facts to generate a semantic model, e.g., as a BI Repository (RPD) file.
A security artifact generator, component, or process overlays the generated semantic model with any required security artifacts (e.g., those artifacts described in the source model); wherein, for example, the security artifact generation process can create security filters and application roles that control data visibility.
A Human Readable Format (HRF) generator, component or process may be used to generate human readable format data for subsequent use thereof, such as creating a BI report.
FIG. 14 illustrates a system for automatically generating a BI data model using data introspection and curation according to embodiments.
As shown in FIG. 14, according to embodiments, a customer may use a BI environment such as, for example, Oracle NetSuite, provided at a BI datacenter 310 and in this example including a NetSuite Oracle database 312 having a NetSuite (NS) customer schema 314, and a provisioning component 316 that enables, for example, the customer's NetSuite data to be provided to the analytics application environment.
In an analytics application environment, according to an embodiment, the BI provisioning component 300 enables customer (e.g., netSuite or other BI or ERP environment) data to be received and loaded into a data warehouse instance from a customer's enterprise software application or data environment and associated with a customer's (e.g., NSAW) data schema 320, where the semantic model may then be used to render (surface) the packaged content from the customer's source data to a presentation layer.
According to embodiments, a semantic model may be defined, for example, in an Oracle environment as a BI Repository (RPD) file with various physical, business, and mapping layers and other constructs that define logical schema, physical-to-logical mapping, aggregate table navigation, and/or implement aspects of the semantic model.
According to embodiments, customers may modify their data source model or their NetSuite or other BI or ERP products or environments to support their particular needs, for example, by adding custom facts or dimensions associated with data stored in their data warehouse instances; and the system can extend the semantic model accordingly.
For example, according to an embodiment, the system may use a semantic model extension process to programmatically introspect the customer's data and determine the custom facts, custom dimensions, or other customizations or extensions that have been made to the data source model, and then automatically modify or extend the semantic model, using an appropriate procedure, to support those customizations or extensions.
FIG. 15 further illustrates a system for automatically generating a BI data model using data introspection and curation according to embodiments.
Some ERP or other enterprise computing or BI environments (such as, for example, netSuite) utilize a data model in which different modules (e.g., sales or purchase order modules) may store data using different transaction tables.
According to an embodiment, a NetSuite data model 340, in which various business entities are stored in a single set of transaction tables, may be mapped for use with the analytics application environment by a BI feed component that enables NetSuite data to be received into the system.
Generally, the transaction table in a NetSuite environment is striped by a field called transactionType, which stores and indicates which entity a record represents. The transaction table has a superset of all of the columns and attributes required for all transaction types, and only the columns associated with a given transaction are stored in the corresponding record. For example, a purchase order transaction will have the supplier column populated but the customer column empty; while for a sales order the reverse is true.
Given the nature of this transaction table, a list of applicable dimensions and attributes may be determined by introspecting the data in the various fields, to determine, for example:
Purchase order: this may include applicable dimensions such as, for example, supplier, time, project, and subsidiary.
Sales order: this may include applicable dimensions such as, for example, customer, time, project, and subsidiary.
Given the list of applicable dimensions and attributes, a data model, pipeline, and semantic model, e.g., as a star schema, can be built for this purpose.
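The introspection described above can be sketched as follows: a dimension is considered applicable to a transaction type when its column is populated for records of that type. The sample rows and column names are illustrative, not actual NetSuite schema:

```python
# Hedged sketch: the transaction table is striped by transactionType, and
# introspection derives the applicable dimensions per transaction type.
transaction_rows = [
    {"transactionType": "PurchaseOrder", "supplier": "Acme", "customer": None,
     "time": "2021-01-05", "project": "P1", "subsidiary": "EU"},
    {"transactionType": "SalesOrder", "supplier": None, "customer": "Globex",
     "time": "2021-01-06", "project": "P2", "subsidiary": "US"},
]

def applicable_dimensions(rows):
    """Map each transaction type to the set of columns populated for it."""
    dims = {}
    for row in rows:
        ttype = row["transactionType"]
        populated = {k for k, v in row.items()
                     if k != "transactionType" and v is not None}
        dims[ttype] = dims.get(ttype, set()) | populated
    return dims

dims = applicable_dimensions(transaction_rows)
```

With this output, a purchase order star schema would include a supplier dimension but not a customer dimension, and vice versa for sales orders.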
Initial curation: while the various star schemas, as well as the pipelines and semantic models, can be built through introspection, the dimensions must be seeded into the data model and pipeline through manual curation. The code generation process also needs to know the supported dimensions, and a superset of dimensional attributes that can be introspected, to be included in or excluded from the model.
Security: the generator creates a default security group, together with security filters and application roles for controlling the security filters, for each subject area. The customer can then assign particular users membership in an enterprise role, and the security filters will automatically be activated in the semantic model, limiting visibility to the set of rows assigned to that user.
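The generated row-level security can be sketched as application roles carrying filter predicates; the role names, the subsidiary-based predicate, and the helper are assumptions for illustration, not the actual generated artifacts:

```python
# Illustrative sketch: an application role carries a security filter that
# limits which rows (here, by subsidiary) a user can see.
security_filters = {
    # application role -> row-level predicate (hypothetical roles)
    "EU_ANALYST": lambda row: row["subsidiary"] == "EU",
    "US_ANALYST": lambda row: row["subsidiary"] == "US",
}

def visible_rows(rows, user_roles):
    """Return the rows visible under any of the user's application roles."""
    predicates = [security_filters[r] for r in user_roles]
    return [row for row in rows if any(p(row) for p in predicates)]

rows = [{"id": 1, "subsidiary": "EU"}, {"id": 2, "subsidiary": "US"}]
eu_visible = visible_rows(rows, ["EU_ANALYST"])
```

Assigning a user membership in a role thus activates the corresponding filter, restricting that user's visibility to the assigned set of rows.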
Pipeline generator framework
FIG. 16 illustrates an example pipeline generator framework for automatically generating a BI data model in accordance with embodiments.
As shown in FIG. 16, according to an embodiment, the pipeline generator framework may perform a process of creating an output target model and a pipeline or loading plan, including, for example:
A seed (e.g., ODI) repository 352 that provides the seeded dimensions associated with the data model and pipeline, as supplied by manual curation.
A pipeline and snapshot generator 356, which performs the processing described below to automatically generate one or more mappings by referencing or viewing the source model.
An (e.g., NetSuite UMD) API 354 that receives information from a customer (e.g., NetSuite or other BI or ERP) environment.
A generated (e.g., ODI) repository 364, which is created based on the seeded dimensions.
A human-readable format (HRF) generator 366 adapted to generate human-readable format data for subsequent use, such as creating a BI report or other HRF document 370.
One or more decision files 358.
An RPD generator 360 adapted to generate a data model for the transaction type; wherein, for example, the RPD generation process may use the determined dimensions and facts to generate a semantic model, e.g., as a BI Repository (RPD) file, based on the seed RPD (seed.rpd) 362; and provides the generated RPD 368 as output.
A security generator 380 adapted to overlay the generated semantic model with any required security artifacts (e.g., those described in the source model); wherein, for example, the security artifact generation process can create security filters and application roles that control data visibility; and prepares the protected RPD (secured.rpd) 382.
According to embodiments, the pipeline generator framework may include a number of components or functions, such as:
1. pipeline generation
According to an embodiment, a pipeline or snapshot (ETL) generator, component, or process is used to automatically generate one or more mappings by referencing or viewing a source model.
For example, the automatic generation process may include using manually curated artifacts and automatically determined or interpreted variables. The seed repository includes manually curated artifacts, such as, for example, the base dimensions associated with the environment, for use by the pipeline generator. Other transaction dimensions, columns, security artifacts, and the like are then automatically generated by the framework.
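As an illustration of this combination of curated seed artifacts and introspected columns, the following minimal sketch builds one mapping specification per transaction type; the mapping structure and all names here are hypothetical, not part of the described framework:

```python
# Manually curated base dimensions from the (hypothetical) seed repository.
SEED_DIMENSIONS = {"time", "subsidiary"}

def generate_mapping(transaction_type, introspected_columns):
    """Build a mapping spec by referencing the source model: seed dimensions
    are always joined; further dimensions come from introspected columns."""
    introspected_dims = {c for c in introspected_columns
                         if c in {"supplier", "customer", "project"}}
    return {
        "name": f"map_{transaction_type.lower()}",
        "source": f"Transaction_{transaction_type}",
        "target": f"DW_{transaction_type.upper()}_F",
        "columns": sorted(introspected_columns),
        "joined_dimensions": sorted(SEED_DIMENSIONS | introspected_dims),
    }

mapping = generate_mapping("PurchOrd", ["supplier", "amount", "project"])
print(mapping["target"])  # DW_PURCHORD_F
```

The seed dimensions (time, subsidiary) appear in every mapping, while supplier and project are added only because introspection found them populated for this transaction type.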
2. Semantic model generation
According to an embodiment, a semantic model (RPD) generator, component, or process generates a data model for a transaction type. For example, the RPD generation process may use the determined dimensions and facts to generate a semantic model, e.g., as a BI Repository (RPD) file. It uses the output of the previous step and the template rpd xml file.
As above, the seed repository includes manually curated artifacts, such as, for example, fundamental dimensions associated with the environment, for use with the RPD generator.
Step 1: Begin processing degenerate columns - degenerate columns not needed in the fact table are removed.
Step 2: Begin processing unused fact columns - only the fact or measure columns needed for the transaction type are processed and retained.
Step 3: Begin processing dimensions - only the dimension columns needed for the transaction type are processed and retained.
Step 4: Physical layer modification - physical layer tables are created in the RPD.
Step 5: Begin creating new subject area objects - LTS, date dimension, keys, measure definitions; comprising step 5.1: LTS; step 5.2: logical table; step 5.3: logical column; step 5.4: logical key; step 5.5: measures; step 5.6: logical complex join; step 5.7: dimension; step 5.8: logical level.
Step 6: Begin presentation changes - presentation layer objects are created for the transaction type.
3. Security generation
According to an embodiment, a security artifact generator, component, or process overlays the generated semantic model with any required security artifacts (e.g., those described in the source model). For example, the security artifact generation process can create security filters and application roles that control data visibility.
According to an embodiment, the above-described steps or processes result in the creation of a pipeline from a source data environment or system (such as, for example, a NetSuite ERP or other enterprise computing environment) to, for example, one or more BI reports. The pipeline may then be used to retrieve data from the source data environment, and a BI report may then be run against the retrieved data.
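As an illustration only, the creation of a per-dimension application role and row-level security filter might be sketched as follows; the filter expression is modeled on typical BI Server session-variable filters and, like all names here, is an assumption rather than the system's actual syntax:

```python
def generate_security_artifacts(subject_area, protected_dimensions):
    """Create an application role plus a row-level security filter for each
    protected dimension of a subject area; the filter activates once the
    customer assigns a user membership in the matching enterprise role."""
    artifacts = []
    for dim in protected_dimensions:
        role = f"{subject_area}_{dim}_ROLE"
        artifacts.append({
            "role": role,
            # Hypothetical session-variable-driven row filter.
            "filter": f'"{subject_area}"."{dim}_ID" IN (VALUEOF(NQ_SESSION."{role}"))',
        })
    return artifacts

arts = generate_security_artifacts("PURCHASEORDER", ["SUBSIDIARY"])
print(arts[0]["role"])  # PURCHASEORDER_SUBSIDIARY_ROLE
```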
4. Readable format data generation
According to embodiments, a human-readable format (HRF) generator, component, or process may be used to generate human-readable format data for subsequent use, such as creating a BI report.
For example, in an Oracle Analytics for Applications (OAX), Fusion Analytics Warehouse, or Oracle Cloud Infrastructure (OCI) environment, the HRF generation process may generate an HRF or mapping format, for example, as used by the OAX team to manage the ODI repository.
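A minimal sketch of rendering a generated mapping in a human-readable format follows; the mapping fields and the output layout are hypothetical illustrations, not the system's actual HRF format:

```python
def to_human_readable(mapping):
    """Render a generated mapping as human-readable text, e.g. for
    review documents or BI report documentation."""
    lines = [f"Mapping: {mapping['name']}",
             f"  Source: {mapping['source']}",
             f"  Target: {mapping['target']}"]
    lines += [f"  Column: {c}" for c in mapping["columns"]]
    return "\n".join(lines)

doc = to_human_readable({"name": "map_purchord",
                         "source": "Transaction_PurchOrd",
                         "target": "DW_PURCHASEORDER_F",
                         "columns": ["supplier_id", "order_amount"]})
print(doc)
```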
FIG. 17 illustrates an example flowchart of a process for use in automatic generation of a BI data model in accordance with embodiments.
As shown in FIG. 17, according to an embodiment 400, upon determining an input transaction type (e.g., a sales order transaction type), the process can access the corresponding, e.g., NetSuite tables and introspect or view the data in those tables to determine dimensions and attributes and generate a target model and loading plan.
For example, at step 402, an input transaction type (e.g., purchOrd) is received.
At step 404, the system connects to the NetSuite or other BI or ERP environment, reverse-engineers the tables found therein, and creates an alias for each transaction type, such as: a transaction purchase order (Transaction_PurchOrd); a transaction line purchase order (TransactionLine_PurchOrd); and a transaction accounting line (TransactionAccountingLine_PurchOrd).
At step 406, the system creates staging tables in the data warehouse for the above items, such as: Transaction_PurchOrd; TransactionLine_PurchOrd; and TransactionAccountingLine_PurchOrd.
At step 408, the system creates an ODI mapping to stage data from each of these to a corresponding table, including, for example, automatically adding an incremental filter if a last modified date (lastmodifieddate) column is found; assigning the appropriate knowledge module in ODI; and generating a scenario (a compiled version of the mapping).
At step 410, the system introspects the data in these three tables to determine the applicable dimensions and creates a rejecteddimensions.txt file.
At step 412, the system introspects the data in these three tables to determine the degenerate attributes and creates a rejectedattributes.txt file.
At step 414, the system introspects the data in these three tables to determine the applicable measures and creates a rejectedmeasures.txt file.
At step 416, the system creates a target fact table model, such as DW_PURCHASEORDER_F.
At step 418, the system creates an ODI map to load data from the scratch table to the fact table.
At step 420, the system updates the daily loading plan to include the ODI scenario (for runtime).
At step 422, if snapshotbuilt = true, the system generates a snapshot table with a snapshot_dt column.
At step 424, the system creates an ODI map to load data from the fact table to the snapshot table.
At step 426, the system updates the snapshot loading plan to include the ODI scenario (for runtime).
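The naming and incremental-filter logic of steps 404-420 can be sketched as follows; this is a simplified illustration in which the table and column names follow the example above, but the plan structure itself is hypothetical:

```python
def plan_pipeline(transaction_type, source_tables):
    """Sketch of steps 404-420: create per-type aliases and staging tables,
    and flag an incremental filter when a lastmodifieddate column is found."""
    plan = []
    for table, columns in source_tables.items():
        alias = f"{table}_{transaction_type}"
        plan.append({
            "alias": alias,                # reverse-engineered alias (step 404)
            "staging": alias,              # staging table name (step 406)
            "incremental": "lastmodifieddate" in columns,  # step 408
        })
    # Target fact table for the transaction type (step 416).
    plan.append({"fact_table": f"DW_{transaction_type.upper()}_F"})
    return plan

plan = plan_pipeline("PurchOrd", {
    "Transaction": ["id", "transactionType", "lastmodifieddate"],
    "TransactionLine": ["id", "amount"],
})
print(plan[0]["alias"])  # Transaction_PurchOrd
```

Only the Transaction table carries a lastmodifieddate column in this toy input, so only its mapping is flagged for incremental loading.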
FIG. 18 further illustrates an example flow diagram for a process used in the automatic generation of BI data models in accordance with an embodiment.
As shown in FIG. 18, according to an embodiment 440, upon determining an input transaction type and having introspected the data to determine the dimensions and attributes of the transaction type as described above, the process may access, for example, a template star schema and create an appropriate, e.g., sales order star schema by referencing or viewing the introspected data.
At step 442, an input transaction type (e.g., PurchOrd) is received, together with, e.g., the rejecteddimensions.txt, rejectedattributes.txt, and rejectedmeasures.txt files, as described above.
At step 444, the system makes a copy of the seeded subject area, e.g., (DW_SUBJAREA_F).
At step 446, the system replaces (in this example) all _SUBJAREA_ strings with a transaction code, e.g., PURCHASEORDER.
At step 448, the system prunes all dimensions listed in rejecteddimensions.txt.
At step 450, the system prunes all attributes listed in rejectedattributes.txt.
At step 452, the system prunes all measures listed in rejectedmeasures.txt.
At step 454, the system creates an unprotected RPD, e.g., (NSFinal.rpd).
At step 456, the system creates the unprotected RPD (NSFinal.rpd).
At step 458, the system creates a visibility role for the subject area.
At step 460, the system creates a data security role for each protection dimension.
At step 462, the system creates a protected RPD (NSFinalSecured.rpd).
According to an embodiment, as further illustrated in FIG. 18, a security artifact generator may then be used, for example, to create an appropriate visibility role or data security role for each protected dimension.
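The template-copy, string-substitution, and pruning of steps 444-452 can be sketched as follows; this is a simplified illustration with hypothetical object names:

```python
def build_star_schema(template, transaction_code, rejected):
    """Sketch of steps 444-452: copy the seeded template subject area,
    substitute _SUBJAREA_ with the transaction code, and prune
    rejected objects."""
    objects = [name.replace("_SUBJAREA_", f"_{transaction_code}_")
               for name in template]
    return [o for o in objects if o not in rejected]

# Hypothetical seeded template objects.
template = ["DW_SUBJAREA_F", "DIM_CUSTOMER", "DIM_SUPPLIER"]

# For a purchase order, the customer dimension was rejected by introspection.
schema = build_star_schema(template, "PURCHASEORDER", rejected={"DIM_CUSTOMER"})
print(schema)  # ['DW_PURCHASEORDER_F', 'DIM_SUPPLIER']
```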
According to an embodiment, the described method uses a combination of manual model curation of the source data environment and automatic generation by data introspection to derive a target BI data model; and provides technical improvements for the construction of new subject areas or BI data models in a much shorter period of time. The various steps, components, or processes described above may be provided as software or program code executable by a computer system or other type of processing device.
Example pipeline Generator inputs
According to various embodiments, examples of the inputs to the pipeline generator are shown and described below:
1. transaction type list
FIG. 19 illustrates an example list of transaction types according to an embodiment.
As shown in FIG. 19, a list of transaction types 510 may be captured in a file, for example, as subjectarea. This file controls which transaction types are processed, together with each type's abbreviation, business-friendly name, and the security group to which the transaction type is attributed.
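For illustration, such a transaction-type list might be parsed as follows; the CSV layout and field order shown here are assumptions, not the actual file format:

```python
import csv
import io

# Hypothetical layout: type, abbreviation, business-friendly name, security group.
SUBJECTAREA = """\
SalesOrd,SO,Sales Orders,SALES_GROUP
PurchOrd,PO,Purchase Orders,PROCUREMENT_GROUP
"""

def load_transaction_types(text):
    """Parse the transaction-type list that controls which types
    are processed by the generator."""
    reader = csv.reader(io.StringIO(text))
    return {row[0]: {"abbr": row[1], "name": row[2], "security_group": row[3]}
            for row in reader}

types = load_transaction_types(SUBJECTAREA)
print(types["PurchOrd"]["security_group"])  # PROCUREMENT_GROUP
```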
2. Transaction column List
FIG. 20 illustrates an example transaction column list in accordance with an embodiment.
As shown in FIG. 20, according to an embodiment, the transaction column list 520 may be provided as an input file: a static file that captures a list of all columns in the transaction table, regardless of whether they are to be processed as facts, dimensions, or measures. For example, if the BI/ERP system adds or deletes a set of columns, the file may be updated at each release.
3. Dimension to logical dimension mapping
FIG. 21 illustrates an example dimension to logical dimension mapping in accordance with embodiments.
As shown in FIG. 21, according to an embodiment, dimension to logical dimension mapping 530 may be provided as a file that provides logical names for all dimensions used in the model.
4. Physical to logical attribute mapping
FIG. 22 illustrates an example physical-to-logical attribute mapping, according to an embodiment.
As shown in FIG. 22, according to an embodiment, the physical-to-logical attribute mapping 540 may be provided as a file that provides the logical name of the physical attribute in the transaction table. Logical names are used in the semantic model.
5. Physical to logical measurement mapping
Fig. 23 illustrates an example physical-to-logical measurement mapping, according to an embodiment.
As shown in FIG. 23, according to an embodiment, physical-to-logical measurement mapping 550 may be provided as a file that provides logical names for physical measurements in the transaction table. Logical names are used in the semantic model.
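Mappings 3-5 above all follow the same pattern of translating physical names into the logical names used in the semantic model; a minimal sketch follows, in which the example column names are hypothetical:

```python
def apply_logical_names(physical_columns, logical_map):
    """Rename physical columns to the logical names used in the semantic
    model; unmapped columns keep their physical name."""
    return {col: logical_map.get(col, col) for col in physical_columns}

# Hypothetical physical-to-logical attribute/measure mapping file contents.
logical_map = {"trandate": "Transaction Date", "foreigntotal": "Total Amount"}

named = apply_logical_names(["trandate", "foreigntotal", "memo"], logical_map)
print(named["trandate"])  # Transaction Date
```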
6. Template semantic model
According to an embodiment, the template semantic model has definitions of all the curated dimensions and a sample fact table that is used as a model to create a semantic model for a particular transaction type.
7. Template ODI repository
According to an embodiment, the template ODI repository has definitions of all the curated dimensions and is used to create the ODI repository model for a specific transaction type.
Example user interface for automatically generating BI data models
FIGS. 24-31 illustrate examples of user interfaces associated with the automatic generation of BI data models, such as:
FIG. 24 illustrates an example user interface for use with a system for automatically generating a BI data model in accordance with embodiments.
FIG. 25 illustrates an example user interface for use with a system for automatically generating a BI data model, including a display of a package, according to embodiments.
FIG. 26 illustrates an example user interface for use with a system for automatically generating a BI data model, including a display of a loading plan, according to an embodiment.
FIG. 27 illustrates an example user interface for use with a system for automatically generating a BI data model including a display of a generated rpd file in accordance with embodiments.
FIG. 28 illustrates an example user interface for use with a system for automatically generating BI data models, including an example of a business model associated with an rpd file, in accordance with embodiments.
FIG. 29 illustrates an example user interface for use with a system for automatically generating a BI data model, including an example of a security filter, in accordance with an embodiment.
FIG. 30 illustrates an example user interface for use with a system for automatically generating a BI data model, including an example of a display model, according to an embodiment.
FIG. 31 illustrates an example user interface for use with a system for automatically generating a BI data model, including an example of a mapping, according to embodiments.
Automatic generation of BI data model processes
FIG. 32 illustrates a flowchart of a method of providing automatic generation of BI data models using data introspection and curation, according to an embodiment.
As shown in FIG. 32, according to an embodiment, at step 602, access is provided by an analytics application environment to a data warehouse, for the storing of data.
At step 604, a pipeline or snapshot (ETL) generator, component, or process is used to automatically generate one or more mappings by referencing or viewing the source model; wherein, for example, the automatic generation process may include using manually curated artifacts and automatically determined or interpreted variables.
At step 606, a semantic model (RPD) generator, component or process generates a data model of the transaction type; wherein, for example, the RPD generation process may use the determined dimensions and facts to generate a semantic model, e.g., as a BI Repository (RPD) file.
At step 608, the security artifact generator, component, or process overlays the generated semantic model with any required security artifacts (e.g., those described in the source model); wherein, for example, the security artifact generation process can create security filters and application roles that control data visibility.
At step 610, a human-readable format (HRF) generator, component, or process may be used to generate human-readable format data for subsequent use.
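The four generator stages of steps 604-610 can be viewed as a simple ordered pipeline; the following minimal sketch, in which each stage is a placeholder function, is an illustration only and not the system's implementation:

```python
def run_generators(source_model, stages):
    """Apply the generator stages of steps 604-610 in order, threading
    each stage's output artifact into the next stage."""
    artifact = source_model
    log = []
    for name, stage in stages:
        artifact = stage(artifact)
        log.append(name)
    return artifact, log

# Placeholder stages: ETL mapping, semantic model, security overlay, HRF.
stages = [
    ("etl",      lambda m: dict(m, mappings=True)),
    ("rpd",      lambda m: dict(m, semantic_model=True)),
    ("security", lambda m: dict(m, secured=True)),
    ("hrf",      lambda m: dict(m, readable=True)),
]
result, log = run_generators({"source": "NetSuite"}, stages)
print(log)  # ['etl', 'rpd', 'security', 'hrf']
```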
According to various embodiments, the teachings herein may be conveniently implemented using one or more conventional general purpose or special purpose computers, computing devices, machines or microprocessors (including one or more processors, memories, and/or computer-readable storage media programmed according to the teachings of the present disclosure). Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art.
In some embodiments, the teachings herein may include a computer program product which is a non-transitory computer-readable storage medium(s) having instructions stored thereon/therein, which may be used to program a computer to perform any of the processes of the present teachings. Examples of such storage media may include, but are not limited to, hard disk drives, hard drives, fixed disks or other electromechanical data storage devices, floppy disks, optical disks, DVD, CD-ROMs, microdrive, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems, or other types of storage media or devices suitable for the non-transitory storage of instructions and/or data.
The foregoing description is provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the scope to the precise form disclosed. Many modifications and variations will be apparent to practitioners skilled in the art.
For example, while various embodiments of the systems and methods described herein illustrate use in various Enterprise Resource Planning (ERP) or other enterprise computing or data analysis environments (such as, for example, netSuite or Fusion applications); various embodiments may be used with other types of ERP, cloud computing, enterprise computing, or other computing environments.

Claims (15)

1. A system for automatically generating a data model using data introspection and curation, comprising:
a computer comprising one or more processors, the computer providing access to a data repository by an analytics application environment for a plurality of tenants to store data;
wherein the system provides a generator framework operable to:
automatically generating one or more data maps associated with the source data environment by reference to a source model associated with the source data environment, wherein the automatically generating process includes using the curated artifact and automatically determined or interpreted variables;
generating a data model for a transaction type associated with a source data environment, including determining dimensions and facts associated with source data to generate a semantic model;
overlaying the generated semantic model with a secure artifact that controls data visibility; and
readable format data is generated for use in creating the report.
2. The system as claimed in claim 1, wherein the system performs an extract, transform, load data pipeline or process according to an analytics application schema and/or customer schema associated with the tenant, to receive data from the tenant's enterprise software application or data environment for loading into a data warehouse instance.
3. The system of any of claims 1 or 2, wherein the generation of the one or more extract, transform, load (ETL) maps comprises receiving a curated artifact from a seed repository, including base dimensions associated with a source data environment; and
wherein additional transaction dimensions, columns, or security artifacts are automatically generated by the generator framework.
4. The system of any of claims 1 to 3, wherein the generated semantic model is stored as a Business Intelligence (BI) Repository (RPD) file.
5. The system of any of claims 1 to 4, wherein the source data environment is one of NetSuite, business Intelligence (BI), enterprise Resource Planning (ERP), cloud computing, enterprise computing, or other computing environment.
6. A method for automatically generating a data model using data introspection and curation, comprising:
providing, by a computer comprising one or more processors, access to a data repository by an analytics application environment for a plurality of tenants to store data;
automatically generating one or more data maps associated with the source data environment by reference to a source model associated with the source data environment, wherein the automatically generating process includes using the curated artifact and automatically determined or interpreted variables;
generating a data model for a transaction type associated with a source data environment, including determining dimensions and facts associated with source data to generate a semantic model;
overlaying the generated semantic model with a secure artifact that controls data visibility; and
readable format data is generated for use in creating the report.
7. The method of claim 6, further comprising performing an extract, transform, load data pipeline, or process to receive data from the tenant's enterprise software application or data environment for loading into the data warehouse instance according to an analytics application schema and/or a customer schema associated with the tenant.
8. The method of any of claims 6 or 7, wherein the generating of the one or more extract, transform, load (ETL) maps comprises receiving a curated artifact from a seed repository, including base dimensions associated with a source data environment; and
wherein additional transaction dimensions, columns, or security artifacts are automatically generated by the generator framework.
9. The method of any of claims 6 to 8, wherein the generated semantic model is stored as a Business Intelligence (BI) Repository (RPD) file.
10. The method of any of claims 6 to 9, wherein the source data environment is one of NetSuite, business Intelligence (BI), enterprise Resource Planning (ERP), cloud computing, enterprise computing, or other computing environment.
11. A non-transitory computer-readable storage medium comprising instructions stored thereon, which when read and executed by one or more computers, cause the one or more computers to perform a method comprising:
providing access to a data repository by an analytics application environment for a plurality of tenants to store data;
automatically generating one or more data maps associated with the source data environment by reference to a source model associated with the source data environment, wherein the automatically generating process includes using the curated artifact and automatically determined or interpreted variables;
generating a data model for a transaction type associated with a source data environment, including determining dimensions and facts associated with source data to generate a semantic model;
overlaying the generated semantic model with a secure artifact that controls data visibility; and
readable format data is generated for use in creating the report.
12. The non-transitory computer readable storage medium of claim 11, further comprising performing an extract, transform, load data pipeline or process according to an analytics application schema and/or customer schema associated with the tenant, to receive data from the tenant's enterprise software application or data environment for loading into a data warehouse instance.
13. The non-transitory computer readable storage medium of any of claims 11 or 12, wherein the generation of the one or more extract, transform, load (ETL) maps comprises receiving a curated artifact from a seed repository, including base dimensions associated with a source data environment; and
wherein additional transaction dimensions, columns, or security artifacts are automatically generated by the generator framework.
14. The non-transitory computer readable storage medium of any of claims 11 to 13, wherein the generated semantic model is stored as a Business Intelligence (BI) Repository (RPD) file.
15. The non-transitory computer readable storage medium of any of claims 11 to 14, wherein the source data environment is one of NetSuite, business Intelligence (BI), enterprise Resource Planning (ERP), cloud computing, enterprise computing, or other computing environment.
CN202180015587.1A 2020-02-20 2021-02-19 System and method for automatically generating BI models using data introspection and curation Pending CN115210701A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US202062979269P 2020-02-20 2020-02-20
US62/979,269 2020-02-20
US16/868,081 2020-05-06
US16/868,081 US20200356575A1 (en) 2019-05-06 2020-05-06 System and method for customization in an analytic applications environment
PCT/US2021/018885 WO2021168331A1 (en) 2020-02-20 2021-02-19 System and method for automatic generation of bi models using data introspection and curation

Publications (1)

Publication Number Publication Date
CN115210701A true CN115210701A (en) 2022-10-18

Family

ID=77391760

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180015587.1A Pending CN115210701A (en) 2020-02-20 2021-02-19 System and method for automatically generating BI models using data introspection and curation

Country Status (4)

Country Link
EP (1) EP4107633A1 (en)
JP (1) JP2023515082A (en)
CN (1) CN115210701A (en)
WO (1) WO2021168331A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023097521A1 (en) * 2021-11-30 2023-06-08 Siemens AG Data model generation method and apparatus

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9442993B2 (en) * 2013-02-11 2016-09-13 Dell Products L.P. Metadata manager for analytics system

Also Published As

Publication number Publication date
WO2021168331A1 (en) 2021-08-26
EP4107633A1 (en) 2022-12-28
JP2023515082A (en) 2023-04-12

Similar Documents

Publication Publication Date Title
US20200356575A1 (en) System and method for customization in an analytic applications environment
EP3107050A1 (en) System for data aggregation and report generation
US11640406B2 (en) System and method for data analytics with an analytic applications environment
US11379496B2 (en) System and method for universal format driven data transformation and key flex fields in a analytic applications environment
US20210049183A1 (en) System and method for ranking of database tables for use with extract, transform, load processes
US11573974B2 (en) System and method for automatic correction/rejection in an analysis applications environment
US11436259B2 (en) System and method for SaaS/PaaS resource usage and allocation in an analytic applications environment
US20200334089A1 (en) System and method for determining an amount of virtual machines for use with extract, transform, load (etl) processes
US11615108B2 (en) System and method for running warehouse loads for multiple tenants of a data warehouse
US20220129798A1 (en) System and method for extending and validating a semantic model for use with an analytic applications environment
US20210004382A1 (en) System and method for data pipeline optimization in an analytic applications environment
US11614976B2 (en) System and method for determining an amount of virtual machines for use with extract, transform, load (ETL) processes
CN115210701A (en) System and method for automatically generating BI models using data introspection and curation
US11609904B2 (en) System and method for extensibility in an analytic applications environment
US20210173846A1 (en) System and method for automatic generation of bi models using data introspection and curation
US11966870B2 (en) System and method for determination of recommendations and alerts in an analytics environment
US20230297586A1 (en) System and method for generating a network graph from analytic artifacts in an analytics environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination