CN117076521A - Operational data analysis method and system based on big data

Info

Publication number: CN117076521A
Application number: CN202311087003.9A
Authority: CN
Inventors: 李臣; 鲍则民
Original assignee: Ningbo Zhiliang Technology Co ltd
Current assignee: Ningbo Zhiliang Technology Co ltd
Priority date: 2023-08-28
Filing date: 2023-08-28
Publication date: 2023-11-17

Abstract

The invention relates to the technical field of data analysis, and specifically to an operational data analysis method and system based on big data. The method receives an analysis target input by a user, automatically collects and stores raw data, and performs data repair or data cleaning to obtain secondary data; integrates the secondary data into a preset standardized data model and standardizes it to obtain standardized data; uses preset big data technology and machine learning algorithms to analyze and mine the standardized data to obtain hidden association information; designs and calculates key indicators according to the analysis target and establishes mathematical models and algorithms related to the key indicators; and uses the mathematical models and algorithms to perform data analysis and obtain analysis results. The technical solution of the present invention uses machine learning technology to discover more associations and patterns in the data and to establish relevant models, thereby obtaining more accurate and intelligent analysis results.

Description

Operational data analysis method and system based on big data

Technical field

The present invention relates to the technical field of data analysis, and specifically to an operational data analysis method and system based on big data.

Background

Operational data analysis refers to in-depth, comprehensive analysis of operational data to reveal the patterns, trends and correlations behind the data, helping enterprises make more accurate decisions and optimize operations.

Existing operational data analysis includes data warehouse and business intelligence (DW/BI), data mining, and statistical analysis.

Data warehouse and business intelligence: DW/BI technology mainly involves the ETL process of extraction, transformation and loading, which integrates data from an enterprise's various data sources into a centralized data warehouse. Data analysis is then performed through multi-dimensional data models (such as the star schema and the snowflake schema), and OLAP (online analytical processing) technology is used to quickly generate reports and query results.

Data mining: based mainly on statistics and machine learning algorithms, data mining uses methods such as pattern recognition, classification, clustering and association rule mining to discover hidden patterns and association information in large-scale data. Commonly used algorithms include decision trees, neural networks, support vector machines and cluster analysis.

Statistical analysis: based on statistical theory and methods, statistical analysis infers the characteristics, relationships or differences of a population from sample data through parameter estimation, hypothesis testing, analysis of variance, regression analysis and the like.

However, existing operational data analysis faces the following problems:

Data volume and velocity: traditional technologies struggle to handle large-scale data and rapidly changing real-time data. When the data volume is too large, traditional data warehouse and ETL technology cannot process and store it efficiently, and performance degrades. In addition, traditional statistical analysis and data mining methods require models and algorithms to be defined in advance and cannot adapt quickly to new data and scenarios.

Data quality: traditional technologies often cannot handle data quality problems such as missing, duplicated or inconsistent data, so analysis results may be inaccurate or biased. Moreover, data warehouse and business intelligence technology places strong constraints on the structure and format of data and cannot handle unstructured or semi-structured data.

Decision-making limitations: traditional technologies rely mainly on manual experience and preset rule logic and cannot fully exploit the potential of the data. Traditional data mining and statistical analysis methods often depend too heavily on feature engineering and fine tuning, and thereby overlook the information and correlations inherent in the data.

Therefore, current operational data analysis approaches have certain limitations, rely mainly on manual experience and preset rule logic, and cannot fully exploit the potential of the data.

Summary of the invention

In view of this, the purpose of the present invention is to provide an operational data analysis method and system based on big data, so as to solve the problem that operational data analysis approaches in the prior art have certain limitations, rely mainly on manual experience and preset rule logic, and cannot fully exploit the potential of the data.

According to a first aspect of the embodiments of the present invention, an operational data analysis method based on big data is provided, including:

receiving an analysis target input by a user, and automatically collecting and storing raw data related to the analysis target;

performing data repair or data cleaning on the raw data to obtain secondary data;

integrating the secondary data into a preset standardized data model, and standardizing the secondary data to obtain standardized data;

using preset big data technology and machine learning algorithms to analyze and mine the standardized data to obtain hidden association information;

according to the analysis target, using the standardized data and the hidden association information to design and calculate key indicators, and establishing mathematical models and algorithms related to the key indicators;

using the mathematical models and algorithms to perform data analysis on the standardized data and the hidden association information according to the analysis target, to obtain analysis results.

Preferably, after the analysis results are obtained, the method further includes:

using data visualization technology to generate a data visualization interface based on the analysis results.

Preferably, automatically collecting the raw data related to the analysis target includes:

deploying data collection agents on different data sources, and using data collection interfaces to collect the raw data related to the analysis target.

Preferably, automatically collecting and storing the raw data related to the analysis target includes:

using the storage service provided by a cloud computing platform to store the collected raw data related to the analysis target.

Preferably, using the preset big data technology and machine learning algorithms to analyze and mine the standardized data includes:

using the machine learning algorithms to perform feature extraction on the standardized data and the hidden association information, converting the standardized data and the hidden association information into feature vectors;

performing pattern recognition, classification, clustering and association rule mining on the feature vectors to obtain the hidden association information.

Preferably, the established mathematical models and algorithms include:

one or more of a linear regression model, a cluster analysis model, a text sentiment analysis model, a random forest algorithm, or an association rule mining algorithm.

According to a second aspect of the embodiments of the present invention, an operational data analysis system based on big data is provided, including:

a data analysis module, used to receive an analysis target input by a user;

a data collection module, used to automatically collect and store raw data related to the analysis target;

a data preprocessing module, used to perform data repair or data cleaning on the raw data to obtain secondary data;

a data integration and standardization module, used to integrate the secondary data into a preset standardized data model and to standardize the secondary data to obtain standardized data;

a data analysis and mining module, used to analyze and mine the standardized data using preset big data technology and machine learning algorithms to obtain hidden association information;

an indicator calculation and model establishment module, used to design and calculate key indicators according to the analysis target, using the standardized data and the hidden association information, and to establish mathematical models and algorithms related to the key indicators;

the data analysis module being further used to perform data analysis on the standardized data and the hidden association information according to the analysis target, using the mathematical models and algorithms, to obtain analysis results.

Preferably, the system further includes:

a cloud storage module, used to store the collected raw data related to the analysis target using the storage service provided by a cloud computing platform.

Preferably, the data analysis module, the data collection module, the cloud storage module, the data preprocessing module, the data integration and standardization module, the data analysis and mining module, and the indicator calculation and model establishment module are connected to one another and exchange data through the services and communication mechanisms provided by the cloud computing platform.

The technical solutions provided by the embodiments of the present invention may include the following beneficial effects:

It can be understood that the technical solution provided by the present invention can receive the analysis target input by the user, automatically collect and store raw data, and perform data repair or data cleaning to obtain secondary data; integrate the secondary data into the preset standardized data model and standardize it to obtain standardized data; use preset big data technology and machine learning algorithms to analyze and mine the standardized data to obtain hidden association information; design and calculate key indicators according to the analysis target, and establish mathematical models and algorithms related to the key indicators; and use the mathematical models and algorithms to perform data analysis and obtain analysis results. The technical solution can automatically clean and repair errors, missing values and inconsistencies in the data, thereby improving data quality. At the same time, the use of big data technology and machine learning algorithms makes it possible to discover hidden patterns and associations, improving the precision and accuracy of the analysis. The solution does not depend on preset rules and logic and can flexibly handle diverse analysis needs and decision-making scenarios; through machine learning technology, it discovers more associations and patterns in the data and establishes relevant models, thereby obtaining more accurate and intelligent analysis results.

Compared with traditional technology, the solution offers improved data quality and analysis accuracy, efficient data processing and analysis, simplified operation and a better user experience, and flexible decision support. These advantages allow the technical solution to better support data-driven decision-making in enterprises and to improve the quality of decisions.

It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit the present invention.

Description of the drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention.

Figure 1 is a schematic diagram of the steps of an operational data analysis method based on big data according to an exemplary embodiment;

Figure 2 is a schematic diagram of the construction of a digital-intelligent operational data analysis indicator model according to an exemplary embodiment.

Detailed description

Exemplary embodiments will be described in detail herein, examples of which are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the appended claims.

Embodiment 1

Figure 1 is a schematic diagram of the steps of an operational data analysis method based on big data according to an exemplary embodiment. Referring to Figure 1, an operational data analysis method based on big data is provided, including:

Step S11: receive the analysis target input by the user, and automatically collect and store the raw data related to the analysis target;

It should be noted that automatically collecting the raw data related to the analysis target includes:

In specific practice, raw data can be collected from various data sources (such as mobile terminals, applications and social media). Data collection agents can be deployed on the different data sources and can use device interfaces, application programming interfaces and the like to collect the raw data.

It should be noted that automatically collecting and storing the raw data related to the analysis target includes:

In specific practice, the storage services provided by a cloud computing platform (such as object storage and databases) can be used to store the data. A cloud storage facility can also be built from hardware resources such as servers, storage devices and network devices deployed in a reliable data center or machine room. This requires planning the network architecture and bandwidth to ensure fast data transmission and access; configuring a distributed storage system; formulating data management strategies, including data classification, data archiving and data backup; ensuring data security and integrity, with backups to prevent data loss; and establishing a monitoring and management system to track the performance, availability and other indicators of the storage system in real time.

Step S12: perform data repair or data cleaning on the raw data to obtain secondary data;

In specific practice, data repair or data cleaning improves data quality and ensures data consistency. Data cleaning includes removing duplicate data, filling in missing values, handling outliers and converting data formats, in order to improve the accuracy and reliability of the data. Tools such as PySpark or Pandas can be used to clean the collected data. For format conversion, data from different data sources and of different data types can be converted according to the analysis target, and then standardized and formatted.
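The cleaning operations listed above (deduplication, outlier handling, missing-value filling and format conversion) can be sketched as follows. This is a minimal illustration that uses only the Python standard library rather than PySpark or Pandas; the record fields (`order_id`, `amount`, `date`), the median fill and the fixed `max_amount` cap are assumptions made for the example and are not specified in the disclosure.

```python
from datetime import datetime
from statistics import median


def clean_records(records, max_amount=10_000.0):
    """Deduplicate, cap outliers, fill missing amounts, normalize dates."""
    # 1. Remove duplicates, keeping the first record seen for each order_id.
    seen, unique = set(), []
    for rec in records:
        if rec["order_id"] not in seen:
            seen.add(rec["order_id"])
            unique.append(dict(rec))

    # 2. Handle outliers by capping amounts at an assumed business limit.
    for rec in unique:
        if rec["amount"] is not None:
            rec["amount"] = min(rec["amount"], max_amount)

    # 3. Fill missing amounts with the median of the known (capped) amounts.
    known = [r["amount"] for r in unique if r["amount"] is not None]
    fill = median(known) if known else 0.0

    for rec in unique:
        if rec["amount"] is None:
            rec["amount"] = fill
        # 4. Convert dates from DD/MM/YYYY to ISO 8601 (YYYY-MM-DD).
        rec["date"] = datetime.strptime(rec["date"], "%d/%m/%Y").date().isoformat()
    return unique


raw = [
    {"order_id": 1, "amount": 120.0, "date": "28/08/2023"},
    {"order_id": 1, "amount": 120.0, "date": "28/08/2023"},      # duplicate
    {"order_id": 2, "amount": None, "date": "29/08/2023"},       # missing value
    {"order_id": 3, "amount": 999_999.0, "date": "30/08/2023"},  # outlier
    {"order_id": 4, "amount": 80.0, "date": "31/08/2023"},
]
secondary = clean_records(raw)
```

Capping before filling keeps the extreme value from distorting the median used for the missing amount; other repair strategies would fit the same step.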

Step S13: integrate the secondary data into the preset standardized data model, and standardize the secondary data to obtain standardized data;

In specific practice, standardizing (or homogenizing) the secondary data facilitates subsequent data analysis and processing. ETL (extract, transform, load) tools such as Apache Airflow or Talend can be used to integrate data from different data sources into a centralized data warehouse and to standardize the data, ensuring that the data is consistent and comparable.
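One way to picture the integration into a standardized data model is a per-source field mapping onto a common schema. The sketch below is illustrative only: the source names (`app`, `web`), the field names and the unit conversions are hypothetical, and a production pipeline would typically express the same mapping in an ETL tool.

```python
# Hypothetical per-source mappings onto one assumed standard schema:
# source field -> (standard field, converter)
FIELD_MAPS = {
    "app": {"uid": ("user_id", str), "amt_cents": ("amount", lambda c: c / 100)},
    "web": {"userId": ("user_id", str), "amount_yuan": ("amount", float)},
}


def to_standard(source, record):
    """Map one source-specific record onto the standard schema."""
    mapping = FIELD_MAPS[source]
    out = {"source": source}
    for field, value in record.items():
        if field in mapping:  # fields without a mapping are dropped
            std_name, convert = mapping[field]
            out[std_name] = convert(value)
    return out


standardized = [
    to_standard("app", {"uid": 42, "amt_cents": 1999}),
    to_standard("web", {"userId": "u-7", "amount_yuan": "35.5"}),
]
```

After mapping, records from both sources share the same field names, types and units, which is what makes them comparable in later steps.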

Step S14: use preset big data technology and machine learning algorithms to analyze and mine the standardized data to obtain hidden association information;

It should be noted that the machine learning algorithms can be used to perform feature extraction on the standardized data and the hidden association information, converting them into feature vectors; pattern recognition, classification, clustering and association rule mining are then performed on the feature vectors to obtain the hidden association information. The hidden association information includes at least hidden patterns, association information and anomaly information. Preferably, appropriate machine learning algorithms and models, such as decision trees and neural networks, can be selected according to the specific analysis target and trained and optimized on the feature vectors to establish accurate and robust models for data analysis and mining.
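As a rough illustration of converting standardized records into feature vectors, the sketch below applies min-max scaling to numeric fields and one-hot encoding to categorical fields. The disclosure does not specify an encoding, so both choices, along with the field names `amount` and `channel`, are assumptions.

```python
def build_vectorizer(records, numeric_fields, categorical_fields):
    """Return a function mapping a record to a fixed-length feature vector."""
    # Min and max of each numeric field, used for min-max scaling to [0, 1].
    ranges = {
        f: (min(r[f] for r in records), max(r[f] for r in records))
        for f in numeric_fields
    }
    # Sorted category vocabulary of each categorical field, for one-hot encoding.
    vocab = {f: sorted({r[f] for r in records}) for f in categorical_fields}

    def vectorize(record):
        vec = []
        for f in numeric_fields:
            lo, hi = ranges[f]
            vec.append((record[f] - lo) / (hi - lo) if hi > lo else 0.0)
        for f in categorical_fields:
            vec.extend(1.0 if record[f] == c else 0.0 for c in vocab[f])
        return vec

    return vectorize


records = [
    {"amount": 20.0, "channel": "app"},
    {"amount": 60.0, "channel": "web"},
    {"amount": 100.0, "channel": "app"},
]
vectorize = build_vectorizer(records, ["amount"], ["channel"])
vectors = [vectorize(r) for r in records]
```

Each vector here has three components (scaled amount, `channel == "app"`, `channel == "web"`), a form that classification or clustering algorithms can consume directly.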

Step S15: according to the analysis target, use the standardized data and the hidden association information to design and calculate key indicators, and establish mathematical models and algorithms related to the key indicators;

In specific practice, the key indicators can be KPIs (key performance indicators), KRIs (key risk indicators) and the like, and the indicator calculation logic can be implemented in SQL or in a programming language.
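To make the idea of implementing indicator logic in SQL concrete, the following sketch computes two example KPIs with Python's built-in `sqlite3` module. The `visits` table, its columns and the two indicators (conversion rate and average order value) are hypothetical choices for illustration.

```python
import sqlite3

# In-memory database holding a hypothetical table of standardized visit data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (user_id TEXT, purchased INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [("u1", 1, 100.0), ("u2", 0, 0.0), ("u3", 1, 50.0), ("u4", 0, 0.0)],
)

# KPI 1: conversion rate = share of visits that ended in a purchase.
# KPI 2: average order value = total revenue / number of purchases.
conversion_rate, avg_order_value = conn.execute(
    """
    SELECT AVG(purchased),
           SUM(amount) / SUM(purchased)
    FROM visits
    """
).fetchone()
conn.close()
```

With the four sample rows, two of four visits convert and the two orders total 150.0, so the query yields a conversion rate of 0.5 and an average order value of 75.0.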

It should be noted that the established mathematical models and algorithms include:

The linear regression model is used to analyze the linear relationship between independent and dependent variables, and to make predictions and inferences. For example, in e-commerce, a linear regression model can be used to analyze the relationship between advertising spend and sales, and to estimate an appropriate advertising spend from the model.
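The advertising example can be sketched with a single-variable ordinary least squares fit. The figures below are made up and lie exactly on a line so the result is easy to check; real data would be noisy and would usually involve more than one independent variable.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data only: sales = 2 * ad_spend + 5.
ad_spend = [10.0, 20.0, 30.0, 40.0]
sales = [25.0, 45.0, 65.0, 85.0]
slope, intercept = fit_line(ad_spend, sales)

# Predict sales for a planned spend, or invert the line for a sales target.
predicted_sales = slope * 50.0 + intercept
spend_for_target = (125.0 - intercept) / slope
```

Inverting the fitted line, as in the last statement, is one simple way to read "an appropriate advertising spend" off the model for a given sales target.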

The random forest algorithm is an ensemble learning algorithm that builds multiple decision tree models and combines their predictions, giving good prediction accuracy and robustness. For example, in financial risk control, the random forest algorithm can be used to assess the credit risk of loan applications.

The cluster analysis model is used to group data samples into clusters of similar samples, and to classify and analyze the data according to the differences between clusters. For example, in marketing, a cluster analysis model can be used to divide customers into different groups so that a corresponding marketing strategy can be developed for each group.
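The customer segmentation example can be illustrated with a small k-means implementation. The disclosure does not name a specific clustering algorithm, so k-means is an assumption here, as are the two customer features (orders per month, average spend) and the deterministic initialization from the first `k` points.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return dists.index(min(dists))


def kmeans(points, k, iterations=20):
    """Basic k-means; centroids start at the first k points for repeatability."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, [nearest(p, centroids) for p in points]


# Hypothetical customers: (orders per month, average spend per order).
customers = [(1, 10), (2, 12), (1, 11), (10, 100), (11, 105), (12, 98)]
centroids, labels = kmeans(customers, k=2)
```

On this toy data the first three customers end up in one cluster and the last three in another, which corresponds to a low-activity and a high-activity group. In practice the features would normally be scaled first so that no single feature dominates the distance.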

The association rule mining algorithm is used to discover related patterns and association rules in a data set, helping to explain the relationships within the data. For example, in retail, association rule mining can be used to discover combinations of items that are often purchased together, for use in cross-selling and promotions.
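The retail example can be sketched by computing support and confidence for item pairs. This is a simplified illustration restricted to pairs, not a full Apriori or FP-Growth implementation, and the baskets and thresholds are invented for the example.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5, min_confidence=0.7):
    """Return rules (lhs, rhs, support, confidence) over item pairs."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in set(b))
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(set(b)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # Confidence of lhs -> rhs: P(rhs in basket | lhs in basket).
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
    ["bread", "milk"],
]
rules = pair_rules(baskets)
```

For these baskets the surviving rules are bread -> milk, eggs -> milk and milk -> bread; milk -> eggs is rejected because only half of the baskets containing milk also contain eggs.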

The text sentiment analysis model is used to identify and analyze sentiment tendency and sentiment polarity in text data. For example, in social media analysis, a text sentiment analysis model can be used to evaluate users' attitudes towards a product or brand in order to guide the corresponding marketing strategy.

Step S16: use the mathematical models and algorithms to perform data analysis on the standardized data and the hidden association information according to the analysis target, to obtain analysis results.

In specific practice, the analysis results are the product of in-depth, comprehensive analysis of the operational data; they reveal the patterns, trends and correlations behind the data and help enterprises make more accurate decisions and optimize operations.

The analysis results may include, but are not limited to, the following:

Data characteristics and trends: analysis and description of the characteristics and trends of the operational data, such as sales growth trends and changes in user behavior.

Correlations and association rules: analysis of the correlations and association rules in the data to understand how variables influence one another, such as the relationship between marketing expenses and sales.

Forecasting and early warning: use of the established predictive models to forecast future trends and results, and early warning based on outliers and rules.

Optimization strategies and suggestions: optimization strategies and suggestions, proposed on the basis of the analysis results, that help enterprises make more accurate decisions and optimize operations.
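The outlier-based early warning mentioned in the list above could take many forms. One minimal sketch, assuming a z-score rule over recent history with a threshold of 3 standard deviations (neither of which is specified in the disclosure), is:

```python
from statistics import mean, stdev


def is_anomalous(history, value, threshold=3.0):
    """Flag value if it lies more than `threshold` sample standard
    deviations away from the mean of the historical values."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical daily sales figures used as the baseline.
daily_sales = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0]
normal_day = is_anomalous(daily_sales, 101.0)
bad_day = is_anomalous(daily_sales, 60.0)
```

A value of 101.0 sits well within normal variation and is not flagged, while 60.0 is far outside it and would trigger a warning.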

In summary, the analysis results for data analysis indicators based on big data and cloud computing are obtained mainly by processing and analyzing massive amounts of data; they reveal the patterns, trends and correlations behind the data and provide more comprehensive and accurate data support for enterprise decision-making.

It should be noted that, after the analysis results are obtained, the method further includes:

In specific practice, the analysis results can be presented to users through visualization tools (such as dashboards, reports and charts) to help them understand the data and the analysis results and thereby make more accurate, better-founded decisions. Appropriate data visualization tools, such as Tableau or Power BI, can be chosen to present the analysis results as clear, easy-to-understand reports and charts. Preferably, self-service analysis and interactive functions can also be included: interactive charts let users explore and analyze the data freely.

It can be understood that the technical solution shown in this embodiment can implement data collection, preprocessing, integration, analysis, indicator calculation, model establishment and result presentation, providing efficient, accurate and visual data analysis and decision support.

Preferably, a digital-intelligent operational data analysis indicator model based on big data and cloud computing can be constructed according to the above technical solution, making the method more convenient and effective to implement. The layered structure of the indicator model is shown in Figure 2. The elements of the indicator model are the user-behavior conversion touchpoints. They are broken down mainly by cycle and business-goal priority, and some commonly used combinations are summarized from the perspectives of purpose, time and incentive method.

The digital-intelligent operational data analysis indicator model based on big data and cloud computing can compensate for the limitations of traditional technology and provide more powerful, efficient, accurate and intelligent data analysis and decision support:

Scalability of data volume: based on big data and cloud computing technology, massive amounts of structured and unstructured data can be processed, supporting fast, real-time data analysis and decision response.

Automated processing: with machine learning and deep learning technology, feature extraction, model training and optimization can be automated, reducing manual workload.

Elasticity and flexibility: the cloud computing platform provides elastic resource allocation and scaling, and can adapt to data analysis tasks of different sizes and requirements.

Real-time capability and practicality: data can be processed, analyzed and learned from in real time, providing more accurate and practical decision support. Because the model does not depend on preset models and rules, more hidden patterns and correlations can be discovered.

In addition, following the AARRR funnel model, the indicator model explains the five indicators for achieving user growth, which can help enterprises better understand how customers are acquired and retained.
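AARRR is commonly expanded as Acquisition, Activation, Retention, Referral and Revenue. A simple way to turn the funnel into indicators is to compute the conversion rate between consecutive stages, as sketched below; the stage order, the user counts and the choice of stage-to-stage rates are assumptions for illustration and are not taken from Figure 2.

```python
# Assumed stage order for the AARRR funnel (the order of the last two
# stages varies between sources).
STAGES = ["acquisition", "activation", "retention", "referral", "revenue"]


def funnel_rates(counts):
    """Conversion rate from each stage to the next one in the funnel."""
    rates = {}
    for prev, cur in zip(STAGES, STAGES[1:]):
        rates[cur] = counts[cur] / counts[prev] if counts[prev] else 0.0
    return rates


# Hypothetical number of users reaching each stage in one period.
counts = {
    "acquisition": 1000,
    "activation": 400,
    "retention": 200,
    "referral": 50,
    "revenue": 10,
}
rates = funnel_rates(counts)
```

Here 40% of acquired users activate, half of those are retained, a quarter of retained users refer others, and a fifth of referrers generate revenue. The stage with the lowest rate indicates where optimization effort is likely to matter most.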

Compared with the prior art, the technical solution disclosed in this embodiment can have the following effects and advantages:

Improved data quality and analysis accuracy: through data preprocessing and cleaning, errors, missing values and inconsistencies in the data can be cleaned and repaired automatically, improving data quality. At the same time, advanced analysis algorithms and models can discover hidden patterns and correlations, improving the precision and accuracy of the analysis.

Efficient data processing and analysis: using the parallel computing and distributed storage capabilities of the cloud computing platform, large-scale data can be processed quickly and analyzed in real time. Computing tasks in data collection, data preprocessing, model training and indicator calculation can be processed in parallel and optimized to improve efficiency and performance.

Simplified operation and better user experience: the indicator model software based on big data and cloud computing provides a user-friendly interactive interface, and its visualization tools and self-service analysis functions simplify the user's workflow and the data analysis process. With a few simple operations, users can query data, calculate indicators and view analysis results, which improves both user experience and efficiency.

Flexible decision support: the indicator model software based on big data and cloud computing does not depend on preset rules and logic and can flexibly handle diverse analysis needs and decision-making scenarios. Through machine learning and deep learning technology, it can learn and optimize models, discover more associations and patterns in the data, and provide more accurate and intelligent decision support.

Resource savings and lower cost: using the elasticity and automation of the cloud computing platform, resources can be allocated and scaled according to actual demand and paid for according to the computing, storage and network resources actually used, which reduces the investment in hardware and human resources.

The following effects can be achieved:

In-depth insight: analyzing big data gives a detailed understanding of the characteristics, trends, and patterns of operational data and reveals the regularities and correlations behind the data, helping enterprises better understand the market, their users, and the competitive environment.

Accurate prediction: based on the established model, future trends and outcomes can be predicted, helping enterprises plan and decide on a sound basis. For example, analysis of sales data can be used to predict future sales and changes in demand.

Optimized decision-making: data analysis and mining of operational data give enterprises a basis for optimizing decisions. By understanding the correlations and regularities in the data, enterprises can formulate more reasonable marketing strategies, product pricing strategies, and so on.

Discovery of opportunities and risks: analyzing data can reveal potential opportunities and risks. For example, analysis of market data can identify emerging markets and potential customer groups, providing guidance for business expansion.

Real-time monitoring and adjustment: with cloud computing and real-time data processing, operational data can be monitored in real time, abnormal situations can be detected promptly, and decisions can be adjusted quickly. This helps enterprises stay agile and competitive in a highly competitive market.
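As a minimal sketch of the real-time monitoring effect described above, the following Python snippet flags an observation that deviates from a trailing window of recent values by more than a chosen number of standard deviations. The function names `is_anomaly` and `monitor`, the window size, the threshold, and the sample data are illustrative assumptions; the source does not specify a detection method.

```python
from collections import deque
from statistics import mean, pstdev


def is_anomaly(window, value, threshold=3.0):
    """Return True if value is more than `threshold` standard deviations
    away from the mean of the recent window (hypothetical rule)."""
    if len(window) < 2:
        return False  # not enough history to judge
    mu = mean(window)
    sigma = pstdev(window)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


def monitor(stream, window_size=5, threshold=3.0):
    """Scan a stream of metric values and return the indices flagged as anomalous."""
    window = deque(maxlen=window_size)
    flagged = []
    for i, value in enumerate(stream):
        if is_anomaly(window, value, threshold):
            flagged.append(i)
        else:
            window.append(value)  # only normal values update the baseline
    return flagged


# Hypothetical metric: orders per minute on a storefront.
orders_per_minute = [100, 102, 98, 101, 99, 250, 100, 103]
anomalies = monitor(orders_per_minute)
print(anomalies)
```

Keeping flagged values out of the window is a design choice made here so that a single spike does not inflate the baseline; a production system might handle sustained level shifts differently.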

To achieve these effects, the digital-intelligent operational data analysis indicator model and the product structure need to work together in the following respects:

Data integration and processing: using cloud computing and big data technology, scattered and heterogeneous data sources are integrated, standardized, and cleaned so that the data can be analyzed and used reliably.

Model development and application: based on business needs, suitable data analysis models are established, including machine learning models, statistical analysis models, prediction models, and so on. The models are then applied to actual data for analysis and prediction.

Result visualization and reporting: using visualization technology, analysis results are presented to decision-makers as charts, reports, and similar forms, providing intuitive, data-driven decision support. This helps decision-makers quickly understand the analysis results and act on them.

Real-time monitoring and feedback: with cloud computing and a real-time data processing platform, operational data is monitored in real time and feedback is provided. Problems and opportunities are identified promptly, and decisions and strategies are adjusted accordingly.
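The data integration and processing aspect listed above can be illustrated with a short Python sketch. It maps records from two differently shaped sources onto one shared schema and then drops incomplete, malformed, and duplicate rows. The field names, the source layouts, and the helpers `normalize_record` and `clean` are all hypothetical; the source does not define a schema or a cleaning procedure.

```python
def normalize_record(record, field_map):
    """Rename source-specific fields to the shared schema; unmapped fields are dropped."""
    return {target: record.get(source) for source, target in field_map.items()}


def clean(records, required=("order_id", "user_id", "amount")):
    """Drop rows with missing fields or non-numeric amounts, and remove duplicate orders."""
    seen = set()
    cleaned = []
    for r in records:
        if any(r.get(k) in (None, "") for k in required):
            continue  # missing value
        try:
            amount = float(r["amount"])
        except (TypeError, ValueError):
            continue  # malformed amount
        order_id = str(r["order_id"]).strip()
        if order_id in seen:
            continue  # duplicate order reported by more than one source
        seen.add(order_id)
        cleaned.append(
            {"order_id": order_id, "user_id": str(r["user_id"]).strip(), "amount": amount}
        )
    return cleaned


# Two hypothetical sources with different field names for the same facts.
crm_rows = [
    {"oid": "A1", "uid": " u1 ", "total": "19.90"},
    {"oid": "A2", "uid": "u2", "total": ""},
]
web_rows = [
    {"order": "A1", "user": "u1", "price": 19.9},
    {"order": "A3", "user": "u3", "price": "n/a"},
    {"order": "A4", "user": "u4", "price": 5},
]
crm_map = {"oid": "order_id", "uid": "user_id", "total": "amount"}
web_map = {"order": "order_id", "user": "user_id", "price": "amount"}

integrated = [normalize_record(r, crm_map) for r in crm_rows]
integrated += [normalize_record(r, web_map) for r in web_rows]
result = clean(integrated)
print(result)
```

Here rows with bad data are simply discarded; the text also mentions data repair, which could instead fill missing values from another source, and that is not shown in this sketch.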

In a practical application scenario, an e-commerce platform is taken as an example:

Data collection and cleaning: the e-commerce platform collects user behavior data, including purchase records, browsing behavior, search history, and so on. This data is generated by users' interactions with the platform and is cleaned and preprocessed to ensure its accuracy and consistency.

Data storage and processing: the e-commerce platform stores the collected user behavior data in the big data storage system of the cloud computing platform, so that distributed storage and computing capabilities can be used to process it. The cloud computing platform allows large-scale user behavior data to be processed efficiently.

Model development and application: based on the user behavior data, the e-commerce platform can build a classification model to identify different types of users, such as high-value customers, dormant customers, and new customers. The model can be trained with a machine learning algorithm and classifies users according to features such as purchase frequency, purchase amount, and activity level.

Result analysis and optimization strategies: using the established classification model, the e-commerce platform can analyze each type of user and formulate optimization strategies. For example, high-value customers can be offered personalized benefits to strengthen their loyalty; dormant customers can be sent specific promotions or coupons to rekindle their interest in purchasing; and new customers can be targeted with acquisition offers, with a focus on improving retention.

Result visualization and reporting: to help decision-makers understand the analysis results, the e-commerce platform can use data visualization technology to present the results for each type of user as charts and reports. These visualizations include indicators such as the number of users of each type, conversion rates, and repurchase rates, as well as suggested optimization strategies for each type of user.
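The user classification step in the e-commerce example above can be sketched as follows. The snippet uses a nearest-centroid classifier, a deliberately simple stand-in chosen for illustration; the source does not name the algorithm used for this step. The three features (orders per month, average order value in hundreds of currency units, months since last visit), the labels, and all sample values are assumptions.

```python
from collections import Counter, defaultdict
from math import dist


def fit_centroids(samples):
    """samples: list of (feature_tuple, label). Returns {label: mean feature vector}."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical labeled users: (orders/month, avg order value / 100, months since last visit).
training = [
    ((8.0, 3.0, 0.1), "high_value"),
    ((6.0, 2.5, 0.2), "high_value"),
    ((0.2, 1.0, 6.0), "dormant"),
    ((0.1, 0.5, 8.0), "dormant"),
    ((1.0, 0.8, 0.1), "new"),
    ((1.0, 1.2, 0.3), "new"),
]
centroids = fit_centroids(training)

# Classify unseen users, then count users per segment for reporting.
users = {"u1": (7.0, 2.8, 0.1), "u2": (0.1, 0.6, 9.0), "u3": (1.0, 0.9, 0.2)}
segments = {uid: predict(centroids, feats) for uid, feats in users.items()}
segment_counts = Counter(segments.values())
print(segments, segment_counts)
```

The per-segment counts are the kind of indicator the visualization step would chart. Features here are assumed to be on comparable scales already; with raw values, a scaling step would be needed so that no single feature dominates the distance.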

Embodiment 2

An operational data analysis system based on big data is provided, including:

It can be understood that the technical solution provided by this embodiment can receive an analysis target input by a user, automatically collect and store original data, and perform data repair or data cleaning to obtain secondary data; integrate the secondary data into a preset standardized data model and standardize it to obtain standardized data; analyze and mine the standardized data using preset big data technology and machine learning algorithms to obtain hidden association information in the data; design and calculate key indicators according to the analysis target and establish mathematical models and algorithms related to the key indicators; and use the mathematical models and algorithms to perform data analysis and obtain analysis results. The technical solution of the present invention can automatically clean and repair errors, missing values, and inconsistencies in the data, thereby improving data quality. In addition, the use of big data technology and machine learning algorithms makes it possible to discover hidden patterns and associations, improving the precision and accuracy of the analysis. The solution does not rely on preset rules and logic and can flexibly handle diverse analysis needs and decision-making scenarios; through machine learning, it discovers more associations and patterns in the data and establishes the corresponding models, thereby obtaining more accurate and intelligent analysis results.
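As a rough illustration of the standardization and association-mining steps summarized above, the following Python sketch standardizes two numeric columns to z-scores and computes their Pearson correlation as one simple measure of association. This is only one possible reading: the source does not say that standardization means z-scoring or that associations are measured by correlation, and the column names and values are invented for the example.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def pearson(xs, ys):
    """Pearson correlation, computed as the mean product of population z-scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


# Hypothetical daily figures: advertising spend and sales.
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 45, 65, 85, 105]  # exactly 2 * ad_spend + 5

r = pearson(ad_spend, sales)
print(round(r, 6))
```

A correlation near 1 or -1 suggests a linear association worth examining further; it does not by itself establish that one quantity drives the other.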

It should be noted that the system further includes:

It can be understood that the elasticity and automation of the cloud computing platform allow resources to be allocated and scaled up or down according to actual demand, with payment based on the computing, storage, network, and other resources actually used, which reduces the investment in hardware and human resources.

It should be noted that the data analysis module, the data acquisition module, the cloud storage module, the data preprocessing module, the data integration standardization module, the data analysis mining module, and the indicator calculation model building module are connected to one another and transmit data through the services and communication mechanisms provided by the cloud computing platform.

It can be understood that identical or similar parts of the above embodiments may be referred to one another, and for content not described in detail in one embodiment, reference may be made to identical or similar content in the other embodiments.

It should be noted that in the description of the present invention, the terms "first", "second", and the like are used for descriptive purposes only and should not be understood as indicating or implying relative importance. Furthermore, in the description of the present invention, unless otherwise stated, "a plurality" means at least two.

Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a specific logical function or process. The scope of the preferred embodiments of the present invention includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.

It should be understood that the various parts of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques known in the art: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and the like.

Those of ordinary skill in the art will understand that all or some of the steps of the methods of the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.

In addition, the functional units in the various embodiments of the present invention may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.

In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", "some examples", or the like means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, such references do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the present invention have been shown and described above, it can be understood that these embodiments are illustrative and should not be construed as limiting the present invention. Those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.

Claims

1. An operational data analysis method based on big data, comprising:

receiving an analysis target input by a user, and automatically collecting and storing original data related to the analysis target;

performing data repair or data cleaning on the original data to obtain secondary data;

integrating the secondary data into a preset standardized data model, and standardizing the secondary data to obtain standardized data;

performing data analysis and mining on the standardized data by using a preset big data technology and a machine learning algorithm to obtain hidden association information in the data;

according to the analysis target, using the standardized data and the hidden association information to design and calculate key indicators, and establishing a mathematical model and algorithm related to the key indicators;

and performing data analysis on the standardized data and the hidden association information according to the analysis target by using the mathematical model and algorithm to obtain an analysis result.

2. The method of claim 1, further comprising, after obtaining the analysis result:

generating a data visualization interface from the analysis result by using a data visualization technology.

3. The method of claim 1, wherein the automatically collecting original data related to the analysis target comprises:

deploying data acquisition agents on different data sources, and acquiring the original data related to the analysis target by using a data acquisition interface.

4. The method of claim 1, wherein the automatically collecting and storing original data related to the analysis target comprises:

storing the collected original data related to the analysis target by using a storage service provided by a cloud computing platform.

5. The method of claim 1, wherein the performing data analysis and mining on the standardized data by using a preset big data technology and a machine learning algorithm comprises:

extracting features of the standardized data and the hidden association information by using the machine learning algorithm, and converting the standardized data and the hidden association information into feature vectors;

and performing pattern recognition, classification, clustering, and association rule mining on the feature vectors to obtain the hidden association information.

6. The method of claim 1, wherein the established mathematical model and algorithm comprise:

one or more of a linear regression model, a cluster analysis model, a text sentiment analysis model, a random forest algorithm, or an association rule mining algorithm.

7. An operational data analysis system based on big data, comprising:

a data analysis module, configured to receive an analysis target input by a user;

a data acquisition module, configured to automatically collect and store original data related to the analysis target;

a data preprocessing module, configured to perform data repair or data cleaning on the original data to obtain secondary data;

a data integration standardization module, configured to integrate the secondary data into a preset standardized data model and standardize the secondary data to obtain standardized data;

a data analysis mining module, configured to perform data analysis and mining on the standardized data by using a preset big data technology and a machine learning algorithm to obtain hidden association information in the data;

an indicator calculation model building module, configured to design and calculate key indicators by using the standardized data and the hidden association information according to the analysis target, and to establish a mathematical model and algorithm related to the key indicators;

wherein the data analysis module is further configured to perform data analysis on the standardized data and the hidden association information according to the analysis target by using the mathematical model and algorithm to obtain an analysis result.

8. The system of claim 7, further comprising:

a cloud storage module, configured to store the collected original data related to the analysis target by using a storage service provided by a cloud computing platform.

9. The system of claim 8, wherein

the data analysis module, the data acquisition module, the cloud storage module, the data preprocessing module, the data integration standardization module, the data analysis mining module, and the indicator calculation model building module are connected to one another and transmit data through services and communication mechanisms provided by the cloud computing platform.