CN117076521A - Operational data analysis method and system based on big data - Google Patents
Operational data analysis method and system based on big data
- Publication number
- CN117076521A (application CN202311087003.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- analysis
- standardized
- module
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Probability & Statistics with Applications (AREA)
- Fuzzy Systems (AREA)
- Quality & Reliability (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the technical field of data analysis, and in particular to an operation data analysis method and system based on big data. The method receives an analysis target input by a user; automatically collects and stores raw data; performs data repair or data cleaning to obtain secondary data; integrates the secondary data into a preset standardized data model and standardizes it to obtain standardized data; analyzes and mines the standardized data using a preset big data technology and a machine learning algorithm to obtain hidden association information; designs and calculates key indicators according to the analysis target and establishes a mathematical model and algorithm related to the key indicators; and performs data analysis using the mathematical model and algorithm to obtain an analysis result. By using machine learning to discover more associations and patterns in the data and to build the related models, the technical scheme obtains more accurate and intelligent analysis results.
Description
Technical Field
The invention relates to the technical field of data analysis, in particular to an operation data analysis method and system based on big data.
Background
Operation data analysis refers to deep, comprehensive analysis and insight into operation data to reveal patterns, trends and associations behind the data, helping enterprises make more accurate decisions and optimize operation.
Existing operation data analysis techniques include data warehousing and business intelligence (DW/BI), data mining, and statistical analysis.
Data warehousing and business intelligence: DW/BI technology centers on the ETL process of data extraction, transformation, and loading, which integrates data from an enterprise's various data sources into a centralized data warehouse. Data analysis is then performed through a multidimensional data model (such as a star schema or a snowflake schema), and reports and query results are generated quickly using OLAP (online analytical processing) technology.
Data mining: based on statistics and machine learning algorithms, hidden patterns and association information are discovered in large-scale data through methods such as pattern recognition, classification, clustering, and association rule mining; common algorithms include decision trees, neural networks, support vector machines, and cluster analysis.
Statistical analysis: based on statistical theory and methods, the characteristics, relationships, or differences of a population are inferred from sample data by parameter estimation, hypothesis testing, analysis of variance, regression analysis, and the like.
However, existing operation data analysis faces the following problems:
Data volume and speed: conventional techniques have difficulty handling large-scale data and rapidly changing real-time data. When the data volume is too large, conventional data warehouse and ETL techniques cannot process and store it efficiently, resulting in performance degradation. In addition, traditional statistical analysis and data mining methods require models and algorithms to be defined in advance and cannot adapt quickly to new data and situations.
Data quality: conventional techniques often fail to address data quality issues such as missing, duplicated, or inconsistent data, so analysis results may be inaccurate or biased. Moreover, data warehouse and business intelligence techniques impose strong constraints on the structure and format of data and cannot handle unstructured or semi-structured data.
Decision limitations: conventional techniques rely mainly on manual experience and preset rule logic and cannot fully exploit the potential of the data. Traditional data mining and statistical analysis methods often rely too heavily on feature engineering and fine-tuning, ignoring the information and associations inherent in the data.
Therefore, existing operation data analysis approaches have certain limitations: they rely mainly on manual experience and preset rule logic and cannot fully exploit the potential of the data.
Disclosure of Invention
In view of the above, the present invention aims to provide an operation data analysis method and system based on big data, so as to solve the problem that prior-art operation data analysis approaches have certain limitations, rely mainly on manual experience and preset rule logic, and cannot fully exploit the potential of the data.
According to a first aspect of an embodiment of the present invention, there is provided an operation data analysis method based on big data, including:
receiving an analysis target input by a user, and automatically collecting and storing raw data related to the analysis target;
performing data repair or data cleaning on the raw data to obtain secondary data;
integrating the secondary data into a preset standardized data model, and standardizing the secondary data to obtain standardized data;
performing data analysis and mining on the standardized data by using a preset big data technology and a machine learning algorithm to obtain hidden association information;
designing and calculating key indicators according to the analysis target by using the standardized data and the hidden association information, and establishing a mathematical model and algorithm related to the key indicators; and
performing data analysis on the standardized data and the hidden association information according to the analysis target by using the mathematical model and algorithm to obtain an analysis result.
Preferably, after the analysis result is obtained, the method further comprises:
generating a data visualization interface from the analysis result by using a data visualization technology.
Preferably, automatically collecting the raw data related to the analysis target includes:
deploying data acquisition agents on different data sources, and collecting the raw data related to the analysis target through a data acquisition interface.
Preferably, automatically collecting and storing the raw data related to the analysis target includes:
storing the collected raw data related to the analysis target by using a storage service provided by a cloud computing platform.
Preferably, performing data analysis and mining on the standardized data by using a preset big data technology and a machine learning algorithm includes:
extracting features from the standardized data and the hidden association information by using the machine learning algorithm, and converting them into feature vectors; and
performing pattern recognition, classification, clustering, and association rule mining on the feature vectors to obtain the hidden association information.
Preferably, the established mathematical model and algorithm comprise:
one or more of a linear regression model, a cluster analysis model, a text sentiment analysis model, a random forest algorithm, or an association rule mining algorithm.
According to a second aspect of the embodiment of the present invention, there is provided an operation data analysis system based on big data, including:
a data analysis module, configured to receive an analysis target input by a user;
a data acquisition module, configured to automatically collect and store raw data related to the analysis target;
a data preprocessing module, configured to perform data repair or data cleaning on the raw data to obtain secondary data;
a data integration and standardization module, configured to integrate the secondary data into a preset standardized data model and to standardize the secondary data to obtain standardized data;
a data analysis and mining module, configured to perform data analysis and mining on the standardized data by using a preset big data technology and a machine learning algorithm to obtain hidden association information; and
an index calculation and model building module, configured to design and calculate key indicators according to the analysis target by using the standardized data and the hidden association information, and to establish a mathematical model and algorithm related to the key indicators;
wherein the data analysis module is further configured to perform data analysis on the standardized data and the hidden association information according to the analysis target by using the mathematical model and algorithm to obtain an analysis result.
Preferably, the system further comprises:
a cloud storage module, configured to store the collected raw data related to the analysis target by using a storage service provided by a cloud computing platform.
Preferably, the data analysis module, the data acquisition module, the cloud storage module, the data preprocessing module, the data integration and standardization module, the data analysis and mining module, and the index calculation and model building module are connected to one another and transmit data through the services and communication mechanisms provided by the cloud computing platform.
The technical scheme provided by the embodiment of the invention can comprise the following beneficial effects:
It can be understood that the technical scheme provided by the invention can receive an analysis target input by a user; automatically collect and store raw data; perform data repair or data cleaning to obtain secondary data; integrate the secondary data into a preset standardized data model and standardize it to obtain standardized data; analyze and mine the standardized data using a preset big data technology and a machine learning algorithm to obtain hidden association information; design and calculate key indicators according to the analysis target and establish a mathematical model and algorithm related to the key indicators; and perform data analysis using the mathematical model and algorithm to obtain an analysis result. The technical scheme can automatically clean and repair errors, missing values, and inconsistencies in the data, thereby improving data quality. At the same time, by adopting big data technology and machine learning algorithms, hidden patterns and associations can be discovered, improving the precision and accuracy of the analysis. The scheme does not depend on preset rules and logic, can flexibly handle diverse analysis requirements and decision scenarios, and discovers more associations and patterns in the data through machine learning to build the related models, thereby obtaining more accurate and intelligent analysis results.
Compared with conventional techniques, the method improves data quality and analysis precision, processes and analyzes data efficiently, simplifies operation, improves user experience, and supports decisions flexibly; the technical scheme can therefore better support an enterprise's data-driven decision-making and improve the effectiveness of its decisions.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a schematic diagram illustrating the steps of an operation data analysis method based on big data, according to an exemplary embodiment;
FIG. 2 is a diagram illustrating the construction of a digital intelligent operation data analysis index model, according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the accompanying claims.
Example 1
FIG. 1 is a schematic diagram of the steps of an operation data analysis method based on big data. Referring to FIG. 1, an operation data analysis method based on big data is provided, including:
Step S11: receiving an analysis target input by a user, and automatically collecting and storing raw data related to the analysis target.
It should be noted that automatically collecting the raw data related to the analysis target includes:
deploying data acquisition agents on different data sources, and collecting the raw data related to the analysis target through a data acquisition interface.
In specific practice, raw data may be collected from various data sources (e.g., mobile device terminals, applications, and social media); data acquisition agents may be deployed on the different data sources and collect the raw data through device interfaces, application programming interfaces, and the like.
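The agent-per-source arrangement described above can be sketched as follows. This is a minimal illustration only: the `CollectionAgent` class, the `fetch` callable standing in for a device or application interface, and the `topic` field used to match records to the analysis target are all hypothetical names that do not come from the specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


@dataclass
class CollectionAgent:
    """Hypothetical acquisition agent bound to a single data source."""
    source_name: str
    fetch: Callable[[], Iterable[Record]]  # stands in for a device or application interface

    def collect(self, target_topics: List[str]) -> List[Record]:
        # Keep only records relevant to the analysis target and tag their origin.
        collected = []
        for record in self.fetch():
            if record.get("topic") in target_topics:
                collected.append({**record, "source": self.source_name})
        return collected


def collect_all(agents: List[CollectionAgent], target_topics: List[str]) -> List[Record]:
    """Gather raw data from every agent into one list."""
    raw: List[Record] = []
    for agent in agents:
        raw.extend(agent.collect(target_topics))
    return raw


# Made-up sources returning made-up records.
app_agent = CollectionAgent(
    "app",
    lambda: [{"topic": "sales", "amount": 120.0}, {"topic": "login", "user": "u1"}],
)
social_agent = CollectionAgent(
    "social_media",
    lambda: [{"topic": "sales", "text": "great product"}],
)

raw_data = collect_all([app_agent, social_agent], ["sales"])
```

Tagging each record with its source at collection time keeps the origin available for the later cleaning and standardization steps.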
It should be noted that automatically collecting and storing the raw data related to the analysis target includes:
storing the collected raw data related to the analysis target by using a storage service provided by a cloud computing platform.
In specific practice, data may be stored using storage services (e.g., object storage or databases) provided by the cloud computing platform. A cloud storage facility may also be built: hardware resources such as servers, storage devices, and network devices are deployed in a reliable data center or machine room, and the network architecture and bandwidth are planned to ensure fast data transmission and access. A distributed storage system is configured, and a data management strategy covering data classification, data archiving, and data backup is formulated. The security and integrity of the data are ensured, and the data is backed up to prevent loss. A monitoring and management system is established to track indicators such as the performance and availability of the storage system in real time.
Step S12: performing data repair or data cleaning on the raw data to obtain secondary data.
In specific practice, performing data repair or data cleaning improves data quality and ensures data consistency. Data cleaning includes operations such as removing duplicate data, filling missing values, handling abnormal values, and converting data formats, so as to improve the accuracy and reliability of the data. The collected data can be cleaned using data cleaning tools such as PySpark or pandas. The format conversion performed during cleaning can follow the analysis target, so that data from different data sources and of different data types is standardized and formatted.
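The four cleaning operations named above (de-duplication, missing-value filling, abnormal-value handling, and format conversion) can be sketched as follows. Plain Python is used so that the sketch has no dependencies; the same steps could be expressed with pandas or PySpark. The field names, the `MAX_AMOUNT` cap, and the two accepted date formats are assumptions made for illustration.

```python
from datetime import datetime
from statistics import median

MAX_AMOUNT = 1000.0  # hypothetical business limit used to cap abnormal values


def parse_date(text):
    """Accept either of two assumed source formats and return a date."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def clean_records(raw_rows):
    """Turn raw rows into secondary data."""
    # 1. Remove duplicates by order_id, keeping the first occurrence.
    seen, rows = set(), []
    for row in raw_rows:
        if row["order_id"] not in seen:
            seen.add(row["order_id"])
            rows.append(dict(row))

    # 2. Fill missing amounts with the median of the known amounts.
    fill_value = median(r["amount"] for r in rows if r["amount"] is not None)
    for r in rows:
        if r["amount"] is None:
            r["amount"] = fill_value

    # 3. Cap abnormal values at the assumed limit.
    for r in rows:
        r["amount"] = min(r["amount"], MAX_AMOUNT)

    # 4. Convert all dates to ISO 8601 strings.
    for r in rows:
        r["date"] = parse_date(r["date"]).isoformat()
    return rows


raw_rows = [
    {"order_id": 1, "amount": 100.0, "date": "2023-08-01"},
    {"order_id": 1, "amount": 100.0, "date": "2023-08-01"},  # duplicate
    {"order_id": 2, "amount": None, "date": "02/08/2023"},   # missing value, other format
    {"order_id": 3, "amount": 99999.0, "date": "2023-08-03"},  # abnormal value
    {"order_id": 4, "amount": 140.0, "date": "2023-08-04"},
]
secondary_data = clean_records(raw_rows)
```

The median is used as the fill value because it is less sensitive than the mean to the abnormal values that are only capped in the following step.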
Step S13: integrating the secondary data into a preset standardized data model, and standardizing the secondary data to obtain standardized data.
In specific practice, standardizing (or homogenizing) the secondary data facilitates subsequent data analysis and processing. ETL (extract, transform, load) tools such as Apache Airflow or Talend can be used to integrate data from different data sources into a centralized data warehouse and to standardize it, ensuring the consistency and comparability of the data.
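As an illustration of integrating records from different sources into one preset data model, the sketch below maps two differently shaped sources onto a common record type. The `StandardOrder` model, the source names, the field names, and the unit conversion are hypothetical; an actual deployment would define these in its ETL tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandardOrder:
    """Hypothetical standardized data model shared by all sources."""
    order_id: str
    amount_yuan: float
    channel: str


# Per-source mapping: source field -> (standard field, converter).
FIELD_MAPS = {
    "web": {
        "id": ("order_id", str),
        "total_fen": ("amount_yuan", lambda v: v / 100),  # fen to yuan
    },
    "store": {
        "order_no": ("order_id", str),
        "amount": ("amount_yuan", float),
    },
}


def standardize(record, source):
    """Rename and convert one source record into the standard model."""
    values = {"channel": source}
    for src_field, (std_field, convert) in FIELD_MAPS[source].items():
        values[std_field] = convert(record[src_field])
    return StandardOrder(**values)


standardized_data = [
    standardize({"id": 17, "total_fen": 12550}, "web"),
    standardize({"order_no": "S-9", "amount": "80.5"}, "store"),
]
```

Keeping the mappings in a table rather than in code paths means that adding a source only requires a new entry in `FIELD_MAPS`.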
Step S14: performing data analysis and mining on the standardized data by using a preset big data technology and a machine learning algorithm to obtain hidden association information.
The machine learning algorithm can be used to extract features from the standardized data and the hidden association information and to convert them into feature vectors; pattern recognition, classification, clustering, and association rule mining are then performed on the feature vectors to obtain the hidden association information. The hidden association information includes at least hidden patterns, association information, and anomaly information. Preferably, according to the specific analysis target, an appropriate machine learning algorithm or model, such as a decision tree or a neural network, can be selected and trained and optimized on the feature vectors so as to build an accurate and robust model for data analysis and mining.
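The conversion of standardized records into feature vectors can be illustrated with a small sketch. The specification does not say how features are encoded; min-max scaling for numeric fields and one-hot encoding for categorical fields are assumed here as one common choice, and the field names are made up.

```python
def to_feature_vectors(rows, numeric_fields, categorical_fields):
    """Encode each row as a list of floats.

    Numeric fields are min-max scaled to [0, 1]; categorical fields are
    one-hot encoded with categories in sorted order.
    """
    bounds = {f: (min(r[f] for r in rows), max(r[f] for r in rows)) for f in numeric_fields}
    categories = {f: sorted({r[f] for r in rows}) for f in categorical_fields}

    vectors = []
    for r in rows:
        vec = []
        for f in numeric_fields:
            lo, hi = bounds[f]
            vec.append((r[f] - lo) / (hi - lo) if hi > lo else 0.0)
        for f in categorical_fields:
            vec.extend(1.0 if r[f] == c else 0.0 for c in categories[f])
        vectors.append(vec)
    return vectors


rows = [
    {"amount": 100.0, "channel": "web"},
    {"amount": 300.0, "channel": "store"},
    {"amount": 200.0, "channel": "web"},
]
# Vector layout: [scaled amount, channel == "store", channel == "web"]
vectors = to_feature_vectors(rows, ["amount"], ["channel"])
```

Vectors of this form can then be passed to whichever classification, clustering, or association mining step suits the analysis target.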
Step S15: designing and calculating key indicators according to the analysis target by using the standardized data and the hidden association information, and establishing a mathematical model and algorithm related to the key indicators.
In specific practice, the key indicators may be KPIs (key performance indicators), KRIs (key risk indicators), and the like, and the indicator calculation logic may be implemented in SQL or in a programming language.
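A minimal sketch of indicator calculation logic written in SQL is given below, run through Python's built-in `sqlite3` module against an in-memory database. The `orders` table, its columns, and the three indicators chosen are assumptions for illustration; the specification does not fix any particular KPI.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("u1", 100.0), ("u1", 50.0), ("u2", 150.0), ("u3", 100.0)],  # made-up data
)

# Hypothetical KPIs: order count, paying users, and revenue per paying user.
KPI_SQL = """
SELECT COUNT(*)                               AS order_count,
       COUNT(DISTINCT user_id)                AS paying_users,
       SUM(amount) / COUNT(DISTINCT user_id)  AS revenue_per_paying_user
FROM orders
"""

order_count, paying_users, revenue_per_paying_user = conn.execute(KPI_SQL).fetchone()
conn.close()
```

Expressing the indicator as a single query keeps the calculation logic declarative, so the same statement could be run against a data warehouse with little change.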
It should be noted that the established mathematical model and algorithm include:
one or more of a linear regression model, a cluster analysis model, a text sentiment analysis model, a random forest algorithm, or an association rule mining algorithm.
The linear regression model is used to analyze the linear relationship between independent and dependent variables and to make predictions and inferences. For example, in e-commerce, a linear regression model may be used to analyze the relationship between advertising spend and sales and to predict an appropriate level of advertising spend based on the model.
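The advertising example can be sketched with ordinary least squares for a single independent variable. The data points are made up, and the closed-form fit below is a textbook method rather than anything prescribed by the specification.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up observations: advertising spend and resulting sales (same units).
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(ad_spend, sales)

# Predict sales for a planned spend of 5.0.
predicted_sales = intercept + slope * 5.0

# Invert the fitted line to find the spend needed to reach a sales target of 30.0.
required_spend = (30.0 - intercept) / slope
```

Inverting the fitted line is only meaningful within the range where the linear relationship is believed to hold; extrapolating far beyond the observed spend levels would need separate justification.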
The random forest algorithm is an ensemble learning algorithm that builds multiple decision tree models and combines their predictions into a final result, giving good prediction accuracy and robustness. For example, in financial risk control, a random forest algorithm may be used to assess the credit risk of loan applications.
The cluster analysis model groups similar data samples into clusters and classifies and analyzes the data according to the differences among the clusters. For example, in marketing, a cluster analysis model may be used to divide customers into different groups and to formulate a corresponding marketing strategy for each group.
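Customer grouping of this kind can be sketched with k-means, one widely used clustering method; the specification does not name a specific clustering algorithm. The customer features are made up, and the first `k` points are used as initial centroids so that the result is reproducible (random initialization is more usual in practice).

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))


def kmeans(points, k, iterations=10):
    """Lloyd's algorithm with deterministic initialization."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Made-up customers: (annual spend in thousands, visits per month).
customers = [(1.0, 2.0), (9.0, 8.0), (1.5, 1.0), (8.0, 9.0), (1.0, 1.5), (9.5, 9.0)]
centroids, labels = kmeans(customers, k=2)
```

Here the two resulting groups separate low-spend, low-frequency customers from high-spend, high-frequency ones, which is the kind of split a per-group marketing strategy would start from.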
The association rule mining algorithm is used to discover association patterns and rules in a dataset and helps explain the relationships in the data. For example, in retail, association rule mining algorithms may be used to find combinations of items that are frequently purchased together, for cross-selling and promotion.
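The retail example can be sketched by computing support and confidence for item pairs. This is a simplified illustration restricted to two-item rules, not a full Apriori or FP-growth implementation; the baskets and thresholds are made up.

```python
from itertools import combinations


def pair_rules(transactions, min_support, min_confidence):
    """Return rules (lhs, rhs, support, confidence) over item pairs.

    support    = share of baskets containing both items
    confidence = share of baskets containing lhs that also contain rhs
    """
    n = len(transactions)
    item_count, pair_count = {}, {}
    for basket in transactions:
        items = sorted(set(basket))
        for item in items:
            item_count[item] = item_count.get(item, 0) + 1
        for pair in combinations(items, 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk"],
]
rules = pair_rules(transactions, min_support=0.5, min_confidence=0.75)
```

With these thresholds the only surviving rule is that baskets containing butter also contain bread, which is the kind of pairing that could inform cross-selling.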
The text sentiment analysis model is used to identify and analyze sentiment tendency and sentiment polarity in text data. For example, in social media analysis, a text sentiment analysis model may be used to evaluate users' attitudes toward a product or brand in order to guide the corresponding marketing strategy.
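Polarity detection can be illustrated with a very small lexicon-based scorer. This is a deliberately simple stand-in: a production system would more likely use a trained classifier, and the word lists below are made up for the example.

```python
# Hypothetical sentiment lexicons; a real lexicon would be far larger.
POSITIVE = {"great", "love", "good", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "broken"}


def sentiment_polarity(text):
    """Classify text as positive, negative, or neutral by counting lexicon hits."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


posts = [
    "I love this phone, great camera!",
    "Terrible battery and broken screen.",
    "It arrived on Monday.",
]
polarities = [sentiment_polarity(p) for p in posts]
```

Counting polarities across many posts about a product gives a rough picture of user attitude, though a word-count approach misses negation and sarcasm.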
Step S16: performing data analysis on the standardized data and the hidden association information according to the analysis target by using the mathematical model and algorithm to obtain an analysis result.
In specific practice, the analysis result is a deep and comprehensive analysis of, and insight into, the operation data that reveals the patterns, trends, and associations behind the data, helping enterprises make more accurate decisions and optimize operations.
The analysis results may include, but are not limited to, the following:
Data characteristics and trends: the characteristics and trends of the operation data, such as sales growth trends and changes in user behavior, are analyzed and described.
Correlations and association rules: by analyzing the correlations and association rules among the data, the mutual influence between variables, such as the relationship between marketing expenses and sales, is understood.
Prediction and early warning: future trends and outcomes are predicted using the established prediction model, and early warnings are issued based on abnormal values and rules.
Optimization strategies and suggestions: based on the analysis result, optimization strategies and suggestions are provided to help enterprises make more accurate decisions and optimize operations.
In summary, the analysis result of the data analysis indicators based on big data and cloud computing is obtained mainly by processing and analyzing massive data, so as to reveal the patterns, trends, and associations behind the data and provide more comprehensive and accurate data support for enterprise decisions.
It should be noted that, after the analysis result is obtained, the method further includes:
generating a data visualization interface from the analysis result by using a data visualization technology.
In specific practice, the analysis result can be presented to the user through visualization tools (such as dashboards, reports, and charts), which help the user understand the data and the analysis result and thus make more accurate and scientific decisions. Suitable data visualization tools such as Tableau or Power BI may be selected to present the analysis result as easily understood, transparent visual reports and charts. Preferably, self-service analysis and interaction functions can also be included: through the interactive functions of the charts, the user can freely explore and analyze the data.
It can be understood that according to the technical scheme shown in the embodiment, functions such as data acquisition, preprocessing, integration, analysis, index calculation, model establishment, result display and the like can be realized, and efficient, accurate and visual data analysis and decision support are provided.
Preferably, a digital intelligent operation data analysis index model based on big data and cloud computing can be constructed according to the technical scheme, making the method more convenient and effective to implement. The layered structure of the digital intelligent operation data analysis index model is shown in FIG. 2. The elements of the model are the user behavior conversion touchpoints. The model is broken down mainly by period and business target priority, and generalizes some common combinations in terms of purpose, time, incentive mode, and the like. The digital intelligent operation data analysis index model based on big data and cloud computing can make up for the limitations of conventional techniques and provide more powerful, efficient, accurate, and intelligent data analysis and decision support:
Data scale scalability: based on big data and cloud computing technology, massive structured and unstructured data can be processed, and fast, real-time data analysis and decision response are supported.
Automated processing capability: by means of machine learning and deep learning technology, automatic feature extraction, model training, and optimization can be achieved, reducing the manual workload.
Elasticity and flexibility: the cloud computing platform provides elastic resource allocation and the ability to scale capacity up and down, adapting to data analysis tasks of different scales and requirements.
Real-time operation and practicality: data can be processed, analyzed, and learned from in real time, providing more accurate and practical decision support. In addition, more hidden patterns and associations can be discovered without relying on preset models and rules.
Meanwhile, following the AARRR funnel model, the digital intelligent operation data analysis index model explains the five indexes for achieving user growth (acquisition, activation, retention, revenue and referral), and can help enterprises better understand the principles of acquiring and retaining customers.
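By way of non-limiting illustration, the AARRR funnel indexes can be sketched as stage-to-stage conversion rates. The function name `funnel_conversion`, the stage identifiers and the sample counts below are assumptions made for this sketch and do not come from the embodiment.

```python
from typing import Dict, List, Tuple

# Stage order of the AARRR funnel; the identifiers are hypothetical.
AARRR_STAGES = ["acquisition", "activation", "retention", "revenue", "referral"]


def funnel_conversion(counts: Dict[str, int]) -> List[Tuple[str, float, float]]:
    """Return (stage, step_rate, overall_rate) for each AARRR stage.

    step_rate is the share of users carried over from the previous stage;
    overall_rate is the share of acquired users who reach the stage.
    """
    top = counts[AARRR_STAGES[0]]
    previous = top
    rows = []
    for stage in AARRR_STAGES:
        current = counts[stage]
        step_rate = current / previous if previous else 0.0
        overall_rate = current / top if top else 0.0
        rows.append((stage, step_rate, overall_rate))
        previous = current
    return rows


# Made-up sample counts, for illustration only.
sample_counts = {
    "acquisition": 10_000,
    "activation": 4_000,
    "retention": 1_600,
    "revenue": 400,
    "referral": 100,
}

for stage, step, overall in funnel_conversion(sample_counts):
    print(f"{stage:<12} step={step:.1%} overall={overall:.1%}")
```

A low step rate at one stage indicates where users are being lost, which is the kind of reading the index model is meant to support.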
Compared with the prior art, the technical scheme disclosed by the embodiment can have the following effects and advantages:
Improved data quality and analysis precision: through data preprocessing and cleaning, errors, missing values and inconsistencies in the data can be automatically cleaned and repaired, thereby improving data quality. Meanwhile, by adopting advanced analysis algorithms and models, hidden patterns and relationships can be found, and the analysis precision and accuracy are improved.
Efficient data processing and analysis: by utilizing the parallel computing and distributed storage capacity of the cloud computing platform, large-scale data can be processed rapidly and analyzed in real time. In links such as data acquisition, data preprocessing, model training and index calculation, the computing tasks can be processed in parallel and optimized, improving efficiency and performance.
Simplified operation and improved user experience: the intelligent operation data analysis index model software based on big data and cloud computing provides a user-friendly interactive interface, and the user's operation flow and data analysis process are simplified through the visualization tool and the self-service analysis function. The user can easily perform data query, index calculation and analysis display through simple operations, which improves user experience and efficiency.
Flexible decision support: the intelligent operation data analysis index model software based on big data and cloud computing does not depend on preset rules and logic, and can flexibly handle diversified analysis requirements and decision scenes. Through machine learning and deep learning technology, models can be learned and optimized, more correlations and patterns can be found in the data, and more accurate and intelligent decision support is provided.
Resource saving and cost reduction: by utilizing the elasticity and automation characteristics of the cloud computing platform, resources can be allocated and scaled up or down according to actual demand, and payment is made only for the computing, storage, network and other resources actually used, so that the input cost of hardware equipment and human resources is reduced.
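As a minimal sketch of the automatic cleaning and repair of errors, missing values and inconsistencies mentioned among the advantages above, the following assumes order records with the hypothetical fields `order_id`, `city` and `amount`. The repair rules (dropping repeated ids, unifying letter case, median filling) are illustrative choices, not rules prescribed by the embodiment.

```python
from statistics import median
from typing import Dict, List, Optional

Record = Dict[str, Optional[object]]


def clean_records(raw: List[Record]) -> List[Record]:
    """Repair and clean raw order records (hypothetical schema)."""
    seen_ids = set()
    cleaned: List[Record] = []
    for row in raw:
        order_id = row.get("order_id")
        # Drop rows without an id and rows that repeat an id already seen.
        if order_id is None or order_id in seen_ids:
            continue
        seen_ids.add(order_id)
        fixed = dict(row)
        # Inconsistency repair: unify whitespace and letter case of the city name.
        city = fixed.get("city")
        fixed["city"] = city.strip().title() if isinstance(city, str) and city.strip() else None
        # Error repair: treat non-numeric or negative amounts as missing.
        amount = fixed.get("amount")
        if isinstance(amount, bool) or not isinstance(amount, (int, float)) or amount < 0:
            fixed["amount"] = None
        cleaned.append(fixed)
    # Missing-value repair: fill absent amounts with the median of the valid ones.
    valid = [r["amount"] for r in cleaned if r["amount"] is not None]
    fill = median(valid) if valid else 0.0
    for r in cleaned:
        if r["amount"] is None:
            r["amount"] = fill
    return cleaned


# Made-up sample rows, for illustration only.
raw_rows = [
    {"order_id": 1, "city": " beijing ", "amount": 120.0},
    {"order_id": 1, "city": "Beijing", "amount": 120.0},
    {"order_id": 2, "city": "SHANGHAI", "amount": None},
    {"order_id": 3, "city": "shanghai", "amount": -5},
    {"order_id": 4, "city": "", "amount": 80.0},
]
cleaned_rows = clean_records(raw_rows)
```

Median filling is only one possible repair strategy; a production system might instead flag such rows for review.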
The following effects can be brought about:
Deep insight: by analyzing the big data, the characteristics, trends and patterns of the operation data can be understood in depth, the rules and correlations behind the data are revealed, and enterprises are helped to better understand markets, users and the competitive environment.
Accurate prediction: based on the established model, future trends and results can be predicted, helping enterprises make reasonable plans and decisions. For example, future sales and demand changes may be predicted by analyzing sales data.
Decision optimization: analyzing and mining the operation data provides enterprises with a basis for optimizing decisions. By understanding the correlations and regularities in the data, enterprises can formulate more reasonable marketing strategies, product pricing strategies, and the like.
Discovery of opportunities and risks: by analyzing the data, potential opportunities and risks can be discovered. For example, through analysis of market data, emerging markets and potential customer groups can be identified, providing guidance for business expansion.
Real-time monitoring and adjustment: by utilizing cloud computing and real-time data processing, operation data can be monitored in real time, abnormal conditions can be found in time, and rapid decision adjustment can be performed. This helps businesses maintain agility and competitive advantage in highly competitive markets.
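The real-time monitoring effect listed above can be illustrated with a simple rolling-window check. This is a sketch under stated assumptions: the function `detect_anomalies`, the window size and the threshold of three standard deviations are hypothetical choices, and the embodiment does not specify a particular anomaly detection rule.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, Iterator, Tuple


def detect_anomalies(
    values: Iterable[float], window: int = 5, threshold: float = 3.0
) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for points far from the mean of the preceding window."""
    recent: Deque[float] = deque(maxlen=window)
    for index, value in enumerate(values):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            # Flag the point when it deviates by more than `threshold` standard deviations.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield index, value
        # The new point joins the window whether or not it was flagged.
        recent.append(value)


# Made-up orders-per-minute series with one spike, for illustration only.
orders_per_minute = [100, 102, 98, 101, 99, 100, 250, 101, 97]
alerts = list(detect_anomalies(orders_per_minute))
```

Because a flagged point still enters the window, the points immediately after a spike are compared against an inflated spread; excluding flagged points from the window is a possible refinement.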
To achieve these effects, the digital intelligent operation data analysis index model and the product structure need to cooperate with each other in the following aspects:
Data integration and processing: scattered and heterogeneous data sources are integrated through cloud computing and big data technology, and are normalized and cleaned so as to ensure that the data can be analyzed and used.
Model development and application: based on the business requirements, a suitable data analysis model is established, including a machine learning model, a statistical analysis model, a prediction model and the like. Meanwhile, the model is applied to the actual data for analysis and prediction.
Results visualization and reporting: through a visualization technology, analysis results are presented to a decision maker in the form of charts, reports and the like, and visual data-driven decision support is provided. This helps the decision maker to quickly understand the analysis results and make the corresponding decisions.
Real-time monitoring and feedback: real-time monitoring and feedback of operation data are realized by means of a cloud computing and real-time data processing platform, so that problems and opportunities are found in time and decisions and strategies are adjusted accordingly.
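As one hedged illustration of the model development and application aspect above, a prediction model can be as simple as a least-squares trend line fitted to historical sales. The helper names `fit_line` and `forecast` and the monthly figures are assumptions for this sketch; the embodiment leaves the choice of model open.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(xs: List[float], ys: List[float], future_x: float) -> float:
    """Apply the fitted line to a future period."""
    slope, intercept = fit_line(xs, ys)
    return slope * future_x + intercept


# Made-up monthly sales, for illustration only.
months = [1, 2, 3, 4, 5]
sales = [110, 120, 130, 140, 150]
next_month_sales = forecast(months, sales, 6)
```

A straight line ignores seasonality and promotions, so it serves only as a baseline against which richer models can be compared.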
In an actual application scenario, an e-commerce platform is taken as an example:
Data collection and cleaning: the e-commerce platform collects user behavior data, including purchase records, browsing behavior, search history and the like. The data are generated by the users' interactions on the e-commerce platform, and are cleaned and preprocessed to ensure accuracy and consistency.
Data storage and processing: the e-commerce platform stores the collected user behavior data in a big data storage system of the cloud computing platform so as to process the data by utilizing distributed storage and computing capacity. Through the cloud computing platform, large-scale user behavior data can be processed efficiently.
Model development and application: based on the user behavior data, the e-commerce platform may build a classification model to identify different types of users, such as premium customers, dormant customers and new customers. The model may be trained using machine learning algorithms and classifies users according to characteristics such as purchase frequency, purchase amount and activity level.
Result analysis and optimization strategies: by using the established classification model, the e-commerce platform can analyze different types of users and formulate optimization strategies. For example, premium customers may be given personalized offers to promote their loyalty; dormant customers may be sent specific campaigns or coupons to reawaken their purchase interest; and new customers may be given new-customer offers to improve retention.
Results visualization and reporting: in order to enable a decision maker to better understand analysis results, the e-commerce platform can display the analysis results of different types of users in the form of charts and reports by using a data visualization technology. These visual results include indicators of the number of different types of users, conversion rates, repurchase rates, etc., as well as optimization strategy suggestions for different types of users.
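The user classification step of the e-commerce example can be sketched with a nearest-centroid classifier, chosen here only because it is a small, self-contained machine learning method; the embodiment does not name a specific algorithm. The labels `premium`, `dormant` (the sleeping customers of the example) and `new`, the feature order (purchase frequency, purchase amount, activity level) and the assumption that features are already scaled to the range 0 to 1 are all hypothetical.

```python
from math import dist
from typing import Dict, List, Sequence, Tuple

# Strategy text paraphrases the example; the label keys are hypothetical.
STRATEGIES = {
    "premium": "personalized offers to strengthen loyalty",
    "dormant": "targeted campaigns or coupons to reawaken purchase interest",
    "new": "new-customer offers to improve retention",
}


def train_centroids(samples: List[Tuple[Sequence[float], str]]) -> Dict[str, List[float]]:
    """Average the feature vectors of each labeled user type."""
    grouped: Dict[str, List[Sequence[float]]] = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def classify(features: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(features, centroids[label]))


# Made-up training data: (frequency, amount, activity), each scaled to 0..1.
training = [
    ((0.9, 0.8, 0.9), "premium"),
    ((0.8, 0.9, 0.7), "premium"),
    ((0.1, 0.2, 0.05), "dormant"),
    ((0.2, 0.1, 0.1), "dormant"),
    ((0.1, 0.1, 0.8), "new"),
    ((0.05, 0.05, 0.9), "new"),
]
centroids = train_centroids(training)
label = classify((0.7, 0.9, 0.8), centroids)
print(label, "->", STRATEGIES[label])
```

Counting the predicted labels over all users would give the per-type user numbers that the visualization step reports.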
Example two
There is provided an operation data analysis system based on big data, including:
the data analysis module is used for receiving an analysis target input by a user;
the data acquisition module is used for automatically acquiring and storing the original data related to the analysis target;
the data preprocessing module is used for carrying out data restoration or data cleaning on the original data to obtain secondary data;
the data integration standardization module is used for integrating the secondary data into a preset standardization data model and carrying out standardization processing on the secondary data to obtain standardization data;
the data analysis mining module is used for carrying out data analysis and mining on the standardized data by utilizing a preset big data technology and a machine learning algorithm to obtain data hiding associated information;
the index calculation model building module is used for designing and calculating key indexes by utilizing the standardized data and the data hiding associated information according to the analysis target, and building a mathematical model and algorithm related to the key indexes;
the data analysis module is further used for carrying out data analysis on the standardized data and the data hiding associated information according to the analysis target by utilizing the mathematical model and the algorithm to obtain an analysis result.
It can be appreciated that, according to the technical scheme provided by the embodiment, an analysis target input by a user can be received; original data is automatically collected and stored; data restoration or data cleaning is performed to obtain secondary data; the secondary data is integrated into a preset standardized data model and standardized to obtain standardized data; data analysis and mining are carried out on the standardized data by utilizing a preset big data technology and a machine learning algorithm to obtain data hiding associated information; key indexes are designed and calculated according to the analysis target, and a mathematical model and algorithm related to the key indexes are established; and data analysis is carried out by using the mathematical model and algorithm to obtain an analysis result. The technical scheme provided by the invention can automatically clean and repair errors, missing values and inconsistencies in data, thereby improving the data quality. Meanwhile, because a big data technology and a machine learning algorithm are adopted, hidden patterns and associations can be found, and the analysis precision and accuracy are improved. The scheme does not depend on preset rules and logic, can flexibly handle diversified analysis requirements and decision scenes, discovers more associations and patterns from data through machine learning technology, and establishes a related model, so that more accurate and intelligent analysis results are obtained.
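The cooperation of the modules described above can be sketched as a sequence of stages passing a shared context. Everything in this sketch is an assumption made for illustration: the `Context` fields, the stage function names, the in-memory sample rows and the toy association (average spend with versus without a coupon) stand in for the real acquisition, mining and modeling logic, which the embodiment does not detail.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Context:
    """State handed from module to module; field names are illustrative."""
    target: str
    raw: List[dict] = field(default_factory=list)
    secondary: List[dict] = field(default_factory=list)
    standardized: List[dict] = field(default_factory=list)
    associations: Dict[str, float] = field(default_factory=dict)
    indicators: Dict[str, float] = field(default_factory=dict)
    result: Dict[str, object] = field(default_factory=dict)


def acquire(ctx: Context) -> None:
    # Stand-in for the data acquisition module; a real system would query data sources.
    ctx.raw = [
        {"user": "u1", "amt": "30", "coupon": True},
        {"user": "u2", "amt": None, "coupon": False},
        {"user": "u3", "amt": "50", "coupon": True},
        {"user": "u4", "amt": "10", "coupon": False},
    ]


def preprocess(ctx: Context) -> None:
    # Data cleaning: drop rows whose amount is missing.
    ctx.secondary = [r for r in ctx.raw if r["amt"] is not None]


def standardize(ctx: Context) -> None:
    # Map source fields onto a preset schema (user_id, amount, used_coupon).
    ctx.standardized = [
        {"user_id": r["user"], "amount": float(r["amt"]), "used_coupon": bool(r["coupon"])}
        for r in ctx.secondary
    ]


def mine(ctx: Context) -> None:
    # Toy hidden association: average spend with a coupon relative to without.
    with_coupon = [r["amount"] for r in ctx.standardized if r["used_coupon"]]
    without = [r["amount"] for r in ctx.standardized if not r["used_coupon"]]
    if with_coupon and without and sum(without) > 0:
        ratio = (sum(with_coupon) / len(with_coupon)) / (sum(without) / len(without))
        ctx.associations["coupon_spend_ratio"] = ratio


def build_indicators(ctx: Context) -> None:
    # Key index: average order value over the standardized data.
    amounts = [r["amount"] for r in ctx.standardized]
    ctx.indicators["average_order_value"] = sum(amounts) / len(amounts) if amounts else 0.0


def analyze(ctx: Context) -> None:
    # Combine the indexes and associations into the analysis result.
    ctx.result = {"target": ctx.target, **ctx.indicators, **ctx.associations}


PIPELINE: List[Callable[[Context], None]] = [
    acquire, preprocess, standardize, mine, build_indicators, analyze,
]


def run(target: str) -> Context:
    ctx = Context(target=target)
    for stage in PIPELINE:
        stage(ctx)
    return ctx


outcome = run("average order value")
print(outcome.result)
```

Keeping each stage as a separate function mirrors the module boundaries of the system, so that any one stage can be replaced without touching the others.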
It should be noted that the system further includes:
the cloud storage module is used for storing the collected original data related to the analysis target by using a storage service provided by the cloud computing platform.
It can be understood that by utilizing the elasticity and automation characteristics of the cloud computing platform, resources can be allocated and scaled up or down according to actual demand, and payment is made only for the computing, storage, network and other resources actually used, so that the input cost of hardware equipment and human resources is reduced.
The data analysis module, the data acquisition module, the cloud storage module, the data preprocessing module, the data integration standardization module, the data analysis mining module and the index calculation model building module are connected to one another and transmit data through the service and communication mechanisms provided by the cloud computing platform.
It is to be understood that the same or similar parts in the above embodiments may be referred to each other, and that in some embodiments, the same or similar parts in other embodiments may be referred to.
It should be noted that in the description of the present invention, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, in the description of the present invention, unless otherwise indicated, "plurality" means at least two.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, the steps or methods may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the invention, and that changes, modifications, substitutions and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.
Claims (9)
1. An operational data analysis method based on big data, comprising:
receiving an analysis target input by a user, and automatically collecting and storing original data related to the analysis target;
performing data restoration or data cleaning on the original data to obtain secondary data;
integrating the secondary data into a preset standardized data model, and carrying out standardized processing on the secondary data to obtain standardized data;
carrying out data analysis and mining on the standardized data by using a preset big data technology and a machine learning algorithm to obtain data hiding associated information;
according to the analysis target, utilizing the standardized data and the data hiding associated information to design and calculate key indexes, and establishing a mathematical model and algorithm related to the key indexes;
and carrying out data analysis on the standardized data and the data hiding associated information according to the analysis target by using the mathematical model and the algorithm to obtain an analysis result.
2. The method of claim 1, further comprising, after obtaining the analysis result:
generating a data visualization interface according to the analysis result by utilizing a data visualization technology.
3. The method of claim 1, wherein the automatically collecting raw data related to the analysis target comprises:
deploying data acquisition agents on different data sources, and acquiring the original data related to the analysis target by using a data acquisition interface.
4. The method of claim 1, wherein the automatically collecting and storing raw data related to the analysis target comprises:
storing the collected original data related to the analysis target by using a storage service provided by the cloud computing platform.
5. The method of claim 1, wherein the data analysis and mining of the standardized data using pre-set big data techniques and machine learning algorithms comprises:
extracting features of the standardized data and the data hiding associated information by using the machine learning algorithm, and converting the standardized data and the data hiding associated information into feature vectors;
and carrying out pattern recognition, classification, clustering and association rule mining on the feature vectors to obtain the data hiding association information.
6. The method of claim 1, wherein the mathematical model and algorithm established comprises:
one or more of a linear regression model, a cluster analysis model, a text emotion analysis model, a random forest algorithm, or an association rule mining algorithm.
7. An operational data analysis system based on big data, comprising:
the data analysis module is used for receiving an analysis target input by a user;
the data acquisition module is used for automatically acquiring and storing the original data related to the analysis target;
the data preprocessing module is used for carrying out data restoration or data cleaning on the original data to obtain secondary data;
the data integration standardization module is used for integrating the secondary data into a preset standardization data model and carrying out standardization processing on the secondary data to obtain standardization data;
the data analysis mining module is used for carrying out data analysis and mining on the standardized data by utilizing a preset big data technology and a machine learning algorithm to obtain data hiding associated information;
the index calculation model building module is used for designing and calculating key indexes by utilizing the standardized data and the data hiding associated information according to the analysis target, and building a mathematical model and algorithm related to the key indexes;
the data analysis module is further used for carrying out data analysis on the standardized data and the data hiding associated information according to the analysis target by utilizing the mathematical model and the algorithm to obtain an analysis result.
8. The system of claim 7, further comprising:
the cloud storage module is used for storing the collected original data related to the analysis target by using a storage service provided by the cloud computing platform.
9. The system of claim 8, wherein
the data analysis module, the data acquisition module, the cloud storage module, the data preprocessing module, the data integration standardization module, the data analysis mining module and the index calculation model building module are connected to one another and transmit data through the service and communication mechanisms provided by the cloud computing platform.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311087003.9A CN117076521A (en) | 2023-08-28 | 2023-08-28 | Operational data analysis method and system based on big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117076521A (en) | 2023-11-17 |
Family
ID=88715053
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311087003.9A Pending CN117076521A (en) | 2023-08-28 | 2023-08-28 | Operational data analysis method and system based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117076521A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109583796A (en) * | 2019-01-08 | 2019-04-05 | 河南省灵山信息科技有限公司 | A kind of data digging system and method for Logistics Park OA operation analysis |
WO2020147349A1 (en) * | 2019-01-14 | 2020-07-23 | 中国电力科学研究院有限公司 | Power distribution network operation aided decision-making analysis system and method |
CN112506907A (en) * | 2020-12-10 | 2021-03-16 | 北谷电子有限公司 | Engineering machinery marketing strategy pushing method, system and device based on big data |
CN116629802A (en) * | 2023-02-17 | 2023-08-22 | 国能朔黄铁路发展有限责任公司 | Big data platform system for railway port station |
2023-08-28: CN application CN202311087003.9A (publication CN117076521A); status: active, pending
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117668084A (en) * | 2023-12-11 | 2024-03-08 | 烽华(黑龙江)数字科技有限公司 | Data application and open circulation operation service platform |
CN117668084B (en) * | 2023-12-11 | 2024-07-12 | 烽华(黑龙江)数字科技有限公司 | Data application and open circulation operation service platform |
CN118227603A (en) * | 2024-03-22 | 2024-06-21 | 广州标点医药信息股份有限公司 | Warehouse entry processing system and method for medical information data |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN117076521A (en) | Operational data analysis method and system based on big data | |
CN109726090A (en) | Performance influences the identification of defect in computing system | |
CN117151345A (en) | Enterprise management intelligent decision platform based on AI technology | |
CN115423289B (en) | Intelligent plate processing workshop data processing method and terminal | |
CN111738331A (en) | User classification method and device, computer-readable storage medium and electronic device | |
CN111427974A (en) | Data quality evaluation management method and device | |
CN112002403A (en) | Quantitative evaluation method, device and equipment for medical equipment and storage medium | |
CN115063035A (en) | Customer evaluation method, system, equipment and storage medium based on neural network | |
CN114118793A (en) | Local exchange risk early warning method, device and equipment | |
CN117971947A (en) | System and method based on user side multisource data penetration and service fusion | |
CN113377640B (en) | Method, medium, device and computing equipment for explaining model under business scene | |
CN114330720A (en) | Knowledge graph construction method and device for cloud computing and storage medium | |
JP2021170244A (en) | Learning model construction system and method of the same | |
CN113379529A (en) | Collaborative decision engine application framework | |
CN111612302A (en) | Group-level data management method and equipment | |
CN117150389B (en) | Model training method, carrier card activation prediction method and equipment thereof | |
KR20210025276A (en) | System for providing guideline of marketing using artificial intelligence based on big data | |
CN118410922B (en) | Data processing method and system based on product supply chain | |
CN118153881B (en) | Data center-based business expansion whole-flow monitoring method | |
Wang et al. | Dispatching Marketing Monitoring Based on Data Mining Technology | |
CN118819483A (en) | Online method of ERP system | |
CN118735577A (en) | Electronic commerce information statistical system and method based on big data calculation | |
Sen et al. | Application of Smart Computing in Digital Business and e-commerce through Business Intelligence | |
TR2021021471A2 (en) | A SYSTEM AND METHOD DEVELOPED FOR PENSION COMPANIES | |
Alizadeh et al. | An OWA-Powered Dynamic Customer Churn Modeling in the Banking Industry Based on Customer Behavioral Vectors |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||