CN117807425B - Intelligent data analysis method and system - Google Patents


Info

Publication number
CN117807425B
CN117807425B (application CN202410232109.1A)
Authority
CN
China
Prior art keywords
data
model
analysis
storage
dynamic
Prior art date
Legal status
Active
Application number
CN202410232109.1A
Other languages
Chinese (zh)
Other versions
CN117807425A (en)
Inventor
马文斌
李志杰
王潇
曾其锐
王松青
李林佳
张茜
宋东斌
金江川
孙岩
魏淑静
艾盼盼
马高
Current Assignee
Chuangliao Zhizao Hebei Industrial Design Co ltd
Original Assignee
Chuangliao Zhizao Hebei Industrial Design Co ltd
Priority date
Filing date
Publication date
Application filed by Chuangliao Zhizao Hebei Industrial Design Co ltd
Priority to CN202410232109.1A
Publication of CN117807425A
Application granted
Publication of CN117807425B
Legal status: Active
Anticipated expiration


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/24 Classification techniques
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/604 Tools and structures for managing or administering access control systems
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N3/08 Learning methods
    • G06N3/092 Reinforcement learning


Abstract

The invention discloses an intelligent data analysis method and system in the technical field of data mining. The method comprises the following steps: based on historical data of a database, a recurrent neural network (RNN) and a long short-term memory (LSTM) network method are adopted to analyze historical data flow patterns, and time-series analysis is used to predict future data flow trends, generating a data fluidity prediction model. By combining the RNN and LSTM methods, the invention significantly improves prediction accuracy for long-term data fluidity forecasting and complex pattern recognition. In highly variable, dynamically changing data environments, the prediction results are more accurate and better support complex decision processes. For data storage optimization, the invention performs efficient dynamic optimization according to data access patterns and storage resources, improving both the utilization of the storage medium and overall data processing performance.

Description

Intelligent data analysis method and system
Technical Field
The invention relates to the technical field of data mining, in particular to an intelligent data analysis method and system.
Background
The field of data mining concerns the automatic extraction of valuable information from large data sets using machine learning, statistical analysis and artificial intelligence techniques. Data mining is a key branch of data science that focuses on discovering patterns, anomalies, associations and structures in data. The technology is widely used across industries such as finance, healthcare, retail and social media for predicting trends, building decision support systems and improving the efficiency of business and scientific work. Data mining methods include classification, clustering, regression analysis and association rule learning, among others.
Intelligent data analysis is the process of deeply analyzing data using advanced algorithms and computing techniques, with the aim of extracting meaningful patterns and insights from raw data. The core of this approach is the ability to process and analyze large amounts of complex data, revealing hidden trends, predicting future events or behaviors, and supporting more intelligent decision making. By making data more intuitive and easier to understand, intelligent data analysis helps organizations and individuals use information more effectively, thereby increasing operational efficiency, raising revenue, or improving service.
Although the prior art can analyze large databases, it lacks sufficient precision for long-term prediction of data fluidity and for complex pattern recognition. As a result, in highly variable, dynamically changing data environments, predictions are not accurate enough to support complex decisions. In data storage optimization, the prior art is inefficient at dynamically adapting to data access patterns and optimizing storage resources, which affects both the efficient use of the storage medium and overall data processing performance. In the data processing flow, the prior art is weak at responding to dynamic data changes in real time and at optimizing processing strategies, easily causing reduced efficiency and response delays. In task priority scheduling, the prior art insufficiently considers the importance and urgency of data flows, limiting the efficiency of resource allocation and task execution. For data compression, the prior art fails to fully exploit the statistical characteristics of data for efficient compression, leading to suboptimal transmission and storage efficiency. In data anomaly detection, the prior art is limited in identifying complex, nonlinear anomaly patterns, so key data problems are easily overlooked, affecting data reliability and security.
In view of the above, the present invention provides an intelligent data analysis method to solve these problems.
Disclosure of Invention
The invention aims to provide an intelligent data analysis method that solves the following problems of the prior art: although the prior art can analyze large databases, it lacks sufficient precision for long-term prediction of data fluidity and for complex pattern recognition, so that in highly variable, dynamically changing data environments predictions are not accurate enough to support complex decisions; in data storage optimization it is inefficient at dynamically adapting to data access patterns and optimizing storage resources, affecting the efficient use of the storage medium and overall data processing performance; in the data processing flow it is weak at responding to dynamic data changes in real time and at optimizing processing strategies, easily causing reduced efficiency and response delays; in task priority scheduling it insufficiently considers the importance and urgency of data flows, limiting the efficiency of resource allocation and task execution; in data compression it fails to fully exploit the statistical characteristics of data, leading to suboptimal transmission and storage efficiency; and in data anomaly detection it is limited in identifying complex, nonlinear anomaly patterns, so that key data problems are easily overlooked, affecting data reliability and security.
In order to achieve the above purpose, the present invention provides the following technical solutions: an intelligent data analysis method, comprising the steps of:
S1: based on historical data of a database, a recurrent neural network (RNN) and a long short-term memory (LSTM) network method are adopted to analyze historical data flow patterns, and time-series analysis is used to predict future data flow trends, so as to generate a data fluidity prediction model;
S2: based on the data fluidity prediction model, a data storage optimization algorithm based on load balancing is adopted to analyze data access patterns and storage efficiency, the distribution and configuration of data in the storage medium are adjusted according to the analysis results, a storage plan is formulated according to data access frequency and type, and a data storage position optimization strategy is generated;
S3: based on the data storage position optimization strategy, an algorithm based on differential equations is adopted to comprehensively analyze the data processing flow and storage efficiency, the data sampling frequency and processing parameters are adjusted to adapt to various data types and requirements, and the data processing strategy is adjusted according to the data fluidity prediction to generate a dynamic data processing model;
S4: based on the dynamic data processing model, a scheduling method based on priority queues and machine learning is adopted to analyze and adjust task allocation and resource configuration in real time, the task scheduling strategy is optimized according to data fluidity and processing requirements, and the priority and resource allocation of data analysis tasks are adjusted according to the analysis results to generate a dynamic priority scheduling model;
S5: based on the dynamic priority scheduling model, information entropy theory and Huffman coding are adopted for data characteristic and statistical analysis; the data is encoded and compressed according to its characteristics to generate a data compression model;
S6: based on the data compression model, chaos theory and nonlinear data analysis methods are adopted to analyze the behavior and patterns of the data, identify irregular patterns and potential anomalies, and perform anomaly detection according to the dynamic change characteristics of the data to generate a data anomaly detection model;
S7: based on the data fluidity prediction model, the data storage position optimization strategy, the dynamic data processing model, the dynamic priority scheduling model, the data compression model and the data anomaly detection model, a multi-level data integration method is adopted for data analysis flow planning and resource optimization configuration, with continuous optimization of data processing efficiency and accuracy, to generate a data processing and analysis strategy.
Preferably, the data fluidity prediction model comprises historical data flow pattern recognition, trend change indexes and key data flow node identification; the data storage position optimization strategy comprises a data access hot-spot map, a storage medium allocation scheme and a data migration priority list; the dynamic data processing model comprises data response time optimization, a flow allocation algorithm and a multi-source data synchronization mechanism; the dynamic priority scheduling model comprises a task execution queue, a resource allocation map and a real-time monitoring feedback mechanism; the data compression model comprises compression efficiency assessment, an encoding rule base and a data recovery protocol; the data anomaly detection model comprises anomaly pattern identification, change rate monitoring and an anomaly response mechanism; and the data processing and analysis strategy comprises an integrated analysis flow chart, performance assessment standards and an automatic adjustment scheme.
Preferably, based on historical data of a database, a recurrent neural network and a long short-term memory network method are adopted to analyze historical data flow patterns, and time-series analysis is used to predict future data flow trends; the specific steps of generating the data fluidity prediction model are as follows:
S101: based on historical data of the database, an autoregressive model is adopted; by analyzing the linear relationship between time points and values in the historical data and extracting basic time-series features, a basic time-series feature set is generated;
S102: based on the basic time-series feature set, a recurrent neural network algorithm is adopted; by constructing recurrent hidden-layer states in the network, periodic patterns and change trends in the data flow are captured, generating a periodic data flow pattern model;
S103: based on the periodic data flow pattern model, a long short-term memory network algorithm is adopted; a gating mechanism with memory cells is used to analyze the continuous patterns and structure of the data flow, generating a continuous data flow analysis model;
S104: based on the continuous data flow analysis model, a time-series prediction technique is adopted; through adjustment and analysis of the model, the ability to predict future data flow trends is optimized, generating the data fluidity prediction model.
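As a concrete illustration of the autoregressive step in S101, the sketch below fits a first-order autoregressive model to a synthetic historical flow series by ordinary least squares and iterates it to forecast the next values. The function names, the AR(1) form, and the sample series are illustrative assumptions, not the patent's exact implementation.

```python
# Minimal sketch of S101: fit an AR(1) model x[t] = a*x[t-1] + b to a
# historical series by least squares, then forecast future flow values.

def fit_ar1(series):
    """Return (a, b) minimising the sum of (x[t] - a*x[t-1] - b)^2."""
    xs = series[:-1]          # lagged values x[t-1]
    ys = series[1:]           # current values x[t]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

def forecast(series, steps, a, b):
    """Iterate the fitted recurrence to predict future flow values."""
    out, x = [], series[-1]
    for _ in range(steps):
        x = a * x + b
        out.append(x)
    return out

history = [10.0, 12.0, 13.6, 14.9, 15.9, 16.7]   # synthetic flow volumes
a, b = fit_ar1(history)
print(forecast(history, 3, a, b))
```

A full implementation of S101–S104 would feed such basic time-series features into RNN/LSTM layers; the AR(1) fit corresponds only to the linear-relationship extraction of S101.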
Preferably, based on the data fluidity prediction model, a data storage optimization algorithm based on load balancing is adopted to analyze data access patterns and storage efficiency, the distribution and configuration of data in the storage medium are adjusted according to the analysis results, and a storage plan is formulated according to data access frequency and type; the specific steps of generating the data storage position optimization strategy are as follows:
S201: based on the data fluidity prediction model, an association rule learning algorithm is adopted; by analyzing the co-occurrence patterns and degree of association between data items, hot-spot areas of data access are identified, generating a data access pattern analysis result;
S202: based on the data access pattern analysis result, a storage medium performance analysis technique is adopted; by comparing the read-write speeds and response times of various storage media, data storage media are selected, generating a storage medium performance comparison result;
S203: based on the storage medium performance comparison result, a load balancing algorithm is adopted; by analyzing the data fluidity prediction results and storage medium performance, the data load is evenly distributed and storage efficiency is optimized, generating an optimized data distribution scheme;
S204: based on the optimized data distribution scheme, a data storage planning technique is adopted; by analyzing data access frequency and type, the storage positions and migration paths of the data are planned, generating the data storage position optimization strategy.
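One simple way to realise the load balancing of S203 is the classic greedy "least-loaded first" heuristic: sort data blocks by access frequency and always place the next block on the medium with the smallest accumulated load. The block, medium, and frequency names below are invented for illustration; the patent does not specify this particular algorithm.

```python
# Illustrative sketch of S203: greedily assign data blocks to the currently
# least-loaded storage medium so that hot (frequently accessed) blocks are
# spread evenly across media.
import heapq

def balance(blocks, media):
    """blocks: {block_id: access_frequency}; media: list of medium names.
    Returns {medium: [block_ids]} with roughly equal total frequency."""
    heap = [(0, m) for m in media]        # (current load, medium)
    heapq.heapify(heap)
    placement = {m: [] for m in media}
    # Place heaviest blocks first: the greedy makespan heuristic.
    for block, freq in sorted(blocks.items(), key=lambda kv: -kv[1]):
        load, medium = heapq.heappop(heap)
        placement[medium].append(block)
        heapq.heappush(heap, (load + freq, medium))
    return placement

hot = {"logs": 90, "orders": 70, "users": 40, "cache": 30, "archive": 10}
print(balance(hot, ["ssd0", "ssd1"]))
# → {'ssd0': ['logs', 'cache'], 'ssd1': ['orders', 'users', 'archive']}
```

Both media end up with a total access load of 120, illustrating the even distribution that S203 aims for.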
Preferably, based on the data storage position optimization strategy, an algorithm based on differential equations is adopted to comprehensively analyze the data processing flow and storage efficiency, and the data sampling frequency and processing parameters are adjusted to adapt to various data types and requirements, with the data processing strategy adjusted according to the data fluidity prediction; the specific steps of generating the dynamic data processing model are as follows:
S301: based on the data storage position optimization strategy, a differential-equation modeling method is adopted; by establishing differential equations between data flow and storage resources, the interaction between dynamic changes in data flow and storage capacity is analyzed, the relationship between data flow patterns and storage efficiency is examined, and a data flow and storage relationship model is generated;
S302: based on the data flow and storage relationship model, a time-series analysis method is adopted; by statistically analyzing the historical values of data points, potential periodicity and trend changes are identified, and the sampling frequency is adjusted to accommodate multiple data types and requirements, generating a data sampling parameter optimization scheme;
S303: based on the data sampling parameter optimization scheme, a machine learning algorithm is adopted; classification and regression techniques are applied to analyze and predict data characteristics, and the data processing strategy is adjusted according to the data fluidity prediction model, generating an optimized data processing operation flow;
S304: based on the optimized data processing operation flow, a performance evaluation method is adopted; by calculating key performance indicators including processing time, error rate and resource-use efficiency, the data processing flow and storage efficiency are evaluated, generating the dynamic data processing model.
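A toy version of the differential-equation modeling in S301 can be written as a single ordinary differential equation: storage occupancy S(t) is fed by a data inflow rate and drained by processing at a rate proportional to occupancy, dS/dt = inflow(t) − k·S. The sketch below integrates this with the explicit Euler method; the inflow function and all constants are invented for illustration.

```python
# Toy model for S301: storage occupancy driven by a periodic data inflow
# and drained at rate k*S, integrated with the explicit Euler method.
import math

def simulate(inflow, k, s0, dt, steps):
    """Euler-integrate dS/dt = inflow(t) - k*S from S(0) = s0."""
    s, t, trace = s0, 0.0, [s0]
    for _ in range(steps):
        s += dt * (inflow(t) - k * s)
        t += dt
        trace.append(s)
    return trace

# Periodic inflow mimicking a daily data-flow pattern (period 24 h).
daily = lambda t: 50.0 + 20.0 * math.sin(2 * math.pi * t / 24.0)
trace = simulate(daily, k=0.5, s0=0.0, dt=0.1, steps=480)
avg = sum(trace[-240:]) / 240
print(round(avg, 1))   # mean occupancy approaches inflow/k = 100
```

Analysing how the oscillation amplitude of S(t) depends on k and on the inflow period is exactly the kind of "interaction between dynamic data flow and storage capacity" that S301 describes.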
Preferably, based on the dynamic data processing model, a scheduling method based on priority queues and machine learning is adopted to analyze and adjust task allocation and resource configuration in real time, the task scheduling strategy is optimized according to data fluidity and processing requirements, and the priority and resource allocation of data analysis tasks are adjusted according to the analysis results; the specific steps of generating the dynamic priority scheduling model are as follows:
S401: based on the dynamic data processing model, a heap-sort algorithm is adopted; tasks are ordered by constructing a binary heap structure to establish a priority queue, which is dynamically adjusted according to the urgency and importance of the tasks, generating a task priority queue;
S402: based on the task priority queue, a linear programming algorithm is adopted; a resource allocation strategy is designed and implemented, the simplex method is used for resource allocation, and task demand is balanced against resource supply, generating a resource allocation model;
S403: based on the resource allocation model, a decision tree algorithm is adopted; by constructing classification rules and regression models, data fluidity and processing requirements are analyzed, and task priority and resource allocation are dynamically adjusted, generating an adjusted task scheduling strategy;
S404: based on the adjusted task scheduling strategy, a reinforcement learning algorithm is adopted; through a reward feedback mechanism and state-space analysis, the model continuously adapts to changes in data fluidity and processing requirements and optimizes task priority and resource allocation, generating the dynamic priority scheduling model.
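The binary-heap priority queue of S401 can be sketched in a few lines: each task gets a combined score from its urgency and importance, and the heap always yields the highest-scoring task first. The 0.6/0.4 weighting and the task names are assumptions for illustration only.

```python
# Sketch of S401: a binary-heap task queue keyed by a combined
# urgency/importance score (weighting is an illustrative assumption).
import heapq
import itertools

class TaskQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # tie-breaker for equal scores

    def push(self, name, urgency, importance):
        # Negate: heapq is a min-heap, but we want the highest score first.
        score = -(0.6 * urgency + 0.4 * importance)
        heapq.heappush(self._heap, (score, next(self._counter), name))

    def pop(self):
        return heapq.heappop(self._heap)[2]

q = TaskQueue()
q.push("compress-archive", urgency=2, importance=5)
q.push("anomaly-scan",     urgency=9, importance=8)
q.push("nightly-report",   urgency=5, importance=3)
print(q.pop())   # → anomaly-scan
```

Dynamic re-prioritisation, as S401 requires, amounts to pushing a task again with its new score; the machine-learning adjustments of S403–S404 would then tune the scoring function itself.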
Preferably, based on the dynamic priority scheduling model, information entropy theory and Huffman coding are adopted for data characteristic and statistical analysis, and the data is encoded and compressed according to its characteristics; the specific steps of generating the data compression model are as follows:
S501: based on the dynamic priority scheduling model, information entropy theory is adopted; by calculating the occurrence frequency of each data element, the entropy of the whole data set is estimated, the diversity and complexity of the data are analyzed, the potential benefit of compression is assessed, and a data entropy analysis result is generated;
S502: based on the data entropy analysis result, a probability and statistics analysis method is adopted; by analyzing the occurrence probability and frequency distribution of each data element, the internal statistical characteristics of the data are revealed, generating a data statistical characteristic model;
S503: based on the data statistical characteristic model, a Huffman coding algorithm is adopted; a coding tree is constructed from symbol frequencies, a unique binary code is assigned to each symbol, and the total coding length of the whole data set is optimized, generating a data coding scheme;
S504: based on the data coding scheme, a data compression technique is adopted; the data is encoded by applying Huffman coding, original data symbols are replaced, and the coding of symbols with similar frequencies is jointly optimized, generating the data compression model.
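S501 and S503 are both standard textbook constructions, shown concretely below: the Shannon entropy of a symbol stream bounds the best achievable bits per symbol, and the Huffman algorithm builds a prefix-free code from the symbol frequencies. The identifiers and the sample string are our own, not the patent's.

```python
# Minimal illustration of S501-S503: Shannon entropy of a data stream,
# then a Huffman code built from the symbol frequencies.
import heapq
import math
from collections import Counter

def entropy(data):
    """Shannon entropy in bits per symbol."""
    freq = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in freq.values())

def huffman_codes(data):
    """Return {symbol: binary code string} for the Huffman code of data."""
    freq = Counter(data)
    # Heap of (weight, tie-breaker, {symbol: partial code}) subtrees.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, count, merged))
        count += 1
    return heap[0][2]

sample = "aaaabbbccd"
print(round(entropy(sample), 3))   # → 1.846
print(huffman_codes(sample))
```

For this sample the Huffman code averages 1.9 bits per symbol against an entropy of about 1.846 bits, which is exactly the "potential benefit of compression" assessment that S501 describes.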
Preferably, based on the data compression model, chaos theory and nonlinear data analysis methods are adopted to analyze the behavior and patterns of the data, identify irregular patterns and potential anomalies in the data, and implement anomaly detection according to the dynamic change characteristics of the data; the specific steps of generating the data anomaly detection model are as follows:
S601: based on the data compression model, chaos-theory analysis is adopted; by calculating the Lyapunov exponents and attractor dimensions of the data time series, the nonlinear dynamical behavior of the data is analyzed and its chaotic characteristics are revealed, generating a chaotic dynamics characteristic analysis result;
S602: based on the chaotic dynamics characteristic analysis result, a nonlinear dynamics analysis method is adopted; through phase-space construction and phase-space reconstruction techniques, the phase-space trajectories and dynamical behavior of the data are analyzed and unconventional patterns are identified, generating a dynamical phase-space analysis result;
S603: based on the dynamical phase-space analysis result, an anomalous pattern recognition technique is adopted; by analyzing outliers and discontinuities in the phase-space trajectories, anomalous patterns and abrupt behaviors are identified, and anomalous behavior characteristics are refined through cluster analysis and outlier detection algorithms, generating an anomalous pattern recognition result;
S604: based on the anomalous pattern recognition result, a dynamic anomaly detection technique is adopted; by continuously monitoring changes in the data and dynamically adjusting detection parameters to adapt to data trends and newly emerging anomalous patterns, the data anomaly detection model is generated.
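The Lyapunov-exponent calculation of S601 can be illustrated on the logistic map x → r·x·(1−x), a standard chaotic benchmark rather than real data: the exponent is the long-run average of log|f′(x)| along the trajectory. A positive value signals the chaotic, hard-to-predict behavior that S601 looks for; at r = 4 the exact value is ln 2 ≈ 0.693. The parameters below are illustrative.

```python
# Toy version of S601: estimate the Lyapunov exponent of the logistic map
# as the average of log|f'(x)| = log|r*(1 - 2x)| along a trajectory.
import math

def lyapunov_logistic(r, x0=0.31, warmup=500, n=20000):
    x = x0
    for _ in range(warmup):          # discard the transient
        x = r * x * (1 - x)
    total = 0.0
    for _ in range(n):
        x = r * x * (1 - x)
        total += math.log(abs(r * (1 - 2 * x)))
    return total / n

print(round(lyapunov_logistic(4.0), 2))   # close to ln 2 ≈ 0.693: chaotic
print(lyapunov_logistic(2.5) < 0)         # True: stable fixed point
```

For measured data series, where the map is unknown, S601–S602 would instead estimate the exponent from a reconstructed phase space (e.g. delay embedding), which this sketch deliberately omits.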
Preferably, based on the data fluidity prediction model, the data storage position optimization strategy, the dynamic data processing model, the dynamic priority scheduling model, the data compression model and the data anomaly detection model, a multi-level data integration method is adopted for data analysis flow planning and resource optimization configuration, with continuous optimization of data processing efficiency and accuracy; the specific steps of generating the data processing and analysis strategy are as follows:
S701: based on the data fluidity prediction model, a data integration algorithm is adopted; the prediction data of multiple models are aggregated and the data sources are fused, covering time points, values and frequencies, to construct a comprehensive view of the multidimensional data streams, generating a data flow overview;
S702: based on the data flow overview and the data storage position optimization strategy, a space reconstruction algorithm is adopted; by analyzing data access frequency and storage efficiency, the distribution of data on the storage medium is re-planned, balancing access speed against storage-space utilization, generating an adjusted data storage layout;
S703: based on the adjusted data storage layout and the dynamic data processing model, a scheduling algorithm is adopted; by analyzing the time sensitivity and resource consumption of data processing, the priority of each data processing task is selected, and the data processing flow is re-planned according to processing requirements and available resources, generating an adjusted data processing strategy;
S704: based on the adjusted data processing strategy, the dynamic priority scheduling model, the data compression model and the data anomaly detection model, an optimization adjustment technique is adopted; by evaluating the urgency of the processing tasks and the potential for data compression, the execution order of data processing tasks and the data storage format are adjusted, generating the data processing and analysis strategy.
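A hypothetical sketch of the multi-model fusion in S701: combine the forecasts of several upstream models into one value, weighting each model by the inverse of its recent prediction error. The model names, error figures, and the inverse-error weighting rule are all invented for illustration; the patent only specifies that prediction data from multiple models are aggregated.

```python
# Illustrative fusion step for S701: inverse-error-weighted average of
# several models' forecasts (weighting scheme is an assumption).
def fuse(forecasts, errors):
    """forecasts: {model: value}; errors: {model: recent mean abs error}.
    Returns the inverse-error-weighted average forecast."""
    weights = {m: 1.0 / (errors[m] + 1e-9) for m in forecasts}
    total = sum(weights.values())
    return sum(forecasts[m] * weights[m] / total for m in forecasts)

forecasts = {"rnn": 102.0, "lstm": 98.0, "ar": 110.0}
errors    = {"rnn": 4.0,   "lstm": 2.0,  "ar": 8.0}
print(round(fuse(forecasts, errors), 1))   # → 100.9
```

The more accurate LSTM forecast dominates the fused value, which is the intended behavior: downstream planning in S702–S704 then works from the most trustworthy composite view.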
An intelligent data analysis system comprises a data fluidity prediction module, a data storage optimization module, a data processing flow module, a task scheduling module, a data compression module, an anomaly detection module, an integrated analysis module and a system optimization module;
the data fluidity prediction module is used for analyzing the periodicity and trends of the historical data and predicting future data flow patterns, by adopting recurrent neural network and long short-term memory network methods combined with an autoregressive time-series model based on the historical data of the database, to generate a data fluidity prediction model;
The data storage optimization module, based on the data fluidity prediction model, adopts an association rule learning algorithm to analyze the co-occurrence patterns and degree of association between data items, and combines a space reconstruction algorithm to re-plan the layout of data in the storage medium, generating a data storage position optimization strategy;
the data processing flow module, based on the data storage position optimization strategy, adopts differential-equation modeling combined with time-series analysis, and adjusts the data sampling frequency and processing parameters according to the various data types and requirements, generating a dynamic data processing model;
the task scheduling module, based on the dynamic data processing model, builds a task execution queue using a heap-sort algorithm, performs resource allocation using linear programming, and dynamically adjusts task priority and resource allocation, generating a dynamic priority scheduling model;
the data compression module, based on the dynamic priority scheduling model, analyzes data diversity using information entropy theory, encodes data characteristics using a Huffman coding algorithm, and performs data compression, generating a data compression model;
the anomaly detection module, based on the data compression model, adopts chaos theory and nonlinear data analysis methods to identify irregular patterns and potential anomalies in the data, and performs dynamic anomaly detection, generating a data anomaly detection model;
the integrated analysis module, based on the data fluidity prediction model, the data storage position optimization strategy, the dynamic data processing model, the dynamic priority scheduling model, the data compression model and the data anomaly detection model, adopts a multi-level data integration method to integrate their contents and comprehensively evaluate and analyze the data, generating a comprehensive data analysis result;
the system optimization module, based on the comprehensive data analysis result, adopts machine-learning-driven adaptive optimization, adjusting and optimizing the whole flow and resources according to real-time feedback from the data processing flow and resource configuration, generating a data processing and analysis strategy.
Compared with the prior art, the invention has the following beneficial effects. For long-term prediction of data fluidity and complex pattern recognition, combining recurrent neural network and long short-term memory network methods markedly improves prediction accuracy; in highly variable, dynamically changing data environments the predictions are more accurate and better support complex decision processes. For data storage optimization, the invention performs efficient dynamic optimization according to data access patterns and storage resources, improving the utilization of the storage medium and overall data processing performance. The differential-equation-based dynamic data model optimization strategy lets the data processing flow respond to dynamic data changes in real time, and optimizing the processing strategy markedly improves processing efficiency and response speed. The scheduling method based on priority queues and machine learning adjusts task priority and resource allocation more effectively, improving the efficiency and accuracy of task execution. For data compression, information entropy theory and Huffman coding enable efficient compression based on the statistical characteristics of the data, reducing the resource demands of data transmission and storage. For data anomaly detection, chaos theory and nonlinear data analysis methods improve the ability to identify complex, nonlinear anomaly patterns, enhancing data reliability and security.
Taken together, these benefits raise the overall quality and efficiency of data analysis, making the approach well suited to processing large data sets.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic flow chart of step S1 in the intelligent data analysis method of the present invention;
FIG. 3 is a schematic flow chart of step S2 in the intelligent data analysis method of the present invention;
FIG. 4 is a schematic flow chart of step S3 in the intelligent data analysis method of the present invention;
FIG. 5 is a schematic flow chart of step S4 in the intelligent data analysis method of the present invention;
FIG. 6 is a schematic flow chart of step S5 in the intelligent data analysis method of the present invention;
FIG. 7 is a schematic flow chart of step S6 in the intelligent data analysis method of the present invention;
FIG. 8 is a schematic flow chart of step S7 in the intelligent data analysis method of the present invention;
FIG. 9 is a block diagram of an intelligent data analysis system according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, the present invention provides a technical solution: an intelligent data analysis method, comprising the steps of:
s1: based on historical data of a database, a cyclic neural network and a long-short-term memory network method are adopted to analyze a historical data flow mode, and a time sequence analysis technology is utilized to predict future data flow trend so as to generate a data fluidity prediction model;
S2: based on the data fluidity prediction model, a data storage optimization algorithm based on load balancing is adopted to analyze the data access mode and the storage efficiency, the distribution and the configuration of data in a storage medium are adjusted according to analysis results, and a storage plan is formulated according to the data access frequency and the type to generate a data storage position optimization strategy;
S3: based on a data storage position optimization strategy, adopting an algorithm based on a differential equation to comprehensively analyze a data processing flow and storage efficiency, adjusting data sampling frequency and processing parameters, adapting to various data types and requirements, and adjusting the data processing strategy according to data fluidity prediction to generate a dynamic data processing model;
S4: based on a dynamic data processing model, adopting a scheduling method based on a priority queue and machine learning to perform real-time analysis and adjustment of task allocation and resource allocation, optimizing a task scheduling strategy according to data mobility and processing requirements, adjusting the priority and resource allocation of a data analysis task according to an analysis result, and generating a dynamic priority scheduling model;
S5: based on a dynamic priority scheduling model, adopting an information entropy theory and Huffman coding to perform data characteristic and statistical analysis, coding data, and compressing the data according to the data characteristic to generate a data compression model;
S6: based on a data compression model, adopting a chaos theory and a nonlinear data analysis method to analyze the behavior and pattern of the data, identifying an irregular pattern and potential anomalies in the data, and carrying out anomaly detection according to the dynamic change characteristics of the data to generate a data anomaly detection model;
S7: based on a data fluidity prediction model, a data storage position optimization strategy, a dynamic data processing model, a dynamic priority scheduling model, a data compression model and a data anomaly detection model, a multi-level data integration method is adopted to conduct data analysis flow planning and resource optimization configuration, and continuous optimization of data processing efficiency and accuracy is conducted to generate a data processing and analysis strategy.
The data fluidity prediction model comprises historical data flow pattern recognition, trend change indexes and key data flow node recognition; the data storage position optimization strategy comprises a data access hot spot diagram, a storage medium allocation scheme and a data migration priority list; the dynamic data processing model comprises data response time optimization, a flow allocation algorithm and a multi-source data synchronization mechanism; the dynamic priority scheduling model comprises a task execution queue, a resource allocation map and a real-time monitoring feedback mechanism; the data compression model comprises compression efficiency assessment, a coding rule base and a data recovery protocol; the data anomaly detection model comprises anomaly pattern identification, change rate monitoring and an anomaly response mechanism; and the data processing and analysis strategy comprises an integrated analysis flow chart, performance assessment standards and an automatic adjustment scheme.
In step S1, the system analyzes the historical data flow pattern using the cyclic neural network and long short-term memory network methods based on the historical data of the database. Here, the cyclic neural network is responsible for identifying and learning time dependencies in the historical data, while the long short-term memory network focuses on capturing long-term dependencies. At the same time, time series analysis techniques are applied to predict future data flow trends. In the specific operation, the cyclic neural network carries out repeated learning of historical data through feedback connections, the long short-term memory network processes and memorizes key information through its gating mechanism, and the time sequence analysis builds a mathematical model from the historical data to predict future trends. This series of operations creates a data fluidity prediction model that can identify historical data flow patterns, trend change indicators and key data flow nodes, providing accurate data flow trend predictions for subsequent steps.
In step S2, the system performs data access mode and storage efficiency analysis by using a data storage optimization algorithm based on load balancing based on the data fluidity prediction model. The algorithm first analyzes the data access pattern provided by the data fluidity prediction model, and then adjusts the distribution and configuration of the data in the storage medium according to the analysis result. In the data storage optimization process, a storage plan is formulated in consideration of the frequency and the type of data access. When the algorithm operates, a reasonable storage position allocation scheme is formulated by evaluating the access frequency and type of the data so as to optimize the overall storage efficiency. The generated data storage location optimization strategy comprises a data access hot spot diagram, a storage medium allocation scheme and a data migration priority list, and the results significantly improve the data access efficiency and the system response speed.
In step S3, the system applies an algorithm based on differential equations to comprehensively analyze the data processing flow and the storage efficiency based on the data storage location optimization strategy. The algorithm combines the analytic characteristics of the differential equation, and adjusts the data sampling frequency and the processing parameters so as to adapt to different data types and requirements. In this process, the input of the data flow prediction model is used to adjust the data processing strategy to achieve efficient processing of multiple data types. The key to this step is the application of differential equation algorithms that analyze and adjust the data processing flow through an accurate mathematical model to generate a dynamic data processing model. The model comprises data response time optimization, a flow distribution algorithm and a multi-source data synchronization mechanism, so that the flexibility and the efficiency of data processing are effectively improved.
In step S4, the system performs real-time analysis and adjustment of tasks and resources according to the dynamic data processing model by using a scheduling method based on priority queues and machine learning. Through the priority queue, the system can effectively manage and schedule various data analysis tasks, and the machine learning method enables the system to intelligently optimize task scheduling strategies according to data mobility and processing requirements. Specific operations include priority classification of tasks, dynamic allocation of resources, and establishment of a real-time monitoring feedback mechanism. Through the operations, a dynamic priority scheduling model is generated, and the model comprises a task execution queue, a resource allocation map and a real-time monitoring feedback mechanism, so that task scheduling and resource allocation are optimized, and the efficiency of task execution and priority management of data analysis are improved.
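The priority-queue half of this scheduling method can be sketched with Python's `heapq`; a minimal sketch only, in which priorities are fixed integers (lower value means higher priority), whereas in the described system they would be produced and adjusted by the machine learning component. Class and task names are illustrative.

```python
import heapq

class PriorityScheduler:
    """Minimal priority-queue task scheduler (illustrative sketch)."""

    def __init__(self):
        self._queue = []
        self._counter = 0  # tie-breaker so equal priorities stay FIFO

    def submit(self, priority, task_name):
        # Lower priority value = more urgent task.
        heapq.heappush(self._queue, (priority, self._counter, task_name))
        self._counter += 1

    def next_task(self):
        # Pop the most urgent pending task, or None when idle.
        if not self._queue:
            return None
        return heapq.heappop(self._queue)[2]

scheduler = PriorityScheduler()
scheduler.submit(2, "aggregate_logs")
scheduler.submit(0, "anomaly_scan")        # most urgent
scheduler.submit(1, "compress_cold_data")
order = [scheduler.next_task() for _ in range(3)]
# order == ["anomaly_scan", "compress_cold_data", "aggregate_logs"]
```

In the full system, the real-time monitoring feedback mechanism would re-submit or re-prioritize tasks as conditions change; the queue structure itself stays the same.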
In step S5, the system performs data feature and statistical analysis based on the dynamic priority scheduling model by applying information entropy theory and Huffman coding. Information entropy theory is used to evaluate the uncertainty and complexity of the data, while Huffman coding is used to efficiently encode and compress it. In this step, the system first performs deep feature analysis on the data, and then encodes and compresses the data according to its characteristics, so as to reduce the required storage space and improve transmission efficiency. The generated data compression model comprises compression efficiency evaluation, a coding rule base and a data recovery protocol; these results not only reduce the data storage requirement but also accelerate data processing.
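The two ingredients of this step can be sketched in plain Python: Shannon entropy measures the average information content per symbol, and Huffman coding assigns shorter codes to more frequent symbols. The sample string is illustrative.

```python
import heapq
from collections import Counter
from math import log2

def shannon_entropy(data):
    """Average information content of the data, in bits per symbol."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * log2(c / total) for c in counts.values())

def huffman_codes(data):
    """Build a Huffman code table: frequent symbols get shorter codes."""
    counts = Counter(data)
    if len(counts) == 1:                       # degenerate single-symbol case
        return {next(iter(counts)): "0"}
    # Heap entries: [frequency, tie-breaker id, {symbol: partial code}]
    heap = [[freq, i, {sym: ""}] for i, (sym, freq) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)               # two least frequent subtrees
        hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo[2].items()}
        merged.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], next_id, merged])
        next_id += 1
    return heap[0][2]

sample = "aaaabbc"
codes = huffman_codes(sample)
compressed_bits = sum(len(codes[s]) for s in sample)   # 10 bits vs 14 at 2 bits/symbol
```

The entropy value gives the theoretical lower bound on bits per symbol, which is how the compression efficiency evaluation in the model could be grounded.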
In step S6, the system analyzes the data behavior and pattern using chaos theory and nonlinear data analysis methods based on the data compression model. Through chaos theory, the system can identify complex modes and randomness in data, and a nonlinear data analysis method is used for exploring complex dynamics of data behaviors. The key to this step is the identification of unusual patterns and potential anomalies, as well as anomaly detection of the dynamic change characteristics of the data. The generated data anomaly detection model includes anomaly pattern identification, rate of change monitoring, and anomaly response mechanisms, which are critical to maintaining data integrity and system stability.
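Chaos-theoretic analysis itself is beyond a short example, but the change-rate monitoring element of the detection model can be sketched as a simple rule: flag a point whose jump is far larger than the recent local changes. The window, threshold and readings are illustrative assumptions, not the patent's method.

```python
def detect_rate_anomalies(series, window=3, threshold=3.0):
    """Flag points whose change rate deviates strongly from recent local
    behaviour. A simplified stand-in for the nonlinear detection step;
    real chaos-based methods (e.g. phase-space embedding) are far more
    involved.
    """
    anomalies = []
    deltas = [abs(b - a) for a, b in zip(series, series[1:])]
    for i in range(window, len(deltas)):
        baseline = sum(deltas[i - window:i]) / window   # recent typical change
        if baseline > 0 and deltas[i] > threshold * baseline:
            anomalies.append(i + 1)                     # index into `series`
    return anomalies

readings = [10, 11, 10, 12, 11, 50, 12, 11]
flagged = detect_rate_anomalies(readings)   # the jump to 50 stands out
```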
In step S7, the system comprehensively applies a data fluidity prediction model, a data storage location optimization strategy, a dynamic data processing model, a dynamic priority scheduling model, a data compression model and a data anomaly detection model, and adopts a multi-level data integration method. In the step, the system performs data analysis flow planning and resource optimization configuration, and simultaneously continuously optimizes the data processing efficiency and accuracy. Specific operations include integrating the formulation of an analytical flow chart, the establishment of performance assessment criteria, and the implementation of an automated tuning scheme. The implementation of the operations generates a data processing and analyzing strategy, so that the overall efficiency and accuracy of data processing are improved, and the optimal configuration and utilization of resources are realized.
Referring to fig. 2, based on the historical data of the database, the cyclic neural network and long short-term memory network methods are adopted to analyze the historical data flow pattern, and time sequence analysis technology is used to predict the future data flow trend, so as to generate the data fluidity prediction model; the specific steps are as follows:
S101: based on historical data of a database, an autoregressive model is adopted, and a time sequence basic feature set is generated by analyzing the linear relation between time points and numerical values in the historical data and extracting time sequence basic features;
S102: based on a time sequence basic feature set, adopting a cyclic neural network algorithm, capturing a periodic mode and a change trend in data flow by constructing hidden layer state circulation in a network, and generating a periodic data flow mode model;
S103: based on the periodic data flow mode model, adopting a long short-term memory network algorithm, and analyzing the continuous mode and structure of data flow by using a gating mechanism with a memory unit to generate a continuous data flow analysis model;
S104: based on the continuous data flow analysis model, a time sequence prediction technology is adopted, and the prediction capability of future data flow trend is optimized through adjustment and analysis of the model, so that a data flow prediction model is generated.
In the step S101, the system extracts time-series basic features for the historical data of the database by applying an autoregressive model. The historical data format is sequence data of time labels, and comprises specific numerical values of all time points. The autoregressive model focuses on analyzing the linear relationship between time points and values in these historical data. The specific implementation process comprises the standardized processing of the data so as to eliminate dimension influence and highlight data characteristics; then, extracting local features in the time sequence, such as trends, seasonality and the like, by utilizing a sliding window method; these local features are then analyzed linearly by means of an autoregressive model, i.e. the values at the next time point are predicted using the time period values of the historical data. In this process, the model calculates the degree of association, i.e., the autocorrelation coefficient, of each point in time data with its previous data points. Finally, the generated time series basic feature set contains time dependent characteristics and internal rules of historical data, and lays a foundation for the subsequent recognition of more complex modes.
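The sliding-window and autocorrelation computations this substep describes can be sketched as follows; the response-time figures are illustrative sample data.

```python
def sliding_window_means(series, window):
    """Local trend features: the mean of each length-`window` slice."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

def autocorrelation(series, lag):
    """Lag-`lag` autocorrelation coefficient: how strongly each point
    is associated with the point `lag` steps before it."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var

response_times = [200, 195, 210, 205, 200, 195, 210, 205]  # ms, illustrative
means = sliding_window_means(response_times, 4)            # local trend features
r1 = autocorrelation(response_times, 1)                    # degree of association
```

An autoregressive model then predicts each value as a weighted combination of its predecessors, with weights derived from exactly these autocorrelation coefficients.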
In the S102 substep, based on the time series basic feature set, the system captures periodic patterns and trends in the data flow using a recurrent neural network algorithm. A Recurrent Neural Network (RNN) is a neural network that processes sequence data, particularly suited for time series data analysis. The operation procedure first involves building an RNN model, including an input layer, a hidden layer, and an output layer. In the hidden layer, the setting of the state loop is critical, which allows the network to keep information of the previous point in time and to influence the output of the current point in time. During the training process, the network learns periodicity and trends in the data by adjusting weights and biases. The training uses a time series base feature set, each data point of which is processed through a network, which outputs a predicted value for the next time point. By repeating this process, the network gradually learns to identify periodic patterns in the data. The finally generated periodic data flow mode model can accurately reflect the periodic characteristics of data flow, and provides a basis for further continuous mode analysis.
In the sub-step S103, a long short-term memory network (LSTM) algorithm is used to analyze the continuous pattern and structure of the data flow based on the periodic data flow pattern model. The LSTM network is an improvement on the RNN that solves the memory decay problem of traditional RNNs when processing long sequences. The core of the LSTM is a gating mechanism with memory cells, comprising a forget gate, an input gate and an output gate. These gates control the storage and forgetting of information, enabling the network to learn long-term dependencies in the data. In operation, the network parameters are first initialized, and then the data provided by the periodic data flow pattern model is input. At each point in time, the forget gate decides how much of the previous information is discarded, the input gate controls the amount of new information received, and the output gate determines the output value. In this way, the LSTM network can efficiently maintain and update important information throughout the data stream. The generated continuous data flow analysis model accurately describes the long-term pattern and structural characteristics of the data flow, providing an important basis for the subsequent time sequence prediction.
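The three gates described above can be illustrated with a single scalar LSTM step; the weights here are arbitrary placeholders (a real network learns them during training), and the whole block is a sketch of the gating arithmetic, not of the patent's network.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One scalar LSTM step showing the three gates."""
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])   # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])   # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])   # output gate
    c_tilde = math.tanh(w["wc"] * x + w["uc"] * h_prev + w["bc"])
    c = f * c_prev + i * c_tilde   # memory cell: keep old info + write new
    h = o * math.tanh(c)           # hidden state exposed as the output
    return h, c

# Placeholder weights, all 0.5 for illustration only.
weights = {k: 0.5 for k in
           ("wf", "uf", "bf", "wi", "ui", "bi", "wo", "uo", "bo", "wc", "uc", "bc")}
h, c = 0.0, 0.0
for x in (1.0, 0.5, -0.5):        # a short illustrative input sequence
    h, c = lstm_step(x, h, c, weights)
```

Because the cell state `c` is updated additively rather than rewritten, information can survive many steps, which is exactly the long-term dependency capture the substep relies on.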
In the S104 substep, the system applies a time series prediction technique to optimize the predictive power for future data flow trends based on the continuous data flow analysis model. The key of the time series prediction technique in this link is to combine the previously analyzed time series patterns with future trend prediction. The specific implementation includes adjustment and analysis of the model, namely tuning the parameters of the time series prediction model, such as the learning rate and the number of hidden layer nodes, according to the identified data patterns. The model is then trained with historical data; during training, the model continuously adjusts its parameters by comparing predicted values with actual values so as to reduce the prediction error. The generated data fluidity prediction model integrates the basic characteristics, periodic patterns and long-term dependencies of the historical data, providing accurate prediction of future data fluidity trends and greatly enhancing the accuracy and reliability of data analysis and decision making in practical applications.
Assume an internet service company runs a large database of computer systems that records various performance index data for the software operating system. These data include system response time, log file size, user login frequency, etc.; each data item carries a time stamp and a specific value. For example, data items for the system response time take the form `[{"timestamp": "2024-01-01 08:00:00", "response_time": 200}, {"timestamp": "2024-01-01 08:05:00", "response_time": 195}, ...]`. In the S101 substep, time series basic feature extraction is performed on these historical data using an autoregressive model. For example, after the historical data of the system response time is standardized, local features in the time sequence, such as the average response time and fluctuation trend within a certain period, are extracted using the sliding window method. The autoregressive model calculates the degree of association of each time point's data with its previous data points, generating a time series basic feature set, such as "from 8 to 9 on weekday mornings, system response time is 20 ms faster than the average level". In the S102 substep, a cyclic neural network is employed to capture periodic patterns and trends in the data flow based on the time series basic feature set. For example, after the cyclic neural network model is built, historical system response time data is input, and the model generates a periodic data flow pattern model by learning the periodic changes of the data (e.g., the difference between weekdays and weekends). In the S103 substep, a long short-term memory network is applied to analyze the continuous pattern and structure of the data flow based on the periodic data flow pattern model. The gating mechanism of the long short-term memory network effectively captures long-term data dependencies, for example identifying the long-term trend of response time after a system upgrade.
In the S104 substep, the predictive power for future system performance is optimized using a time series prediction technique based on the continuous data flow analysis model. By analyzing the identified data patterns, such as system usage patterns in specific periods, future system performance and maintenance requirements can be predicted more accurately.
Referring to fig. 3, based on a data fluidity prediction model, a data storage optimization algorithm based on load balancing is adopted to analyze data access modes and storage efficiency, distribution and configuration of data in a storage medium are adjusted according to analysis results, a storage plan is formulated according to data access frequency and type, and specific steps for generating a data storage position optimization strategy are as follows:
S201: based on the data fluidity prediction model, adopting an association rule learning algorithm, and identifying a hot spot area of data access by analyzing the co-occurrence mode and association degree between data items to generate a data access mode analysis result;
S202: based on the analysis result of the data access mode, adopting a storage medium performance analysis technology, and selecting a data storage medium by comparing the read-write speeds and response time of various storage media to generate a storage medium performance comparison result;
S203: based on the storage medium performance comparison result, adopting a load balancing algorithm, uniformly distributing data load and optimizing storage efficiency by analyzing the data fluidity prediction result and the storage medium performance, and generating an optimized data distribution scheme;
S204: based on the optimized data distribution scheme, a data storage planning technology is adopted, and a data storage position optimization strategy is generated by analyzing the data access frequency and type and planning the storage position and migration path of the data.
In the sub-step S201, the system uses a correlation rule learning algorithm to analyze the co-occurrence pattern and the degree of correlation between data items based on the data fluidity prediction model. The data format is multidimensional and includes a time stamp, a data item identifier, and attribute values thereof. The role of the association rule learning algorithm here is to mine frequent patterns and strong association rules between data items. The implementation begins with preprocessing of the data, including cleaning and conversion, to accommodate the algorithm input requirements. The algorithm then calculates the support and confidence of each data item to identify frequently occurring data item combinations and the strength of the association between them. For example, the algorithm finds that some two types of data items are frequently accessed simultaneously within some specific period of time. Through the analysis, the generated analysis result of the data access mode reveals the hot spot area and key association of the data access, and provides guidance for data storage optimization.
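The support and confidence computations at the heart of this substep can be sketched for item pairs; the access-log contents and thresholds are illustrative sample data.

```python
from itertools import combinations

def mine_pairs(transactions, min_support=0.5, min_confidence=0.7):
    """Support and confidence for item pairs.

    support(A,B)    = fraction of transactions containing both A and B
    confidence(A→B) = count(A and B) / count(A)
    """
    n = len(transactions)
    item_count = {}
    pair_count = {}
    for t in transactions:
        items = set(t)
        for item in items:
            item_count[item] = item_count.get(item, 0) + 1
        for pair in combinations(sorted(items), 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1
    rules = []
    for (a, b), cnt in pair_count.items():
        support = cnt / n
        if support < min_support:
            continue                      # pair too rare to matter
        for lhs, rhs in ((a, b), (b, a)):
            confidence = cnt / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

access_log = [                       # each set = one user session, illustrative
    {"login", "page_view"},
    {"login", "page_view", "download"},
    {"login", "page_view"},
    {"download"},
]
rules = mine_pairs(access_log)
```

A rule such as `("login", "page_view", 0.75, 1.0)` would mark login-plus-page-view sessions as a data access hot spot for the storage optimization that follows.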
In the step S202, the system performs storage medium selection using a storage medium performance analysis technique based on the data access pattern analysis result. This step involves comparing read-write speeds, response times, and cost-effectiveness of different storage media (e.g., SSDs, HDDs, cloud storage, etc.). In operation, the performance parameters of various storage media are collected first, and then the storage media most suitable for the access modes are selected according to the analysis results of the data access modes, such as which data items are frequently accessed hot spots. For example, for frequently accessed data items, SSD with fast read/write speed is preferentially selected. By this method, the generated storage medium performance comparison results can provide a quantitative basis for the decision making of the data storage, and the efficiency and cost effectiveness of the data storage scheme are ensured to be maximized.
In the S203 substep, the system uses a load balancing algorithm to uniformly distribute the data load and optimize the storage efficiency based on the storage medium performance comparison result. The load balancing algorithm is here used to ensure a balanced distribution of data among different storage media, avoiding overload or idling of any single media. The operational procedure includes analyzing the data flow predictions and the storage medium performance to determine an optimal data distribution strategy. For example, the algorithm decides to store high frequency accessed data on a higher performing storage medium, while migrating less frequently accessed data to a lower cost storage medium. By the method, the generated optimized data distribution scheme can remarkably improve the speed and efficiency of data access and reduce the storage cost.
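The tiered placement this substep describes can be sketched as a greedy assignment: the hottest data goes to the fastest medium until its capacity is used, then spills to the next tier. Tier names, capacities and access frequencies are illustrative assumptions, not the patent's algorithm.

```python
def assign_storage(items, tiers):
    """Greedy tier assignment by access frequency.

    items: (name, access frequency, size) tuples.
    tiers: (tier name, capacity) tuples, fastest medium first.
    """
    placement = {}
    remaining = {name: capacity for name, capacity in tiers}
    order = [name for name, _ in tiers]              # fastest first
    for item, freq, size in sorted(items, key=lambda x: -x[1]):
        for tier in order:
            if remaining[tier] >= size:              # first tier with room
                placement[item] = tier
                remaining[tier] -= size
                break
    return placement

items = [                    # (name, accesses/day, size in GB) — illustrative
    ("login_data", 900, 40),
    ("monthly_report", 30, 60),
    ("archive_2023", 2, 200),
]
tiers = [("ssd", 50), ("hdd", 500)]   # (tier, capacity in GB), fastest first
plan = assign_storage(items, tiers)
# hot login data lands on the SSD; colder data spills to the HDD
```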
In the step S204, the system adopts a data storage planning technique to plan the storage location and migration path of the data based on the optimized data distribution scheme. The purpose of this step is to formulate a detailed storage plan to maximize data access efficiency and overall performance of the storage system. During operation, the system will analyze the frequency and type of data access and then plan for the specific location of the data in the storage system while taking into account the cost and impact of data migration. For example, for data items that are often accessed together, the system may program them to be stored in the same or adjacent physical locations. The generated data storage location optimization policies include specific data storage locations, migration paths, and schedules, which can significantly improve the speed and efficiency of data access while ensuring the stability and reliability of the storage system.
It is assumed that an internet service company operates a huge database of user behavior, recording users' activity data in different applications. These data include user login time, activity type, number of page accesses, etc.; each record has a timestamp, a user ID and its behavioral attribute values. For example, data items take the form `[{"timestamp": "2024-01-01 08:00:00", "user_id": "12345", "activity": "page_view", "count": 5}, ...]`. In the S201 substep, these user behavior data are analyzed using an association rule learning algorithm, which mines co-occurrence patterns and associations between data items. Data preprocessing involves cleaning irrelevant items and converting the format to fit the algorithm's input requirements. For example, the analysis finds that user logins and browsing of specific pages frequently occur together between 8 and 9 each morning; by calculating support and confidence, the algorithm identifies a strong association rule between user login and page browsing during that period. In the S202 substep, storage medium performance analysis proceeds from the data access pattern analysis result. By comparing the read-write speeds and response times of solid state disks (SSD), mechanical hard disks (HDD) and cloud storage, the system finds that solid state disks are better suited to frequently accessed data items (such as morning user login data). In the S203 substep, a load balancing algorithm is applied to distribute the data load evenly. For example, frequently accessed user login data is placed on solid state disks with fast read and write speeds, while rarely accessed history is stored on lower-cost mechanical hard disks or in cloud storage. This allocation strategy effectively balances storage efficiency and cost.
In the sub-step S204, detailed data storage locations and migration path plans are formulated according to the optimized data distribution scheme. For example, the system designs an automation script that periodically migrates hot spot data (e.g., user activity data for the current month) from the mechanical hard disk to the solid state disk and transfers old data to cloud storage. Such a data storage plan improves data access efficiency while ensuring stability and scalability of the storage system.
Referring to fig. 4, based on a data storage location optimization strategy, an algorithm based on differential equations is adopted to perform comprehensive analysis of data processing flow and storage efficiency, adjust data sampling frequency and processing parameters, adapt to various data types and requirements, and adjust the data processing strategy according to data fluidity prediction, wherein the specific steps of generating a dynamic data processing model are as follows:
S301: based on a data storage position optimization strategy, a differential equation modeling method is adopted, and by establishing a differential equation between data flow and storage resources, the interaction between the dynamic change of the data flow and the storage capacity is analyzed, the relationship analysis of the data flow mode and the storage efficiency is carried out, and a data flow and storage relationship model is generated;
S302: based on a data flow and storage relation model, adopting a time sequence analysis method, identifying potential periodicity and trend change by carrying out statistical analysis on historical values of data points, adjusting sampling frequency and adapting to multiple data types and requirements, and generating a data sampling parameter optimization scheme;
s303: based on a data sampling parameter optimization scheme, a machine learning algorithm is adopted, data characteristics are analyzed and predicted by applying classification and regression technologies, and a data processing strategy is adjusted according to a data fluidity prediction model to generate an optimized data processing operation flow;
S304: based on the optimized data processing operation flow, a performance evaluation method is adopted, and by calculating key performance indexes including processing time, error rate and resource use efficiency, evaluation of the data processing flow and storage efficiency is carried out, so that a dynamic data processing model is generated.
In the sub-step S301, the system analyzes the interaction of dynamic changes of the data stream and the storage capacity using differential equation modeling methods based on the data storage location optimization strategy. The data format includes time-series data such as data traffic and storage resource usage at each point in time. The key to the differential equation modeling approach is to construct a mathematical model describing the dynamic relationship between data flow and storage resources. In operation, parameters of the model, such as data flow rate, rate of change of storage capacity, etc., are first determined, and then differential equations are established based on these parameters. Differential equations reflect how the growth or reduction of data flow affects the use of storage resources by mathematical expressions. For example, the equation describes how the utilization of storage resources increases as the data flow increases. By analyzing the differential equations, the system can understand how the dynamic change of the data flow interacts with the storage capacity, and the generated data flow and storage relation model reveals the relation between the data flow mode and the storage efficiency and provides basis for subsequent data processing and storage optimization.
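One toy form such a differential equation could take is a linear balance between data inflow and a drain on storage utilisation, integrated with the forward Euler method. The equation form and its coefficients are illustrative assumptions, not taken from the patent.

```python
def simulate_storage(inflow, absorb=0.8, drain=0.1, dt=1.0, u0=0.0):
    """Forward-Euler integration of a toy balance equation

        du/dt = absorb * inflow(t) - drain * u

    where u is storage utilisation and inflow(t) the data-flow rate:
    utilisation grows with inflow and decays (cleanup, compression)
    at a rate proportional to itself.
    """
    u = u0
    trajectory = [u]
    for f in inflow:
        u = u + dt * (absorb * f - drain * u)   # one Euler step
        trajectory.append(u)
    return trajectory

# A rising-then-falling data flow, illustrative units.
flow = [1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 0.0]
usage = simulate_storage(flow)
# utilisation peaks just after the flow peak, then decays once inflow stops
```

Even this toy model exhibits the interaction the substep describes: storage utilisation lags the data flow, so the model can anticipate pressure on storage resources before it arrives.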
In the sub-step S302, the system adjusts the sampling frequency to accommodate a plurality of data types and requirements using a time series analysis method based on the data flow and storage relationship model. The process includes a statistical analysis of the historical values of the data points to identify periodicity and trend changes in the data. For example, time series analysis reveals patterns of dramatic increases in data traffic during certain periods of time (e.g., during business rush hours). Based on these findings, the system adjusts the data sampling frequency to more accurately capture these changes. Specific operations to adjust the sampling frequency include increasing the number of sampling points during peak periods while reducing sampling during periods of lower data flow to optimize resource usage and ensure efficient capture of data. The generated data sampling parameter optimization scheme makes data sampling more flexible and accurate, and better suited to different data types and processing requirements.
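A minimal sketch of the S302 adjustment, under a simple assumed rule: intervals whose traffic exceeds 1.5× the series mean are treated as peaks and sampled at a higher rate. The threshold factor and the two rates are hypothetical choices for the example, not values from the specification.

```python
# Hypothetical sketch of S302: detect high-traffic intervals in a traffic
# series and assign each interval a samples-per-interval rate.

def plan_sampling(traffic, base_rate=1, peak_rate=4, threshold_factor=1.5):
    """Return one sampling rate per traffic observation."""
    mean = sum(traffic) / len(traffic)
    threshold = threshold_factor * mean     # peak = traffic above 1.5x mean
    return [peak_rate if t > threshold else base_rate for t in traffic]

traffic = [100, 120, 110, 400, 420, 130, 90]   # spike around indices 3-4
rates = plan_sampling(traffic)
```

The spike intervals get the dense rate while off-peak intervals keep the cheap base rate, mirroring the "more samples at peak, fewer at night" behaviour described above.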
In the sub-step S303, the system adjusts the data processing strategy using a machine learning algorithm based on the data sampling parameter optimization scheme. The process involves analyzing and predicting data features using classification and regression techniques. For example, machine learning algorithms may predict future behavior of data by analyzing data characteristics (e.g., traffic size, access frequency). Based on the data fluidity prediction model, the algorithm adjusts data processing strategies, such as deciding when to perform deeper analysis of the data or migrate the data to a different storage medium. Such adjustments include determining the priority and resource allocation of data processing to maximize processing efficiency and accuracy. The generated optimized data processing operation flow makes data processing more intelligent and efficient, and able to adapt quickly to changes in the data flow.
In the sub-step S304, the system adopts the performance evaluation method to evaluate the data processing flow and the storage efficiency based on the optimized data processing operation flow. This process includes calculating key performance indicators such as processing time, error rate, and resource usage efficiency. For example, performance evaluation reveals that the processing time of a certain data processing step is too long or the error rate is too high, indicating the need for optimization. Through these evaluations, the system may further adjust the data processing policy and resource allocation to improve efficiency and accuracy. The generated dynamic data processing model provides a continuous monitoring and optimizing mechanism for the data processing flow, and ensures that the system maintains optimal performance in the face of continuously changing data streams.
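The three indicators named in S304 can be computed directly from task records. The record fields (`seconds`, `failed`, `items`, `cpu`) and the efficiency metric (items processed per CPU unit) are assumptions made for this sketch; a real deployment would define its own schema.

```python
# Illustrative sketch of the S304 evaluation: compute processing time,
# error rate, and a resource-use efficiency metric from task records.
# Field names and the efficiency definition are hypothetical.

def evaluate(tasks):
    total_time = sum(t["seconds"] for t in tasks)
    error_rate = sum(1 for t in tasks if t["failed"]) / len(tasks)
    # efficiency: useful work per unit of allocated CPU (assumed metric)
    efficiency = sum(t["items"] for t in tasks) / sum(t["cpu"] for t in tasks)
    return {"time": total_time, "error_rate": error_rate, "efficiency": efficiency}

report = evaluate([
    {"seconds": 2.0, "failed": False, "items": 100, "cpu": 1.0},
    {"seconds": 3.0, "failed": True,  "items": 150, "cpu": 2.0},
])
```

A step whose `time` or `error_rate` stands out in the report is a candidate for the strategy adjustment described above.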
It is assumed that an internet service company operates a huge database of user interaction data, which records the activity data of users on its various services. The data format includes time series data such as user access per minute and server storage resource usage. For example, the data items are displayed as: [ { "timestamp": "2024-01-01 00:00:00", "user_visits": 1200, "storage_use": 80},... In a substep S301, a differential equation modeling method is applied to analyze the interaction of dynamic changes of the data stream with the storage capacity. Parameters of the model, such as the data flow rate and the rate of change of storage capacity, are first determined. For example, the system builds a differential equation describing the increase in storage resource usage as the user's access volume increases. By solving these equations, a data flow and storage relationship model is generated that reveals a close correlation between user access peak periods (e.g., 7 to 9 p.m.) and storage resource usage. In the sub-step S302, the sampling frequency is adjusted using a time series analysis method based on the data flow and storage relationship model. Through statistical analysis, it was found that during the evening peak, the user access volume increased dramatically. Thus, the system increases the number of data samples for this period while reducing the samples during the night low-traffic period. This adjustment makes the data sampling more flexible and enables accurate capture of key changes in user behavior. In a substep S303, a machine learning algorithm is applied to adjust the data processing strategy. For example, machine learning models predict the peak in data flow that occurs in the future by analyzing the user's access volume and the data characteristics of storage usage.
Based on these predictions, the system adjusts the data processing flow, optimizing the timing and resource allocation for data migration and analysis, making data processing more efficient. In the sub-step S304, the data processing flow is evaluated by using the performance evaluation method. The performance evaluation showed a 15% reduction in processing time during the evening peak and a 5% reduction in error rate. These assessment results instruct the system to further adjust the data processing policies and resource configurations to improve efficiency and accuracy.
Referring to fig. 5, based on a dynamic data processing model, a scheduling method based on priority queues and machine learning is adopted to perform real-time analysis and adjustment of task allocation and resource allocation, optimize task scheduling strategies according to data fluidity and processing requirements, and adjust the priority and resource allocation of data analysis tasks according to analysis results, and the specific steps of generating the dynamic priority scheduling model are as follows:
S401: based on the dynamic data processing model, adopting a heap sort algorithm, ordering tasks by constructing a binary heap structure to establish a priority queue, and dynamically adjusting according to the urgency and importance of the tasks to generate a task priority queue;
S402: based on the task priority queue, a linear programming algorithm is adopted, a resource allocation strategy is designed and implemented, the simplex method is used for resource allocation, task demands and resource supply are balanced, and a resource allocation model is generated;
S403: based on the resource allocation model, adopting a decision tree algorithm, analyzing data mobility and processing requirements by constructing classification rules and a regression model, dynamically adjusting task priority and resource allocation, and generating an adjusted task scheduling strategy;
S404: based on the adjusted task scheduling strategy, a reinforcement learning algorithm is adopted, the change of data fluidity and processing requirements is continuously adapted by implementing a reward feedback mechanism and state space analysis, the optimization of task priority and resource allocation is carried out, and a dynamic priority scheduling model is generated.
In a sub-step S401, the process of building a task priority queue by a heap sort algorithm involves sorting and prioritizing data processing tasks. Firstly, the system collects data of all tasks to be processed, including critical attributes such as the urgency, importance and expected processing time of the tasks. These data formats are structured data tables, each row representing a task and each column representing an attribute. The heap sort algorithm begins by building a binary heap structure, which is a special complete binary tree, with each node having a value greater than or equal to the value of its child nodes. In this process, the task data is inserted into the binary heap, preserving the characteristics of the heap. The top of the heap is always the task with the highest priority. With the addition of new tasks or the change of task attributes, the heap structure will adjust accordingly to ensure that the top is always the most urgent or important task at present. This dynamic adjustment process involves comparing nodes and exchanging locations to maintain heap characteristics. The generated task priority queue provides a real-time updated task ordering, ensures that the system always processes the most urgent or important tasks first, and improves the overall task processing efficiency and effect.
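The binary-heap priority queue of S401 maps naturally onto Python's `heapq` module. `heapq` maintains a min-heap, so the combined priority is negated to keep the most urgent task on top; the simple urgency+importance score and the task tuples are assumptions for this sketch.

```python
# Sketch of the S401 priority queue using the binary-heap module heapq.
# The (urgency + importance) scoring rule is an illustrative assumption.
import heapq

def build_queue(tasks):
    """tasks: iterable of (name, urgency, importance) -> binary heap."""
    heap = []
    for name, urgency, importance in tasks:
        priority = urgency + importance          # simple combined score
        heapq.heappush(heap, (-priority, name))  # negate: max-priority on top
    return heap

heap = build_queue([("backup", 2, 3), ("alert", 9, 8), ("report", 5, 4)])
first = heapq.heappop(heap)[1]   # most urgent/important task is served first
```

Pushing a new task or re-pushing one with changed attributes restores the heap invariant in O(log n), which is the "dynamic adjustment" behaviour the paragraph describes.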
In the sub-step S402, the process of resource allocation by the linear programming algorithm aims at balancing the task demands with the resource supplies. In implementation, the resource requirement of each task is first determined based on the task priority queue, including required processing time, memory, computing power, and the like. These data are in the form of task attribute tables, each row representing a task and each column representing a resource demand parameter. Then, an optimization model of the resource allocation is constructed using a linear programming algorithm. The model considers the total quantity limit of all resources and the resource requirement of each task, and solves the optimal resource allocation scheme through a simplex method. The simplex method is an effective mathematical optimization technique for finding the optimal solution under a series of linear inequality constraints. By the method, the system can dynamically allocate a proper amount of resources for each task, ensure the maximum utilization of the resources and avoid the waste or shortage of the resources. The generated resource allocation model provides an explicit resource allocation scheme for each task, which not only optimizes the resource utilization efficiency, but also ensures that the tasks are successfully completed according to the priority.
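The S402 linear program can be illustrated with a deliberately simplified case: maximize priority-weighted allocation subject to each task's demand and a single total-capacity constraint. For this one-constraint structure, granting resources greedily in priority order reaches the same optimum a simplex solver would; a production system with many coupled constraints would call a general LP solver instead. Task names and numbers are invented for the example.

```python
# Simplified stand-in for the S402 resource-allocation LP. With one shared
# capacity constraint, greedy allocation in priority order is optimal; this
# is NOT a general simplex implementation, just an illustration of the goal.

def allocate(tasks, capacity):
    """tasks: list of (name, priority, demand); returns {name: allocation}."""
    plan = {name: 0.0 for name, _, _ in tasks}
    for name, _, demand in sorted(tasks, key=lambda t: -t[1]):
        grant = min(demand, capacity)   # never exceed remaining supply
        plan[name] = grant
        capacity -= grant
    return plan

plan = allocate([("etl", 3, 4.0), ("ml", 5, 6.0), ("log", 1, 5.0)], capacity=8.0)
```

The highest-priority task is fully satisfied first, the next takes what remains, and low-priority work is starved when supply runs out, balancing demand against supply as S402 requires.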
In the substep S403, the process of dynamically adjusting the task priority and resource allocation by the decision tree algorithm is to adapt to the change of the data fluidity and the processing requirement. This process first requires the collection of various data, including the current state of the task, the resource usage, and relevant indicators of data mobility. These data are typically in the form of multi-dimensional data tables that record real-time system operating conditions. The system then analyzes the data using a decision tree algorithm. Decision trees are a machine learning model that performs decision making by building a tree structure. In this structure, each internal node represents a test of an attribute, each branch represents the result of the test, and each leaf node represents a decision result. And traversing along the decision tree by the system according to the current data condition, and finally obtaining a decision whether the task priority and the resource allocation need to be adjusted. The dynamic adjustment process enables the task scheduling strategy to flexibly respond to real-time data change and processing requirements, and ensures high efficiency and adaptability of the processing flow.
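The decision-tree traversal of S403 can be sketched as a small hand-coded tree: each internal node tests one attribute, each leaf yields an adjustment decision. The attribute names (`flow`, `cpu`, `queue`) and split thresholds are hypothetical; a trained tree would learn them from the monitoring data.

```python
# Hand-rolled sketch of the S403 decision tree. Internal nodes test one
# attribute; leaves return a scheduling decision. Thresholds are illustrative.

def decide(state):
    """state: dict with 'flow' and 'cpu' in [0, 1] plus a queue length."""
    if state["flow"] > 0.8:                 # node: heavy data mobility?
        if state["cpu"] > 0.9:              # node: compute already saturated?
            return "raise-priority-and-add-resources"
        return "raise-priority"
    if state["queue"] > 100:                # node: backlog despite light flow?
        return "add-resources"
    return "keep-current-plan"

decision = decide({"flow": 0.85, "cpu": 0.5, "queue": 10})
```

Traversing from the root with the current system state lands on exactly one leaf, i.e. one concrete priority/resource adjustment, as the paragraph describes.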
In the sub-step S404, the process of optimizing task priorities and resource allocation by reinforcement learning algorithm is a continuous self-learning and adjustment process. In the initial stage, the system establishes a baseline model according to the existing data fluidity and processing requirements, and the model guides the preliminary task priority setting and resource allocation. In the reinforcement learning process, the system evaluates the effect of the current strategy by monitoring the task execution result and the resource use condition in real time. Whenever a task is completed or a resource condition changes, the system receives a feedback signal that acts as a reward or penalty for adjusting the model parameters. The system continuously adjusts the decision strategy of the system through the rewarding mechanism to adapt to the change of data mobility and processing requirements. In this process, reinforcement learning algorithms analyze the various strategies and their long-term effects to find the optimal decision path in the state space. In this way, the system can continuously optimize task priority and resource allocation policies, and the generated dynamic priority scheduling model can ensure that tasks and resource requirements can be responded in an optimal manner at any time.
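The reward-feedback loop of S404 can be reduced to a tiny action-value learner: each candidate scheduling strategy keeps a running value estimate that is nudged toward the observed reward after every completed task (Q ← Q + lr·(r − Q)). The strategy names, rewards, and learning rate are invented for the example; a real system would derive the reward from task latency or SLA violations and add exploration.

```python
# Sketch of the S404 reinforcement loop as an incremental value learner.
# Rewards and strategy names are hypothetical.

class StrategyLearner:
    def __init__(self, strategies, lr=0.5):
        self.q = {s: 0.0 for s in strategies}  # value estimate per strategy
        self.lr = lr

    def update(self, strategy, reward):
        # move the estimate a fraction lr toward the observed reward
        self.q[strategy] += self.lr * (reward - self.q[strategy])

    def best(self):
        return max(self.q, key=self.q.get)

learner = StrategyLearner(["fifo", "priority-first"])
for r in (0.2, 0.3):          # fifo earns low rewards
    learner.update("fifo", r)
for r in (0.8, 0.9):          # priority-first earns high rewards
    learner.update("priority-first", r)
choice = learner.best()
```

After a few feedback signals the learner prefers the strategy with the better long-run reward, which is the self-adjusting behaviour the paragraph attributes to the dynamic priority scheduling model.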
Assuming an internet service company operates a huge database of computer systems, detailed user behavior data is recorded, including the login time, page access count and click behavior of each user. For example, the data items are displayed as: user 1 logs in at "2024-01-01 08:00:00", the number of page accesses is 5, and the number of clicks is 3; user 2 logs in at "2024-01-01 08:05:00", the number of page accesses is 8, and the number of clicks is 5; user 3 logs in at "2024-01-01 08:10:00", the number of page accesses is 7, and the number of clicks is 4. In step S401, a task priority queue is established by using a heap sort algorithm, and the processing tasks of the user behavior data are sorted and prioritized, so as to ensure that the largest or most critical user behavior data is processed first. Next, in step S402, the resources are optimally allocated by a linear programming algorithm, ensuring efficient execution of the data processing tasks. In step S403, the task priority and the resource allocation are dynamically adjusted by using a decision tree algorithm, so as to adapt to real-time changes in the user behavior data. Finally, in step S404, a reinforcement learning algorithm is applied to continuously optimize task priorities and resource configurations to cope with the continuous change of user behavior data. Through this series of steps, machine learning analysis is carried out on the time series data and user behavior patterns are identified: for example, user login frequency increases between 8 and 9 a.m. on working days, the average number of page accesses is 7, and the average number of clicks is 4. Data mining techniques further optimize and refine these behavior patterns, revealing that users are more inclined to access certain types of pages, such as news or calendars, during this period, with the click-through rate of particular pages increasing by 50%.
Referring to fig. 6, based on a dynamic priority scheduling model, the specific steps of performing data feature and statistical analysis by adopting an information entropy theory and huffman coding, coding data, and performing data compression according to the data feature to generate a data compression model are as follows:
S501: based on a dynamic priority scheduling model, an information entropy theory is adopted, the entropy value of the whole data set is estimated by calculating the occurrence frequency of each data element, the diversity and complexity of the data are analyzed, the potential benefit of compression is estimated, and a data entropy analysis result is generated;
S502: based on the analysis result of the data entropy value, adopting a probability statistical analysis method, and analyzing the occurrence probability and frequency distribution of each data element to reveal the internal statistical characteristics of the data, so as to generate a data statistical characteristic model;
S503: based on a data statistical characteristic model, adopting a Huffman coding algorithm, constructing a coding tree based on symbol frequency, distributing a unique binary code for each symbol, optimizing the total coding length of the whole data set, and generating a data coding scheme;
S504: based on the data coding scheme, a data compression technology is adopted, the data is encoded by applying Huffman coding to replace the original data symbols, and the coding process is optimized by combining symbols with similar frequencies, generating a data compression model.
In the sub-step S501, the data is analyzed using information entropy theory based on the dynamic priority scheduling model. The data set mainly comprises various computer system performance indexes such as system response time, log file size and user login frequency, and each data has corresponding time stamp and value. The frequency of occurrence of each data element in the data set is first calculated to estimate the entropy of information for the entire data set. Information entropy is a measure of the randomness and uncertainty of data, high entropy values mean that data is highly diverse and complex. By analyzing the information entropy of the data set, the potential benefits of compressing the data, namely the redundancy of the data and the compression space, can be evaluated. For example, if a performance index does not change much in value at different points in time, its entropy will be low, indicating that the data has a high compression potential. The information entropy analysis result not only provides a theoretical basis for data compression, but also helps the system understand the inherent characteristics of the data.
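The entropy estimate of S501 is the standard Shannon formula H = −Σ p·log₂(p) over the element frequencies. The sketch below shows the contrast the paragraph relies on: a maximally diverse set reaches the full log₂(n) bits, while a repetitive set (a stable performance index) scores lower, signalling higher compression potential. The sample values are invented.

```python
# Shannon entropy of a data set, as used in S501: low entropy = repetitive
# data = high compression potential.
import math
from collections import Counter

def entropy(values):
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

uniform = entropy(["a", "b", "c", "d"])   # maximally diverse -> 2 bits
skewed = entropy(["a", "a", "a", "b"])    # repetitive -> lower entropy
```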
In the sub-step S502, the data is further analyzed using a probability statistical analysis method based on the data entropy analysis result. In this step, the system analyzes the probability of occurrence and the frequency distribution of each data element to reveal the inherent statistical properties of the data. The process involves in-depth statistical analysis of the data set, including calculation of the mean, variance, skewness, kurtosis, etc. of various performance indicators. These statistical properties help the system more fully understand the distribution characteristics and fluctuation patterns of the data. For example, by analyzing the frequency distribution of the response time of the system, fluctuations in response time over certain specific time periods are found to be more pronounced. The generated data statistical characteristic model provides important references for data coding and compression, and ensures that important characteristics are retained in the data compression process.
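The four statistics named in S502 follow from the moments of the sample. The sketch below uses the standard population-moment definitions (kurtosis reported as excess kurtosis); the response-time values, with one spike, are invented to show a positive skew.

```python
# Sketch of the S502 statistics: mean, variance, skewness and excess kurtosis
# from moments. Population (1/n) definitions are used for brevity.
import math

def moments(xs):
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    std = math.sqrt(var)
    skew = sum((x - mean) ** 3 for x in xs) / (n * std ** 3)
    kurt = sum((x - mean) ** 4 for x in xs) / (n * var ** 2) - 3.0
    return {"mean": mean, "var": var, "skew": skew, "kurtosis": kurt}

stats = moments([200, 195, 205, 200, 400])   # one spike skews the tail
```

The lone 400 ms spike yields a positive skew, the kind of asymmetry the paragraph says marks pronounced fluctuation periods.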
In a substep S503, the data is encoded using a huffman coding algorithm. Huffman coding is a data compression technique based on symbol frequencies, assigning binary codes of different lengths to data elements of different frequencies. Firstly, the system constructs a coding tree according to a data statistical characteristic model, each node represents a data element, and the weight of the node is the occurrence frequency of the element. Then, by assigning a unique binary code to each data element through the path from the root of the tree to each leaf node, the higher frequency elements get shorter codes. In this process, the total code length of the entire data set is optimized, effectively reducing the space required for data storage. The resulting data encoding scheme not only reduces the storage space of the data set, but also maintains the integrity and decodability of the data.
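The S503 construction can be written compactly with a min-heap: repeatedly merge the two least-frequent subtrees, then read each symbol's code off its root-to-leaf path (left = "0", right = "1"). The sample string is invented; an integer tiebreaker keeps heap comparisons well-defined when frequencies collide.

```python
# Sketch of the S503 Huffman construction: build the coding tree from symbol
# frequencies, then assign shorter binary codes to more frequent symbols.
import heapq
from collections import Counter

def huffman_codes(data):
    freq = Counter(data)
    if len(freq) == 1:                        # degenerate single-symbol case
        return {next(iter(freq)): "0"}
    # heap items: (frequency, tiebreak, tree); tree = symbol or (left, right)
    heap = [(f, i, s) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:                      # merge two rarest subtrees
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tiebreak, (t1, t2)))
        tiebreak += 1
    codes = {}
    def walk(tree, prefix):                   # read codes off root-to-leaf paths
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("aaaaaabbbc")   # 'a' is most frequent
```

The dominant symbol ends up with the shortest code, which is what minimizes the total coding length of the data set.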
In the sub-step S504, the data is compressed using a data compression technique based on the data encoding scheme. By applying Huffman coding, the symbols of the original data are replaced with corresponding binary codes, while the coding process is optimized by combining symbols with similar frequencies to further reduce the coding length. For example, if a certain value of a certain performance index is very common, its encoding will be shortened, thereby reducing the overall data size. After encoding is completed, the original data set is converted into a set of efficient binary sequences, greatly reducing storage requirements. The generated data compression model not only improves the storage efficiency, but also speeds up data transmission, and provides convenience for subsequent data analysis and processing.
It is assumed that an internet service company operates a vast database of computer systems that records extensive computer system performance index data, including system response time, log file size, and user login frequency. For example, the data items are presented as: at "2024-01-01 08:00:00", the system response time is 200 milliseconds, the log file size is 150MB, and the number of user logins is 50; at "2024-01-01 08:05:00", the system response time is 195 milliseconds, the log file size is 153MB, and the number of user logins is 55. In the sub-step S501, these performance index data are analyzed using the information entropy theory based on the dynamic priority scheduling model. The information entropy value of the data set is evaluated by calculating the occurrence frequency of each index. The system response time is found to be consistent at most time points, and its information entropy value is relatively low, which indicates high compression potential. In the sub-step S502, the data set is analyzed in depth using a probabilistic statistical analysis method. Statistical analysis shows that although the system response time remains stable for a significant portion of the time, significant fluctuations occur during user login peaks, such as 8 to 9 a.m. each day. In the sub-step S503, the data is encoded by using a Huffman coding algorithm. Because of the stability of the system response time, it is given a shorter binary code, while the number of user logins is given a longer code because of its larger variation. In this way, the total encoding length of the data set is effectively reduced. In the sub-step S504, the system applies data compression techniques to compress the encoded data. This process significantly reduces data storage space and improves data transmission and processing efficiency. For example, the original log file size is reduced from an average of 150MB to 75MB while maintaining data integrity and decodability.
Referring to fig. 7, based on a data compression model, a chaos theory and a nonlinear data analysis method are adopted to analyze the behavior and pattern of data, identify an irregular pattern and a potential abnormality in the data, and implement abnormality detection according to the dynamic change characteristics of the data, and the specific steps of generating a data abnormality detection model are as follows:
S601: based on the data compression model, adopting chaos theory analysis, analyzing the nonlinear dynamics behavior of the data by calculating Lyapunov exponents and attractor dimensions of the data time series, revealing chaos characteristics, and generating a chaos dynamics characteristic analysis result;
S602: based on the chaotic dynamics characteristic analysis result, a nonlinear dynamics analysis method is adopted, phase space tracks and dynamics behaviors of data are analyzed through constructing a phase space and phase space reconstruction technology, an unconventional mode is identified, and a dynamics phase space analysis result is generated;
S603: based on the dynamic phase space analysis result, an abnormal pattern recognition technology is adopted, abnormal patterns and mutation behaviors are recognized through analysis of outliers and discontinuities of the phase space trajectory, and abnormal behavior characteristics are refined through cluster analysis and an outlier detection algorithm, so that an abnormal pattern recognition result is generated;
S604: based on the abnormal pattern recognition result, a dynamic abnormal detection technology is adopted, and a data abnormal detection model is generated by continuously monitoring the change of data and dynamically adjusting detection parameters to adapt to the change trend of the data and the newly-appearing abnormal pattern.
In the sub-step S601, based on the data compression model, nonlinear dynamics behavior analysis is performed on performance index data of the computer system by adopting a chaos theory analysis method. The performance index data comprises system response time, log file size, user login frequency and the like, and each piece of data carries a time stamp and a specific numerical value. The system first calculates the Lyapunov exponent of these time series data, which is a key parameter for measuring the dynamic stability of the system. The calculation of the Lyapunov exponent involves a sensitivity analysis of the time series data to determine how small changes in the data spread over time. Next, the system calculates the attractor dimension of the data, another important parameter that measures the complexity of the system. Calculation of the attractor dimension reveals the distribution of the data in a multidimensional space and helps to identify chaotic characteristics in the data. Through chaos theory analysis, a chaos dynamics characteristic analysis result is generated, revealing the intrinsic dynamics behaviors of computer system performance data, such as nonlinear variation and sensitive dependence on initial conditions.
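For intuition, the Lyapunov exponent can be computed for a system whose dynamics are known in closed form. The sketch below uses the textbook logistic map x → r·x·(1−x), whose largest exponent is the long-run average of ln|f′(x)| = ln|r·(1−2x)|: positive for the chaotic regime (r = 4), negative for a stable fixed point (r = 2.5). Estimating the exponent from measured performance data, as the patent describes, requires more machinery (e.g. a nearest-neighbour divergence method); this closed-form toy is for illustration only.

```python
# Toy illustration of the S601 Lyapunov exponent on the logistic map.
# Positive exponent => sensitive dependence on initial conditions (chaos).
import math

def lyapunov_logistic(r, x0=0.3, n=5000, burn_in=200):
    x = x0
    for _ in range(burn_in):                 # discard the transient
        x = r * x * (1 - x)
    total = 0.0
    for _ in range(n):
        total += math.log(abs(r * (1 - 2 * x)))  # ln|f'(x)| along the orbit
        x = r * x * (1 - x)
    return total / n

lam_chaotic = lyapunov_logistic(4.0)   # ~ ln(2) > 0: chaotic
lam_stable = lyapunov_logistic(2.5)    # < 0: orbit settles to a fixed point
```

The sign of the exponent is the diagnostic the paragraph refers to: positive means nearby trajectories diverge exponentially, i.e. chaotic behaviour in the monitored series.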
In the sub-step S602, based on the chaos dynamics characteristic analysis result, a nonlinear dynamics analysis method is used to identify irregular patterns in the data. This process involves constructing the phase space of the data and applying phase space reconstruction techniques. In constructing the phase space, time series data is converted into trajectories in a multidimensional space to reveal the dynamic behavior of the data. By means of the phase space reconstruction technique, the system can observe the data trajectories from different angles and identify those patterns that do not conform to conventional dynamics. For example, it may be found that under certain specific conditions the change in system response time suddenly becomes very fast or very slow, which can be a precursor to system performance problems. The generated dynamic phase space analysis result helps the system identify potential system performance anomalies and provides a basis for further diagnosis and optimization.
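The standard way to build the phase space from a scalar series is time-delay embedding: each sample becomes a vector (x(t), x(t+τ), …, x(t+(m−1)τ)). The embedding dimension m and delay τ are normally chosen with false-nearest-neighbour and mutual-information heuristics; here they are fixed small values purely for illustration.

```python
# Sketch of the S602 phase-space reconstruction via time-delay embedding.
# m (dimension) and tau (delay) are fixed illustrative choices.

def delay_embed(series, m=3, tau=2):
    """Turn a scalar series into m-dimensional delay vectors."""
    last = len(series) - (m - 1) * tau
    return [tuple(series[i + j * tau] for j in range(m)) for i in range(last)]

points = delay_embed([1, 2, 3, 4, 5, 6, 7, 8])
```

Each tuple is one point on the reconstructed trajectory; clusters and excursions of these points are what the irregular-pattern search in S602 operates on.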
In the sub-step S603, the system analyzes the phase space trajectory by using an abnormal pattern recognition technique, and recognizes abnormal patterns and mutation behaviors in the data. This process involves analyzing outliers and discontinuities of the phase-space trajectory to identify those data points that differ significantly from the normal behavior pattern. Cluster analysis and outlier detection algorithms are applied to refine the abnormal behavior features. For example, through cluster analysis, the system identifies patterns in which the system response time increases abnormally over a particular period of time. The abnormal pattern recognition result not only helps the system find potential system performance problems, but also provides an important basis for further fault prevention and maintenance strategy formulation.
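A minimal outlier-detection pass for S603 can use z-scores: flag points far from the mean in units of standard deviation. The 2-sigma cutoff and the response-time values are choices made for this example (a single extreme point inflates the standard deviation, so a 3-sigma rule would be too lax on tiny samples); a production system would combine this with the clustering step the paragraph mentions.

```python
# Sketch of the S603 outlier step: flag points whose z-score exceeds a cutoff.
# The 2-sigma cutoff is an illustrative choice, not mandated by the text.
import math

def outliers(xs, cutoff=2.0):
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return [i for i, x in enumerate(xs) if abs(x - mean) > cutoff * std]

idx = outliers([200, 198, 202, 199, 201, 200, 900])  # response-time spike
```

Only the 900 ms spike is flagged; its index would then feed the mutation-behavior analysis described above.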
In the sub-step S604, performance data of the computer system is continuously monitored using a dynamic anomaly detection technique. This process involves real-time analysis of the abnormal pattern recognition results and dynamic adjustment of the detection parameters. The system sets a series of monitoring indicators, such as thresholds for system response time and abnormal patterns for user login frequency. By continuously monitoring the indexes, the system can timely find out abnormal changes of the data and adjust detection parameters according to dynamic change characteristics of the data. For example, if the system response time is detected to exceed the set threshold for several consecutive days, the cause is immediately investigated and measures are taken. The generated data anomaly detection model can flexibly cope with data change and emerging anomaly modes, and the reliability and stability of a computer system are effectively improved.
It is assumed that an internet service company operates a huge computer system database, and key performance index data such as system response time, CPU utilization rate and memory occupancy rate are recorded. For example, the data items are displayed as: at "2024-01-01 08:00:00", the system response time is 200 milliseconds, the CPU utilization rate is 50%, and the memory occupancy rate is 60%. In step S601, the data are analyzed by adopting chaos theory, the Lyapunov exponent and the attractor dimension are calculated, and the nonlinear dynamics behavior and chaos characteristics of the system performance data are revealed. Then, in step S602, a phase space is constructed and phase space reconstruction techniques are applied by a nonlinear dynamics analysis method, and the dynamics behavior of the data is analyzed to identify irregular patterns. In step S603, outliers and discontinuities of the phase space trajectory are analyzed using an abnormal pattern recognition technique, refining the abnormal behavior features. Finally, in step S604, a dynamic anomaly detection technique is applied to adapt to the dynamic variation trend of the data and emerging anomaly patterns by continuously monitoring the data variation and adjusting the detection parameters. Through this series of steps, not only are the key features and potential abnormal patterns of system performance identified, but a comprehensive system performance pattern library is also generated, which contains the results of multidimensional analysis, such as system performance statistics for each time period and the influence of specific application programs on system performance. The analysis results can be used to effectively manage and optimize the computer system, improving the overall performance and stability of the system.
Referring to fig. 8, based on a data fluidity prediction model, a data storage location optimization strategy, a dynamic data processing model, a dynamic priority scheduling model, a data compression model, and a data anomaly detection model, a multi-level data integration method is adopted to perform data analysis flow planning and resource optimization configuration, and continuous optimization of data processing efficiency and accuracy is performed, and the specific steps of generating a data processing and analysis strategy are as follows:
S701: based on the data fluidity prediction model, a data integration algorithm is adopted, prediction data of a plurality of models are summarized, fusion is carried out on data sources, the fusion comprises time points, numerical values and frequencies, a comprehensive view containing multidimensional data streams is constructed, and a data flow overview is generated;
S702: based on the data flow overview and the data storage position optimization strategy, adopting a space reconstruction algorithm, and re-planning the distribution of the data on the storage medium by analyzing the access frequency and the storage efficiency of the data, balancing the access speed and the storage space utilization rate, and generating an adjusted data storage layout;
S703: based on the adjusted data storage layout and the dynamic data processing model, adopting a scheduling algorithm, selecting the priority of each data processing task by analyzing the time sensitivity and the resource consumption of the data processing, and re-planning the data processing flow according to the data processing requirements and the available resources to generate an adjusted data processing strategy;
S704: based on the adjusted data processing strategy, dynamic priority scheduling model, data compression model and data anomaly detection model, adopting an optimization adjustment technology, and adjusting the execution sequence and data storage format of the data processing tasks by evaluating the urgency degree and the data compression possibility of the multiprocessing tasks to generate a data processing and analysis strategy.
In the sub-step S701, based on the data fluidity prediction model, a data integration algorithm is applied to aggregate and fuse the prediction data of the plurality of models. The data includes information such as time points, values and frequencies, and is formatted as time series data. The core task of the data integration algorithm here is to unify the data from different sources into one comprehensive view for comprehensive analysis. Specific operations include normalizing the data format, aligning the time stamps, and integrating the output of the various models to construct a composite view containing the multi-dimensional data stream. The generated data flow overview not only provides a global data summary, but also reveals the interrelationships and dependencies between data flows, providing a basis for subsequent analysis and decision making.
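The timestamp-alignment step of S701 can be sketched as an outer join of per-model series on a shared time axis. The model names (`flow`, `storage`) and the timestamp format are assumptions for the example; gaps where a model has no prediction are left as `None` rather than interpolated.

```python
# Sketch of the S701 fusion step: align per-model predictions on a shared
# timestamp axis and merge them into one composite view (an outer join).

def fuse(model_outputs):
    """model_outputs: {model_name: {timestamp: value}} -> sorted composite rows."""
    timestamps = sorted({ts for series in model_outputs.values() for ts in series})
    return [
        {"timestamp": ts, **{m: s.get(ts) for m, s in model_outputs.items()}}
        for ts in timestamps
    ]

view = fuse({
    "flow":    {"08:00": 120, "08:05": 400},
    "storage": {"08:00": 80,  "08:10": 95},
})
```

Every timestamp seen by any model appears exactly once, giving the "comprehensive view containing the multidimensional data stream" that the subsequent storage and scheduling steps consume.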
In substep S702, a spatial reconstruction algorithm is applied to optimize the storage layout of the data based on the data flow overview and the data storage location optimization strategy. In this process, the algorithm analyzes the data access frequency and storage efficiency to determine the optimal data distribution scheme. Operations include evaluating the performance characteristics of different storage media, such as read-write speed and capacity, together with the access patterns of the data, and then rescheduling the distribution of data on the storage media based on this information. The purpose is to balance access speed and storage space utilization, thereby improving overall storage efficiency. The generated adjusted data storage layout not only optimizes data access performance, but also improves the efficiency and scalability of the storage system.
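A minimal sketch of this re-planning idea, assuming a two-tier storage hierarchy; the media, speeds, capacities, and access frequencies below are illustrative values, and the greedy placement rule is one simple way to realize the trade-off, not the patent's specific algorithm:

```python
# Hypothetical media and data items; speeds and frequencies are illustrative.
media = [
    {"name": "ssd", "speed": 500, "capacity": 2},   # fast but small
    {"name": "hdd", "speed": 100, "capacity": 10},  # slow but large
]
items = [("logs", 5), ("index", 120), ("cache", 300)]  # (name, access frequency)

def replan_layout(items, media):
    """Greedy re-planning: place the most frequently accessed data on the
    fastest medium that still has free capacity (one capacity unit per item)."""
    layout = {}
    free = {m["name"]: m["capacity"] for m in media}
    fastest_first = sorted(media, key=lambda m: m["speed"], reverse=True)
    for name, _freq in sorted(items, key=lambda i: i[1], reverse=True):
        for m in fastest_first:
            if free[m["name"]] > 0:
                layout[name] = m["name"]
                free[m["name"]] -= 1
                break
    return layout

layout = replan_layout(items, media)
```

Hot items fill the fast tier first, so access speed is favored until its capacity runs out, after which space utilization of the slower tier takes over.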
In substep S703, a scheduling algorithm is employed to optimize the data processing flow based on the adjusted data storage layout and the dynamic data processing model. This includes analyzing the time sensitivity and resource consumption of the data processing and assigning a priority to each data processing task accordingly. In operation, the algorithm considers the urgency and importance of the data processing requirements, as well as available resources such as computing power and memory, and then re-plans the data processing flow based on these factors. The generated adjusted data processing strategy aims to ensure that high-priority tasks obtain sufficient resources while optimizing overall resource allocation and use, so as to improve the efficiency and accuracy of data processing.
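The priority assignment described here can be sketched as a score-and-admit loop; the tasks, the sensitivity-per-cost score, and the resource budget are all illustrative assumptions rather than the patent's concrete scheduling algorithm:

```python
# Hypothetical tasks: (name, time sensitivity 0-1, resource cost in units).
tasks = [("report", 0.2, 4), ("alerting", 0.9, 1), ("backfill", 0.5, 3)]
available_resources = 5

def plan(tasks, budget):
    """Rank tasks by a simple priority score (sensitivity relative to cost),
    then admit them in order until the resource budget is exhausted."""
    ranked = sorted(tasks, key=lambda t: t[1] / t[2], reverse=True)
    schedule, used = [], 0
    for name, _sensitivity, cost in ranked:
        if used + cost <= budget:
            schedule.append(name)
            used += cost
    return schedule

schedule = plan(tasks, available_resources)
```

Time-sensitive, cheap tasks are admitted first, matching the goal of guaranteeing resources for high-priority work within the available budget.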
In step S704, an optimization and adjustment technique is adopted to further improve the efficiency and accuracy of data processing by combining the adjusted data processing strategy, the dynamic priority scheduling model, the data compression model and the data anomaly detection model. This includes assessing the urgency of the multiple processing tasks, exploring the potential for data compression, and adjusting the execution order and data storage formats of the data processing tasks accordingly. For example, for a task with high urgency but a large amount of data, a compress-before-processing policy is adopted to reduce processing time. In this way, a data processing and analysis strategy is generated that improves the efficiency and accuracy of data processing and enhances the adaptability and flexibility of the system to new conditions.
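The compress-before-processing decision for urgent, large tasks can be illustrated with a back-of-the-envelope time comparison; all rates, the compression ratio, and the 0.7 urgency threshold below are assumed values, not parameters given in the patent:

```python
def choose_action(urgency, data_mb, compress_ratio=0.4, net_mb_s=10, compress_mb_s=200):
    """Compare end-to-end times for 'process directly' versus 'compress first'.
    All rates, the ratio, and the urgency threshold are assumed values."""
    direct = data_mb / net_mb_s  # time to move the raw data
    # Time to compress, then move the smaller compressed payload.
    compressed = data_mb / compress_mb_s + (data_mb * compress_ratio) / net_mb_s
    if urgency > 0.7 and compressed < direct:  # urgent, compression pays off
        return "compress-then-process"
    return "process-directly"
```

With these assumed rates, a 1000 MB urgent task takes 100 s directly but only 45 s via compression, so the compress-first policy is chosen.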
Suppose an Internet service company operates a large computer system database that records detailed performance indicators such as system response time, CPU utilization, and memory occupancy. In step S701, the time series data are summarized and fused using a data integration algorithm to create a comprehensive performance data view. For example, a data item reads: at "2024-01-01 08:00:00", the system response time is 200 milliseconds, the CPU utilization is 50%, and the memory occupancy is 60%. In step S702, the data storage layout in the database is optimized using a spatial reconstruction algorithm, which helps to improve the efficiency of data access and the utilization of the storage medium. For example, at "2024-01-01 08:05:00", the system response time is 210 milliseconds, the CPU utilization is 55%, and the memory occupancy is 65%. In step S703, the data processing flow is optimized using a scheduling algorithm, which includes prioritizing the data processing tasks during the morning peak hours; for example, at "2024-01-01 08:10:00", the system response time is 190 milliseconds, the CPU utilization is 45%, and the memory occupancy is 55%. In step S704, a dynamic priority scheduling model and a data anomaly detection model are combined to further optimize and adjust the data processing and analysis strategy. Feature analysis of the data with a machine learning algorithm shows that the system performance indicators exhibit a slight rising trend during weekday morning peak hours. Further analysis with data mining techniques identifies certain applications as the cause of the sharp increase in CPU and memory usage during the morning peak, such as the high-frequency use of certain report generating tools.
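The "slight rising trend" found in this example can be checked with an ordinary least-squares slope over equally spaced samples; the CPU series below is a hypothetical stand-in for the morning-peak measurements, not data from the patent:

```python
def trend_slope(values):
    """Ordinary least-squares slope of equally spaced samples."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

cpu = [50, 55, 45, 58, 60, 62]  # hypothetical morning-peak CPU samples (%)
rising = trend_slope(cpu) > 0   # a positive slope indicates a rising trend
```

A positive slope flags the gradual upward drift that the example attributes to high-frequency report generation during peak hours.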
Referring to fig. 9, an intelligent data analysis system includes a data fluidity prediction module, a data storage optimization module, a data processing flow module, a task scheduling module, a data compression module, an anomaly detection module, an integrated analysis module, and a system optimization module;
the data fluidity prediction module is used for analyzing the periodicity and trend of the historical data and predicting the future mode of data flow by adopting a recurrent neural network and long short-term memory network method and combining an autoregressive model of a time sequence based on the historical data of the database to generate a data fluidity prediction model;
The data storage optimization module analyzes the co-occurrence mode and the association degree among data items by adopting an association rule learning algorithm based on the data fluidity prediction model, and re-plans the layout of the data in the storage medium by combining a space reconstruction algorithm to generate a data storage position optimization strategy;
The data processing flow module adopts differential equation modeling based on a data storage position optimization strategy, combines time sequence analysis, adjusts data sampling frequency and processing parameters according to various data types and requirements, and generates a dynamic data processing model;
the task scheduling module builds a task execution queue by adopting a heap sort algorithm based on the dynamic data processing model, performs resource allocation by using a linear programming method, dynamically adjusts task priority and resource allocation, and generates a dynamic priority scheduling model;
the data compression module analyzes data diversity by adopting an information entropy theory based on a dynamic priority scheduling model, and codes data characteristics by combining a Huffman coding algorithm, so as to perform data compression processing and generate a data compression model;
The anomaly detection module is used for identifying an irregular mode and potential anomalies in data by adopting a chaos theory and a nonlinear data analysis method based on the data compression model, and carrying out dynamic anomaly detection to generate a data anomaly detection model;
The integrated analysis module is used for integrating contents and comprehensively evaluating and analyzing data based on a data fluidity prediction model, a data storage position optimization strategy, a dynamic data processing model, a dynamic priority scheduling model, a data compression model and a data anomaly detection model by adopting a multi-level data integration method to generate a comprehensive data analysis result;
The system optimization module adopts a self-adaptive optimization technology driven by machine learning based on the comprehensive data analysis result, adjusts and optimizes the whole flow and resources according to the real-time feedback of the data processing flow and the resource configuration, and generates a data processing and analysis strategy.
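The task scheduling module's execution queue can be sketched with Python's `heapq`, which implements the binary-heap priority queue described above; the task names and priority numbers are illustrative:

```python
import heapq

# Hypothetical tasks; a smaller number means higher urgency (min-heap convention).
queue = []
heapq.heappush(queue, (2, "compress logs"))
heapq.heappush(queue, (1, "handle anomaly alert"))
heapq.heappush(queue, (3, "nightly report"))

# Popping yields tasks in priority order; pushing a task again with a new
# priority is one simple way to model the module's dynamic adjustment.
execution_order = [heapq.heappop(queue)[1] for _ in range(len(queue))]
```

The heap keeps insertion and extraction at O(log n), which is why heap-based queues suit frequently re-prioritized task sets.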
The data fluidity prediction module can effectively analyze the periodicity and trend of historical data by combining a recurrent neural network and a long short-term memory network method with an autoregressive model of a time sequence, so that the accuracy of predicting a future data flow mode is improved. This combination of advanced algorithms provides more accurate predictions for dynamic and changing data environments, particularly when dealing with complex data patterns and long-term trend prediction.
The data storage optimization module can optimize the layout of the data in the storage medium through an association rule learning algorithm and a space reconstruction algorithm. This not only increases the speed and efficiency of data access, but also reduces storage costs, particularly when processing large-scale data sets, effectively reducing data latency and increasing processing speed.
The data processing flow module adopts differential equation modeling and time sequence analysis to adjust data sampling frequency and processing parameters for various data types and requirements. The method improves the flexibility and adaptability of the data processing flow, so that the system can respond to the data change more quickly and accurately, and the overall processing efficiency is improved.
The task scheduling module can effectively perform task allocation and resource allocation by using a heap sort algorithm and a linear programming method. The dynamic priority scheduling model ensures that tasks are reasonably scheduled according to priority and resource availability, and improves the overall efficiency and response speed of the system.
The data compression module combines the information entropy theory and the Huffman coding algorithm to optimize the data compression processing process. This not only reduces the cost of data storage and transmission, but also ensures data integrity and accessibility, especially for the management and transmission of large volumes of data.
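A compact sketch of the Huffman construction the compression module relies on: symbols are repeatedly merged from a frequency min-heap so that frequent symbols receive shorter binary codes; the input string is illustrative:

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a Huffman code table from symbol frequencies: frequent symbols
    receive shorter codes, minimizing the total encoded length."""
    freq = Counter(data)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries carry a tiebreaker int so dicts are never compared.
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

codes = huffman_codes("aaaabbc")
```

For "aaaabbc", the frequent symbol `a` gets a 1-bit code while `b` and `c` get 2-bit codes, encoding the string in 10 bits instead of the 14 a fixed 2-bit code would need.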
The anomaly detection module adopts a chaos theory and a nonlinear data analysis method, so that an irregular mode and potential anomalies in data can be effectively identified. The advanced detection mechanism improves the response speed and accuracy to abnormal conditions, and is important to guaranteeing the reliability and safety of data.
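The phase-space reconstruction used in such nonlinear analysis is commonly realized by time-delay embedding; the sketch below shows the embedding itself (the series, embedding dimension, and delay are illustrative choices, and the patent's full pipeline adds Lyapunov-exponent and outlier analysis on top of it):

```python
def embed(series, dim=3, delay=2):
    """Time-delay embedding: reconstruct a phase space from a scalar series.
    Each point is (x[t], x[t+delay], ..., x[t+(dim-1)*delay])."""
    span = (dim - 1) * delay
    return [tuple(series[t + k * delay] for k in range(dim))
            for t in range(len(series) - span)]

series = [0, 1, 2, 3, 4, 5, 6, 7]  # hypothetical scalar measurements
points = embed(series)
```

Irregular patterns then show up as points that stray from the trajectory the regular dynamics traces out in this reconstructed space.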
And the application of the integrated analysis module and the system optimization module enables the whole data analysis flow to be more efficient and coordinated. Through the self-adaptive optimization technology of multi-level data integration and machine learning driving, the system can carry out self-adjustment and optimization according to real-time feedback, and the data processing efficiency and accuracy are continuously improved.
In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended only to assist in the explanation of the invention. The preferred embodiments are not exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best understand and utilize the invention. The invention is limited only by the claims and the full scope and equivalents thereof.

Claims (7)

1. An intelligent data analysis method is characterized by comprising the following steps:
Based on historical data of a database, a recurrent neural network and a long short-term memory network method are adopted to analyze a historical data flow mode, a time sequence analysis technology is utilized to predict future data flow trend, and the specific steps of generating a data fluidity prediction model are as follows:
Based on historical data of a database, an autoregressive model is adopted, and a time sequence basic feature set is generated by analyzing the linear relation between time points and numerical values in the historical data and extracting time sequence basic features;
Based on the time sequence basic feature set, adopting a recurrent neural network algorithm, capturing a periodic mode and a change trend in data flow by constructing hidden layer state circulation in a network, and generating a periodic data flow mode model;
Based on the periodic data flow mode model, adopting a long-short-term memory network algorithm, and analyzing a continuous mode and a structure of data flow by using a gating mechanism with a memory unit to generate a continuous data flow analysis model;
Based on the continuous data flow analysis model, adopting a time sequence prediction technology, and optimizing the prediction capability of future data flow trend by adjusting and analyzing the model to generate a data fluidity prediction model;
based on the data fluidity prediction model, a data storage optimization algorithm based on load balancing is adopted to analyze the data access mode and the storage efficiency, the distribution and the configuration of data in a storage medium are adjusted according to analysis results, a storage plan is formulated according to the data access frequency and the type, and a data storage position optimization strategy is generated;
Based on the data storage position optimization strategy, adopting an algorithm based on a differential equation to comprehensively analyze a data processing flow and storage efficiency, adjusting data sampling frequency and processing parameters, adapting to various data types and requirements, and adjusting the data processing strategy according to data fluidity prediction, wherein the specific steps of generating a dynamic data processing model are as follows:
Based on the data storage position optimization strategy, a differential equation modeling method is adopted, and by establishing a differential equation between data flow and storage resources, the interaction between the dynamic change of the data flow and the storage capacity is analyzed, the relationship analysis of the data flow mode and the storage efficiency is carried out, and a data flow and storage relationship model is generated;
based on the data flow and storage relation model, adopting a time sequence analysis method, identifying potential periodicity and trend change by carrying out statistical analysis on historical values of data points, adjusting sampling frequency and adapting to multiple data types and requirements, and generating a data sampling parameter optimization scheme;
Based on the data sampling parameter optimization scheme, a machine learning algorithm is adopted, data characteristics are analyzed and predicted by applying classification and regression technologies, and a data processing strategy is adjusted according to the data fluidity prediction model to generate an optimized data processing operation flow;
Based on the optimized data processing operation flow, a performance evaluation method is adopted, and by calculating key performance indexes including processing time, error rate and resource utilization efficiency, evaluation of the data processing flow and storage efficiency is carried out, so that a dynamic data processing model is generated;
based on the dynamic data processing model, adopting a scheduling method based on a priority queue and machine learning to perform real-time analysis and adjustment of task allocation and resource allocation, optimizing a task scheduling strategy according to data mobility and processing requirements, adjusting the priority and resource allocation of a data analysis task according to an analysis result, and generating a dynamic priority scheduling model;
Based on the dynamic priority scheduling model, adopting an information entropy theory and Huffman coding to perform data characteristic and statistical analysis, coding the data, and compressing the data according to the data characteristic to generate a data compression model;
Based on the data compression model, adopting a chaos theory and a nonlinear data analysis method to analyze the behaviors and modes of the data, identifying an irregular mode and potential anomalies in the data, and carrying out anomaly detection according to the dynamic change characteristics of the data, wherein the specific steps of generating a data anomaly detection model are as follows:
based on the data compression model, adopting chaos theory analysis, analyzing nonlinear dynamics behavior of data by calculating Lyapunov indexes and attractor dimensions of a data time sequence, revealing chaos characteristics, and generating a chaos dynamics characteristic analysis result;
Based on the chaotic dynamics characteristic analysis result, a nonlinear dynamics analysis method is adopted, phase space tracks and dynamics behaviors of data are analyzed through constructing a phase space and phase space reconstruction technology, an unconventional mode is identified, and a dynamics phase space analysis result is generated;
based on the dynamic phase space analysis result, an abnormal pattern recognition technology is adopted, an abnormal pattern and mutation behaviors are recognized through analysis of outliers and discontinuities of a phase space track, and abnormal behavior characteristics are refined through a cluster analysis and abnormal point detection algorithm, so that an abnormal pattern recognition result is generated;
Based on the abnormal pattern recognition result, a dynamic abnormal detection technology is adopted, and a data abnormal detection model is generated by continuously monitoring the change of data and dynamically adjusting detection parameters to adapt to the change trend of the data and the newly-appearing abnormal pattern;
And carrying out data analysis flow planning and resource optimization configuration by adopting a multi-level data integration method based on the data fluidity prediction model, the data storage position optimization strategy, the dynamic data processing model, the dynamic priority scheduling model, the data compression model and the data anomaly detection model, and carrying out continuous optimization on data processing efficiency and accuracy to generate a data processing and analysis strategy.
2. The intelligent data analysis method according to claim 1, wherein: the data mobility prediction model comprises historical data flow pattern recognition, trend change indexes and key data flow node recognition, the data storage position optimization strategy comprises a data access hot spot diagram, a storage medium allocation scheme and a data migration priority list, the dynamic data processing model comprises data response time optimization, a flow allocation algorithm and a multi-source data synchronization mechanism, the dynamic priority scheduling model comprises a task execution queue, a resource allocation map and a real-time monitoring feedback mechanism, the data compression model comprises compression efficiency assessment, a coding rule base and a data recovery protocol, the data anomaly detection model comprises anomaly pattern identification, change rate monitoring and anomaly response mechanism, and the data processing and analysis strategy comprises an integrated analysis flow chart, a performance assessment standard and an automatic adjustment scheme.
3. The intelligent data analysis method according to claim 1, wherein: based on the data fluidity prediction model, a data storage optimization algorithm based on load balancing is adopted to analyze the data access mode and the storage efficiency, the distribution and the configuration of data in a storage medium are adjusted according to the analysis result, a storage plan is formulated according to the data access frequency and the type, and the specific steps of generating a data storage position optimization strategy are as follows:
Based on the data fluidity prediction model, adopting an association rule learning algorithm, and identifying a hot spot area of data access by analyzing the co-occurrence mode and association degree between data items to generate a data access mode analysis result;
Based on the data access mode analysis result, adopting a storage medium performance analysis technology, and selecting a data storage medium by comparing the read-write speeds and response time of various storage media to generate a storage medium performance comparison result;
Based on the storage medium performance comparison result, adopting a load balancing algorithm, uniformly distributing data load and optimizing storage efficiency by analyzing a data fluidity prediction result and the storage medium performance, and generating an optimized data distribution scheme;
And based on the optimized data distribution scheme, adopting a data storage planning technology, and planning the storage position and the migration path of the data by analyzing the data access frequency and type to generate a data storage position optimization strategy.
4. The intelligent data analysis method according to claim 1, wherein: based on the dynamic data processing model, a scheduling method based on priority queues and machine learning is adopted to conduct real-time analysis and adjustment of task allocation and resource allocation, a task scheduling strategy is optimized according to data mobility and processing requirements, the priority and resource allocation of data analysis tasks are adjusted according to analysis results, and the specific steps of generating the dynamic priority scheduling model are as follows:
based on the dynamic data processing model, adopting a heap sort algorithm, ordering tasks by constructing a binary heap structure to establish a priority queue, and dynamically adjusting according to the urgency and importance of the tasks to generate a task priority queue;
Based on the task priority queue, a linear programming algorithm is adopted, a strategy of resource allocation is designed and implemented, a simplex method is used for resource allocation, task demands and resource supply are balanced, and a resource allocation model is generated;
based on the resource allocation model, adopting a decision tree algorithm, analyzing data mobility and processing requirements by constructing a classification rule and a regression model, dynamically adjusting task priority and resource allocation, and generating an adjusted task scheduling strategy;
Based on the adjusted task scheduling strategy, a reinforcement learning algorithm is adopted, and a dynamic priority scheduling model is generated by continuously adapting to the change of data liquidity and processing requirements and optimizing task priority and resource allocation through implementing a reward feedback mechanism and state space analysis.
5. The intelligent data analysis method according to claim 1, wherein: based on the dynamic priority scheduling model, adopting an information entropy theory and Huffman coding to carry out data characteristic and statistical analysis, coding the data, and carrying out data compression according to the data characteristic, wherein the specific steps of generating a data compression model are as follows:
Based on the dynamic priority scheduling model, an information entropy theory is adopted, the entropy value of the whole data set is estimated by calculating the occurrence frequency of each data element, the diversity and complexity of the data are analyzed, the potential benefit of compression is estimated, and a data entropy analysis result is generated;
based on the data entropy analysis result, adopting a probability statistical analysis method, and analyzing the occurrence probability and frequency distribution of each data element to reveal the internal statistical characteristics of the data and generate a data statistical characteristic model;
based on the data statistical characteristic model, adopting a Huffman coding algorithm, constructing a coding tree based on symbol frequency, distributing a unique binary code for each symbol, optimizing the total coding length of the whole data set, and generating a data coding scheme;
Based on the data coding scheme, a data compression technology is adopted, data is coded by applying Huffman coding, original data symbols are replaced, and a data compression model is generated by combining symbol optimization coding processes with similar frequencies.
6. The intelligent data analysis method according to claim 1, wherein: based on the data fluidity prediction model, the data storage position optimization strategy, the dynamic data processing model, the dynamic priority scheduling model, the data compression model and the data anomaly detection model, a multi-level data integration method is adopted to conduct data analysis flow planning and resource optimization configuration, and continuous optimization of data processing efficiency and accuracy is conducted, and the specific steps of generating the data processing and analysis strategy are as follows:
based on the data fluidity prediction model, a data integration algorithm is adopted, prediction data of a plurality of models are summarized, fusion is carried out on data sources, the fusion comprises time points, numerical values and frequencies, a comprehensive view containing multidimensional data streams is constructed, and a data flow overview is generated;
Based on the data flow overview and the data storage position optimization strategy, adopting a space reconstruction algorithm, and re-planning the distribution of the data on the storage medium by analyzing the access frequency and the storage efficiency of the data, balancing the access speed and the storage space utilization rate, and generating an adjusted data storage layout;
Based on the adjusted data storage layout and the dynamic data processing model, adopting a scheduling algorithm, selecting the priority of each data processing task by analyzing the time sensitivity and the resource consumption of data processing, and re-planning the data processing flow according to the data processing requirements and the available resources to generate an adjusted data processing strategy;
Based on the adjusted data processing strategy, the dynamic priority scheduling model, the data compression model and the data anomaly detection model, an optimization adjustment technology is adopted, and the execution sequence and the data storage format of the data processing tasks are adjusted by evaluating the emergency degree of the multiprocessing tasks and the possibility of data compression, so that a data processing and analyzing strategy is generated.
7. An intelligent data analysis system, characterized in that: the intelligent data analysis method according to any one of claims 1-6, wherein the system comprises a data fluidity prediction module, a data storage optimization module, a data processing flow module, a task scheduling module, a data compression module, an anomaly detection module, an integrated analysis module, and a system optimization module;
the data fluidity prediction module is used for analyzing the periodicity and trend of the historical data and predicting the future mode of data flow by adopting a recurrent neural network and long short-term memory network method and combining an autoregressive model of a time sequence based on the historical data of the database to generate a data fluidity prediction model;
The data storage optimization module analyzes the co-occurrence mode and the association degree among data items by adopting an association rule learning algorithm based on a data fluidity prediction model, and re-plans the layout of data in a storage medium by combining a space reconstruction algorithm to generate a data storage position optimization strategy;
the data processing flow module adopts differential equation modeling based on a data storage position optimization strategy, combines time sequence analysis, adjusts data sampling frequency and processing parameters according to various data types and requirements, and generates a dynamic data processing model;
the task scheduling module builds a task execution queue by adopting a heap sort algorithm based on a dynamic data processing model, performs resource allocation by using a linear programming method, dynamically adjusts task priority and resource allocation, and generates a dynamic priority scheduling model;
The data compression module analyzes data diversity by adopting an information entropy theory based on a dynamic priority scheduling model, codes data characteristics by combining a Huffman coding algorithm, and performs data compression processing to generate a data compression model;
the anomaly detection module is used for identifying an irregular mode and potential anomalies in data by adopting a chaos theory and a nonlinear data analysis method based on a data compression model, and carrying out dynamic anomaly detection to generate a data anomaly detection model;
The integrated analysis module is used for integrating contents and comprehensively evaluating and analyzing data based on a data fluidity prediction model, a data storage position optimization strategy, a dynamic data processing model, a dynamic priority scheduling model, a data compression model and a data anomaly detection model by adopting a multi-level data integration method to generate a comprehensive data analysis result;
The system optimization module adopts a self-adaptive optimization technology driven by machine learning based on the comprehensive data analysis result, adjusts and optimizes the whole flow and resources according to the real-time feedback of the data processing flow and the resource configuration, and generates a data processing and analysis strategy.
CN202410232109.1A 2024-03-01 2024-03-01 Intelligent data analysis method and system Active CN117807425B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410232109.1A CN117807425B (en) 2024-03-01 2024-03-01 Intelligent data analysis method and system


Publications (2)

Publication Number Publication Date
CN117807425A CN117807425A (en) 2024-04-02
CN117807425B true CN117807425B (en) 2024-05-10

Family

ID=90428360

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410232109.1A Active CN117807425B (en) 2024-03-01 2024-03-01 Intelligent data analysis method and system

Country Status (1)

Country Link
CN (1) CN117807425B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118134539B (en) * 2024-05-06 2024-07-19 山东传奇新力科技有限公司 User behavior prediction method based on intelligent kitchen multi-source data fusion

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020206705A1 (en) * 2019-04-10 2020-10-15 山东科技大学 Cluster node load state prediction-based job scheduling method
CN115941696A (en) * 2022-12-08 2023-04-07 西安理工大学 Heterogeneous Big Data Distributed Cluster Storage Optimization Method
CN116029522A (en) * 2023-02-07 2023-04-28 南京领专信息科技有限公司 E-business ERP information optimization system
CN117252447A (en) * 2023-11-17 2023-12-19 山东海晟盐业有限公司 Industrial salt production statistical method and system
CN117522084A (en) * 2024-01-05 2024-02-06 张家口市华工建设有限公司 Automatic concrete grouting scheduling system
CN117539726A (en) * 2024-01-09 2024-02-09 广东奥飞数据科技股份有限公司 Energy efficiency optimization method and system for green intelligent computing center
CN117575663A (en) * 2024-01-17 2024-02-20 深圳市诚识科技有限公司 Fitment cost estimation method and system based on deep learning
CN117592823A (en) * 2024-01-19 2024-02-23 天津路联智通交通科技有限公司 Civil construction sewage treatment method and system
CN117591944A (en) * 2024-01-19 2024-02-23 广东工业大学 Learning early warning method and system for big data analysis

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102644593B1 * 2021-11-23 2024-03-07 Korea University of Technology and Education Industry-Academic Cooperation Foundation An AI differentiation based HW-optimized Intelligent Software Development Tools for Developing Intelligent Devices

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Financial time-series forecasting based on LSTM neural networks; Ouyang Hongbing; Huang Kang; Yan Hongju; Chinese Journal of Management Science; 2020-04-15 (No. 04); full text *
Research on data migration strategies for hybrid storage; Luo Baoshan; Zhang Xin; Wang Xu; Tan Zhipeng; Computer Technology and Development; 2016-05-05 (No. 06); full text *

Also Published As

Publication number Publication date
CN117807425A (en) 2024-04-02

Similar Documents

Publication Publication Date Title
CN117807425B (en) Intelligent data analysis method and system
CN101888309B (en) Online log analysis method
CN113673857B (en) Service awareness and resource scheduling system and method for data center
CN109558287A (en) A kind of solid-state disk service life prediction technique, device and system
CN102081622A (en) Method and device for evaluating system health degree
Rahman et al. Replica selection strategies in data grid
CN106383916B (en) Data processing method based on predictive maintenance of industrial equipment
CN117687891B (en) Index calculation optimization system based on AI
JP7401677B2 (en) Model update system, model update method and related equipment
US11675643B2 (en) Method and device for determining a technical incident risk value in a computing infrastructure from performance indicator values
CN116881744B (en) Operation and maintenance data distribution method, device, equipment and medium based on Internet of things
Cao et al. Load prediction for data centers based on database service
Germain-Renaud et al. The grid observatory
CN117787711A (en) Arch bridge construction single-end management system based on big data
CN117076882A (en) Dynamic prediction management method for cloud service resources
CN116827950A (en) Cloud resource processing method, device, equipment and storage medium
CN117453137A (en) Cloud intelligent operation and maintenance system data management system
Badhiye et al. KNN technique for analysis and prediction of temperature and humidity data
CN117675691A (en) Remote fault monitoring method, device, equipment and storage medium of router
Andreev et al. Approach to forecasting the development of situations based on event detection in heterogeneous data streams
Galli et al. An explainable artificial intelligence methodology for hard disk fault prediction
CN114818460A (en) Laboratory equipment residual service life prediction method based on automatic machine learning
Chen et al. Hierarchical RNN-based framework for throughput prediction in automotive production systems
WO2023110059A1 (en) Method and system trace controller for a microservice system
CN117972367B (en) Data storage prediction method, data storage subsystem and intelligent computing platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant