CN115455236A - Data analysis system, method, server and storage medium - Google Patents
- Publication number: CN115455236A
- Application number: CN202211401559.6A
- Authority: CN (China)
- Prior art keywords: data, user behavior, behavior data, processing, module
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/80—Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
- G06F16/81—Indexing, e.g. XML tags; Data structures therefor; Storage structures
- G06F16/83—Querying
Abstract
The invention provides a data analysis system, a data analysis method, a server and a storage medium, and relates to the technical field of metadata analysis. User behavior data is acquired and preprocessed to form a user behavior data set; the user behavior data sets are grouped or aggregated, and the mean value and the standard deviation within them are calculated; the user behavior data is scored; the user behavior data is normalized to form an event objectification map that expresses the connection relations between events and behaviors and between events; and the processing result is stored and displayed. The method enables dynamic adjustment of the data processing flow, combination and reprocessing of data processing results, layered output/reprocessing of data processing results, and dynamic scaling as data sets and data-set rules grow exponentially. It addresses the massive growth of data analysis rules, the dynamic extension of rule operators, and the integration of batch and stream data processing.
Description
Technical Field
The present invention relates to the field of metadata analysis technologies, and in particular, to a data analysis system, a data analysis method, a server, and a storage medium.
Background
Data analysis is the core of the data processing flow, because the value embedded in data emerges through the analysis process. The most important difference from conventional data analysis is that the amount of data is increasing rapidly, and with it the demand for storing, querying and analyzing data. In practice, "big data analysis" looks for patterns in raw data, finds the root causes behind the observed situation, and optimizes by building models and making predictions, so as to achieve continuous improvement and innovation in various fields. Existing data analysis approaches mainly include the following:
Model analysis: a customized data analysis program is developed for a special requirement by working out the data analysis logic.
Batch analysis: after rules are configured on a page, the program translates the configured rules into a Spark SQL executable file and schedules it to run periodically.
Real-time/sequence analysis: after rules are configured on a page, the program assembles a Siddhi executable file from the configured rules and schedules it in real time to analyze data.
The customized model-based approach has relatively limited functionality: it requires developers to write an analysis program, places high technical demands on them, has a relatively long development cycle and yields models with low reusability. The models are unevenly optimized for performance, which poses a serious hidden risk to the stability of the whole system.
Batch analysis based on page rule configuration fixes the rules into code through translation, which hinders dynamic adjustment of the rules, and the translated code is hard to optimize at the code level. Because this approach translates each rule separately, several analysis rules on the same data set produce several analysis programs, each of which loads the same data during analysis. The data is pulled repeatedly, resources are wasted and the stable operation of the whole system is affected. The approach cannot keep up with exponential growth in rules, so code translation has no practical value for analyzing massive data.
Real-time/sequence analysis is essentially the same code translation approach as batch analysis: the rules are translated into a statement file that the engine can execute. Owing to inherent defects of the stream analysis engine, this approach can lose data; its analysis performance also fails to meet business requirements and cannot support the analysis demands of explosive data growth.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a data analysis system built on a metadata and policy driven data analysis method, which addresses the massive growth of data analysis rules, the dynamic extension of rule operators, and the integration of batch and stream data processing.
The data analysis system includes: a data preprocessing module, a user behavior analysis module, an event scoring and handling module, an objectification processing module and a system architecture module;
the data preprocessing module is used for acquiring user behavior data and preprocessing the user behavior data to form a user behavior data set;
the user behavior analysis module is used for grouping or aggregating the user behavior data sets and calculating the mean value and the standard deviation in the user behavior data sets;
the event scoring and handling module is used for scoring the user behavior data;
the objectification processing module is used for carrying out normalization processing on the user behavior data to form an event objectification map which is used for expressing the connection relation between events and behaviors and between events;
the system architecture module is used for storing the processing result for display.
It should be further noted that the data preprocessing module preprocesses the user behavior data in the form of an XML file.
It should be further noted that the main flow of the data preprocessing module preprocesses the user behavior data in an XML configuration manner, and the processing rule of the user behavior data is implemented based on XML configuration.
It should be further noted that the objectification processing module further calculates and analyzes the existing objectification data based on deep learning and graph computation combined with large-scale graph characterization algorithms, and describes the corresponding relationships using vertices and edges.
It should be further noted that the objectification processing module uses a synchronous scheduling mode based on the BSP model: the calculation process is divided into a plurality of supersteps, all vertex programs in each superstep are executed independently and concurrently, and global synchronization is performed after they finish.
It should be further noted that the system architecture module provides a browser, so that a user can access the system through the browser to perform relevant rule configuration operations in a page.
It should be further noted that the system architecture module is further configured to issue the rule to the engine, initialize the processing rule in the configuration item, load the data, perform uniform data processing, analysis, event handling, and objectification processing on the user behavior data, and store and display the processing result.
The invention also provides a data analysis method, which comprises the following steps:
S101, acquiring user behavior data, and preprocessing the user behavior data to form a user behavior data set;
S102, grouping or aggregating the user behavior data sets, and calculating the mean value and the standard deviation in the user behavior data sets;
S103, scoring the user behavior data;
S104, normalizing the user behavior data to form an event objectification map which is used for expressing the connection relation between events and behaviors and between events;
S105, storing the processing result for display.
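The five steps can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the record fields (`user`, `event`, `count`), the rule "one risk point per record more than `k` standard deviations from the group mean", and every function name are hypothetical choices made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def preprocess(raw_records):
    """S101: drop incomplete records and keep only the fields used later."""
    return [
        {"user": r["user"], "event": r["event"], "count": int(r["count"])}
        for r in raw_records
        if r.get("user") and r.get("event") and r.get("count") is not None
    ]


def analyze(dataset):
    """S102: group by user, then compute mean and standard deviation per group."""
    groups = defaultdict(list)
    for rec in dataset:
        groups[rec["user"]].append(rec["count"])
    return {user: (mean(vals), pstdev(vals)) for user, vals in groups.items()}


def score(dataset, stats, k=2.0):
    """S103 (assumed rule): one risk point per record beyond k std devs."""
    scores = defaultdict(int)
    for rec in dataset:
        mu, sigma = stats[rec["user"]]
        if sigma > 0 and abs(rec["count"] - mu) > k * sigma:
            scores[rec["user"]] += 1
    return dict(scores)


def objectify(dataset):
    """S104 (simplified): normalize records into unique (user, event) edges."""
    return sorted({(rec["user"], rec["event"]) for rec in dataset})


def run_pipeline(raw_records):
    """S105 is reduced to returning the result; a real system would persist it."""
    dataset = preprocess(raw_records)
    stats = analyze(dataset)
    return {
        "stats": stats,
        "scores": score(dataset, stats),
        "edges": objectify(dataset),
    }
```

Each step only consumes the output of the previous one, so any of the functions can be swapped for a configured rule without touching the others.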
The invention also provides a server comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the metadata and policy driven data analysis method.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the metadata and policy driven data analysis method.
According to the technical scheme, the invention has the following advantages:
The metadata and policy driven data analysis system and method provided by the invention express the rule configuration of data processing in the form of XML files. They may use a Flink computing engine in place of a Spark computing engine for data processing, and the configuration items in the system may use JSON, YAML or MySQL in place of XML. The data processing flow can be customized, and the data input source and data output source can be customized according to the different processing rules configured for different data sets. This enables dynamic adjustment of the data processing flow, combination and reprocessing of data processing results, layered output/reprocessing of data processing results, and dynamic scaling as data sets and data-set rules grow exponentially. It addresses the massive growth of data analysis rules, the dynamic extension of rule operators, and the integration of batch and stream data processing.
The invention supports one-to-one, one-to-many and many-to-many parallel analysis between data sets and analysis rules. The analysis flow is organized into processors, and the processors support hot extension. The analysis flow is programmable, and batch and stream processing are integrated at the system level. The system is simple to use, covers many scenarios, performs well, is highly extensible, supports plug-in processing and achieves high data analysis accuracy.
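One way to picture processors that can be extended at run time, and data sets bound to several rules, is a small registry. The sketch below is an assumption made for illustration: the registry `PROCESSORS`, the decorator `register`, the processor names and the `bindings` structure do not come from the patent.

```python
# Hypothetical processor registry: name -> callable taking and returning a list.
PROCESSORS = {}


def register(name):
    """Add a processor under a name; can be called while the system is running."""
    def decorator(fn):
        PROCESSORS[name] = fn
        return fn
    return decorator


@register("filter_positive")
def filter_positive(rows):
    return [r for r in rows if r > 0]


@register("double")
def double(rows):
    return [r * 2 for r in rows]


def run(datasets, bindings):
    """Apply each rule (an ordered list of processor names) to its data set.

    bindings is a list of (dataset_name, [processor names]) pairs, so one
    data set may appear with many rules and one rule with many data sets.
    """
    results = {}
    for dataset_name, rule in bindings:
        rows = datasets[dataset_name]
        for step in rule:
            rows = PROCESSORS[step](rows)
        results[(dataset_name, tuple(rule))] = rows
    return results
```

Registering a new processor later, for example `register("negate")(lambda rows: [-r for r in rows])`, makes it available to `run` without changing `run` itself, which is the property that hot extension refers to in this sketch.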
Drawings
In order to illustrate the technical solution of the present invention more clearly, the drawings used in the description are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a data analysis system;
FIG. 2 is an architecture diagram of an embodiment of a data analysis system;
FIG. 3 is a flow chart of a method of data analysis.
Detailed Description
As shown in fig. 1 and 2, the diagrams provided for the data analysis system only illustrate the basic idea of the present invention schematically. The data analysis system can acquire and process the associated data based on artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use that knowledge to obtain the best result.
In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Fig. 1 and 2 show diagrams of a preferred embodiment of the data analysis system of the present invention. The metadata and policy driven data analysis system is applied to one or more servers. A server is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The server may include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group consisting of a plurality of network servers, or a cloud, based on cloud computing, consisting of a large number of hosts or network servers.
The network where the server is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a Virtual Private Network (VPN), and the like.
The metadata and policy driven data analysis system addresses the massive growth of analysis rules, the dynamic extension of rule operators, and the integration of batch and stream data processing. It also makes the analysis flow programmable and integrates stream and batch processing at the system level.
Referring now to the drawings, which illustrate a metadata and policy driven data analysis system in one embodiment, the data analysis system includes: a data preprocessing module, a user behavior analysis module, an event scoring and handling module, an objectification processing module and a system architecture module;
the data preprocessing module is used for acquiring user behavior data and preprocessing the user behavior data to form a user behavior data set;
the user behavior analysis module is used for grouping or aggregating the user behavior data sets and calculating the mean value and the standard deviation in the user behavior data sets;
the event scoring and handling module is used for scoring the user behavior data;
the objectification processing module is used for carrying out normalization processing on the user behavior data to form an event objectification map which is used for representing the connection relation between events and behaviors and between events;
the system architecture module is used for storing and displaying the processing result.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
In this embodiment, user behavior data comes from different business systems, and the raw data of each business system suffers from noise, non-uniform format standards, substandard quality, fragmentation and similar problems. Such raw data cannot provide an accurate basis for enterprise decisions, and it also limits online machine learning. The data preprocessing module of the data analysis system extracts, cleans and converts the user behavior data of each business system, integrates the scattered, disordered and non-uniform data into standard behavior data, and thereby provides an analysis basis for enterprise decision making.
Because the raw data sets generated by the business systems are large and complex, preprocessing each type of data independently is complicated and inefficient. The metadata and policy driven data analysis method and system are designed to meet the data processing requirements of explosive data growth.
The data preprocessing module configures the data processing flow and the data processing rules in the form of XML files: the main flow configures the specific data preprocessing process through XML, and the specific data processing rules are implemented through their own XML configuration. This handles one-to-one, one-to-many and many-to-many data processing scenarios.
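A minimal sketch of XML-driven preprocessing is given below. The patent does not publish its XML schema, so the element names (`flow`, `rule`), the attributes and the two rules (`drop_missing`, `rename`) are invented here purely to show how a flow definition can be read at run time and applied to records.

```python
import xml.etree.ElementTree as ET

# Hypothetical flow definition; the real schema is not given in the source.
CONFIG = """
<flow dataset="web_access">
  <rule name="drop_missing" field="sip"/>
  <rule name="rename" from="a_proto" to="protocol"/>
</flow>
"""


def drop_missing(rows, attrs):
    """Keep only rows where the configured field is present and non-empty."""
    return [r for r in rows if r.get(attrs["field"])]


def rename(rows, attrs):
    """Rename one field in every row."""
    old, new = attrs["from"], attrs["to"]
    return [{(new if k == old else k): v for k, v in r.items()} for r in rows]


# Rule names used in the XML are mapped to the functions that implement them.
RULES = {"drop_missing": drop_missing, "rename": rename}


def load_flow(xml_text):
    """Parse the XML into a data set name and an ordered list of rule steps."""
    root = ET.fromstring(xml_text.strip())
    steps = []
    for rule in root.findall("rule"):
        attrs = dict(rule.attrib)
        steps.append((attrs.pop("name"), attrs))
    return root.get("dataset"), steps


def apply_flow(rows, steps):
    """Run the configured rules in document order."""
    for name, attrs in steps:
        rows = RULES[name](rows, attrs)
    return rows
```

Because the steps are data rather than generated code, changing the XML changes the flow on the next load, and several flows can share the same loaded data set.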
The system of the invention can use a Flink computing engine in place of a Spark computing engine for data processing. In the system architecture design, the configuration items can be expressed in JSON, YAML or MySQL instead of XML. The design unifies data processing configuration, flow and core functions and integrates batch and stream data processing; data processing efficiency is more than doubled compared with before, and resource occupancy is reduced by fifty percent.
When the system is applied to user behavior, various quantitative analysis dimensions, such as the number of accesses and the number of logins, can be mined from the user behavior data. The user behavior analysis module groups or aggregates the raw quantitative data set according to actual requirements and then calculates the mean value and the standard deviation. To prevent missed detections and misjudgments caused by relying on a single analysis mode, a deep learning method is additionally used to establish a qualitative analysis baseline, so that abnormal traffic is detected from multiple angles.
The method works by establishing baselines of normal behavior for humans and machines, building a behavior model for each applicable attribute of a user or entity. Given enough data, trends that represent the typical behavior of the user can be identified, and deviations from this baseline can readily be treated as anomalous behavior and potential threats.
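The quantitative baseline described above can be illustrated with a short sketch. It assumes that history arrives as `(entity, metric, value)` tuples and that deviation is measured in sample standard deviations; both are assumptions for the example, as are the function names.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_baselines(history):
    """Group observations by (entity, metric) and compute mean and std dev.

    Groups with fewer than two observations get no baseline, because a
    sample standard deviation is undefined for them.
    """
    groups = defaultdict(list)
    for entity, metric, value in history:
        groups[(entity, metric)].append(value)
    return {
        key: (mean(vals), stdev(vals))
        for key, vals in groups.items()
        if len(vals) >= 2
    }


def deviation(baselines, entity, metric, value):
    """Distance from the baseline in standard deviations, or None if unknown."""
    base = baselines.get((entity, metric))
    if base is None:
        return None
    mu, sigma = base
    if sigma == 0:
        return 0.0 if value == mu else float("inf")
    return abs(value - mu) / sigma
```

A caller would typically compare the returned deviation with a chosen cutoff, for example treating anything above 3 as atypical; the cutoff itself is a tuning decision and is not specified in the source.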
To prevent missed detections and misjudgments caused by a single analysis mode, the event scoring and handling module of the invention uses the concept of risk scoring to reduce false security alerts. A single behavioral anomaly is not sufficient to alert analysts to a potential threat. Instead, each atypical behavior adds risk to the user or entity. Once a user or entity has accumulated sufficient risk within a particular time frame, it is considered high risk and the analyst is notified of the potential threat.
When there is a deviation from the baseline, the system increases the risk score of that user or machine. The more abnormal the behavior, the higher the risk score. As suspicious behaviors accumulate, the risk score increases until a threshold is reached, at which point the case is escalated to operation and maintenance personnel for event handling.
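The accumulate-then-escalate behavior can be sketched as a sliding-window score. The class name `RiskScorer`, the choice of adding the raw deviation as risk, and the default threshold and window are all assumptions; the source only states that risk grows with abnormality and triggers escalation at a threshold within a time frame.

```python
from collections import defaultdict, deque


class RiskScorer:
    """Accumulates risk per entity inside a sliding time window."""

    def __init__(self, threshold=10.0, window=3600.0):
        self.threshold = threshold
        self.window = window  # seconds
        self._events = defaultdict(deque)  # entity -> deque of (timestamp, risk)

    def add(self, entity, timestamp, deviation):
        """Record one anomaly and report whether the entity must be escalated.

        Assumes timestamps for an entity arrive in non-decreasing order.
        The added risk equals the deviation, so more abnormal behavior
        contributes a higher score.
        """
        events = self._events[entity]
        events.append((timestamp, float(deviation)))
        # Forget anomalies that have fallen out of the time window.
        while events and events[0][0] <= timestamp - self.window:
            events.popleft()
        return self.score(entity) >= self.threshold

    def score(self, entity):
        """Current accumulated risk of the entity."""
        return sum(risk for _, risk in self._events[entity])
```

With `RiskScorer(threshold=10, window=100)`, three anomalies of deviation 4 at times 0, 10 and 20 trigger escalation on the third call, while a single anomaly does not.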
The objectification processing module of the present embodiment normalizes the processed data to form an event objectification map, which represents the connection relations between events and behaviors and between events.
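The source does not define how events are linked to each other, so the sketch below makes two explicit assumptions: normalization means mapping differently named source fields onto one schema, and two events are connected when they are consecutive events of the same actor. All field names are hypothetical.

```python
def normalize(record):
    """Map differently named source fields onto one canonical schema."""
    return {
        "event": str(record.get("event_id") or record.get("id")),
        "behavior": (record.get("behavior") or record.get("action") or "").strip().lower(),
        "actor": record.get("sip") or record.get("user"),
    }


def build_event_map(records):
    """Return the edge set of a graph over event nodes and behavior nodes.

    Edges are (source, target) pairs of typed nodes:
      ("event", id)  -> ("behavior", name)   event exhibits a behavior
      ("event", id1) -> ("event", id2)       consecutive events of one actor
    """
    edges = set()
    last_event_by_actor = {}
    for rec in records:
        e = normalize(rec)
        edges.add((("event", e["event"]), ("behavior", e["behavior"])))
        previous = last_event_by_actor.get(e["actor"])
        if previous is not None:
            edges.add((("event", previous), ("event", e["event"])))
        last_event_by_actor[e["actor"]] = e["event"]
    return edges
```

The resulting edge set can be handed to any graph library or to a vertex-centric computation for further analysis.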
The objectification processing module calculates and analyzes the existing objectification data by relying on a series of algorithms that combine deep learning and graph computation with large-scale graph characterization, and describes the data using vertices (Vertex) and edges (Edge): vertices represent objects and edges represent relationships between objects. With a vertex-centric computational model, each vertex program can be scheduled in parallel. A synchronous scheduling mode based on the BSP (Bulk Synchronous Parallel) model is adopted: the calculation process is divided into a plurality of supersteps (each superstep generally corresponds to one iteration), all vertex programs in each superstep are executed independently and in parallel, and global synchronization is performed after they finish. A vertex program may generate messages that are sent to other vertices, and the communication process is separate from the computation process.
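The superstep structure can be illustrated with a single-process simulation. The example uses maximum-value propagation, a common teaching example for vertex-centric BSP and not an algorithm named in the patent; the function name and the graph representation are assumptions. Vertex programs are run one after another here, but each reads only its own inbox from the previous superstep, which is what would allow them to run in parallel.

```python
def bsp_max_value(values, edges, max_supersteps=50):
    """Propagate the largest vertex value through the graph, BSP style.

    values: {vertex: number}; edges: {vertex: [neighbor, ...]}.
    Every neighbor must also be a key of values.
    Returns the final vertex values and the number of supersteps run.
    """
    state = dict(values)

    # Superstep 0: every vertex sends its own value to its neighbors.
    inbox = {v: [] for v in state}
    for v in state:
        for n in edges.get(v, []):
            inbox[n].append(state[v])
    supersteps = 1

    # Continue while any message is in flight.
    while any(inbox.values()) and supersteps < max_supersteps:
        outbox = {v: [] for v in state}
        for v, messages in inbox.items():
            # Vertex program: uses only its own state and its own inbox.
            if messages and max(messages) > state[v]:
                state[v] = max(messages)
                for n in edges.get(v, []):
                    outbox[n].append(state[v])
        # Global synchronization barrier: messages produced in this
        # superstep become visible only in the next one.
        inbox = outbox
        supersteps += 1
    return state, supersteps
```

Computation (updating `state[v]`) and communication (filling `outbox`) are kept apart, mirroring the separation described above.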
The system architecture module of this embodiment may provide a browser, so that a user accesses the system through the browser to perform a relevant rule configuration operation in a page.
Loading and preprocessing the data in the system requires a lot of time and resources and is limited by network bandwidth and server resources. To spread the load that would otherwise fall on a single server, the computing engine adopts a distributed cluster architecture, using Spark's built-in Standalone mode.
The user accesses the front-end page through the browser and performs the relevant rule configuration operations in the page. The rules are then issued to the engine, the processors and processing rules in the configuration items are initialized, the data is loaded, and unified data processing, analysis, event handling, objectification processing and other operations are performed on the target data. Finally, the processing result is written into a memory for application processing and display.
The system of the invention implements communication between servers by using an asynchronous message queue component to transmit information.
The following example verifies that the metadata and policy driven data analysis system can realize one-to-one, one-to-many and many-to-many parallel analysis between data sets and analysis rules, that the analysis flow is organized into processors and the processors support hot extension, and that the system is effective at detecting abnormal traffic.
The host IP 127.0.0.1 is extracted, and the fields in the following table are selected for calculation:
Field | Meaning | Type |
sip | Source IP | string |
dip | Destination IP | string |
logtime | Time of access | time |
sport | Source port | string |
dport | Destination port | string |
a_proto | Application layer protocol | string |
With the fields in the table as analysis conditions, configuration file conversion is performed, which mainly generates the processing flow configuration file and the processing logic configuration.
In the quantitative analysis, a baseline is established from the number of accesses to the destination IP 172.168.3.1, and the error interval is determined as [31-3, 31+3], that is, [28, 34]. The accesses from host 127.0.0.1 to the destination IP 172.168.3.1 are then aggregated every day; if the daily count falls within the interval, the access data is regarded as normal, and otherwise as abnormal.
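The quantitative check in this example can be written out directly. The field names `sip`, `dip` and `logtime` come from the table above; the `logtime` text format `YYYY-MM-DD HH:MM:SS` and the function names are assumptions made for the sketch.

```python
from collections import Counter


def daily_counts(records, sip, dip):
    """Count accesses from sip to dip per calendar day.

    Assumes logtime is a string that starts with the date as YYYY-MM-DD.
    """
    counts = Counter()
    for r in records:
        if r["sip"] == sip and r["dip"] == dip:
            counts[r["logtime"][:10]] += 1
    return dict(counts)


def classify(counts, baseline=31, tolerance=3):
    """Label each day normal if its count lies in [baseline - tolerance, baseline + tolerance]."""
    low, high = baseline - tolerance, baseline + tolerance
    return {
        day: ("normal" if low <= n <= high else "abnormal")
        for day, n in counts.items()
    }
```

With the baseline of 31 and tolerance of 3 from the text, a day with 30 accesses from 127.0.0.1 to 172.168.3.1 is labeled normal and a day with 40 accesses is labeled abnormal.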
A qualitative analysis baseline is established by collecting all previous normal and abnormal access data of the host IP and using it to train the model. The access data of 127.0.0.1 is then judged item by item every day to detect whether it is abnormal.
Compared with traditional methods, the system of the invention addresses at the root the excessive resource usage, low efficiency and separation of batch and stream processing that the prior art suffers from when applying massive rule sets to massive data sets. It realizes a data processing engine in the true sense, in which one set of programs handles all data processing tasks. With batch-stream integration, real-time data processing and batch data processing share one set of code, which reduces development and maintenance costs. The system is therefore simple to use, covers many scenarios, performs well, is highly extensible, supports plug-in processing and achieves high data analysis accuracy.
The elements and algorithm steps of the examples described in connection with the embodiments disclosed in the metadata and policy driven data analysis system provided by the present invention may be embodied in electronic hardware, computer software, or combinations of both, and the components and steps of the examples have been described in a functional general sense in the foregoing description for the purpose of clearly illustrating the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The block diagrams of the metadata and policy driven data analysis system illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. Illustratively, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It should be understood that the disclosed metadata and policy driven data analysis system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the units is only one logical division, and other divisions are possible in practice; a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electric, mechanical or other form of connection.
The following is an embodiment of the metadata and policy driven data analysis method provided by the present disclosure. The method and the metadata and policy driven data analysis system of the above embodiments belong to the same inventive concept; for details not described in the method embodiment, refer to the above system embodiment.
A data analysis method, as shown in fig. 3, comprises:
S101, acquiring user behavior data, and preprocessing the user behavior data to form a user behavior data set; in the method, the rule configuration for data processing is expressed in the form of XML files;
S102, grouping or aggregating the user behavior data sets, and calculating the mean value and the standard deviation in the user behavior data sets;
S103, scoring the user behavior data;
S104, normalizing the user behavior data to form an event objectification map, which expresses the connection relations between events and behaviors and between events;
and S105, storing the processing result for display.
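Steps S101 to S105 can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the disclosure: the record layout `(user, behavior, value)`, dropping incomplete records as the preprocessing step, an absolute z-score as the scoring rule, division by the maximum score as the normalization, the `0.5` threshold, and a plain edge set standing in for the event objectification map.

```python
import json
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical raw records: (user, behavior, value). Layout is an assumption.
RAW = [
    ("alice", "login", 3.0),
    ("alice", "login", 5.0),
    ("alice", "download", 40.0),
    ("bob", "login", 4.0),
    ("bob", "login", None),  # incomplete record, dropped in S101
    ("bob", "download", 2.0),
]

def preprocess(records):
    """S101: drop incomplete records to form the user behavior data set."""
    return [r for r in records if r[2] is not None]

def group_stats(dataset):
    """S102: group by behavior, then compute mean and standard deviation."""
    groups = defaultdict(list)
    for _user, behavior, value in dataset:
        groups[behavior].append(value)
    return {b: (mean(v), pstdev(v)) for b, v in groups.items()}

def score(dataset, stats):
    """S103: score each record by its absolute z-score (assumed rule)."""
    scored = []
    for user, behavior, value in dataset:
        mu, sigma = stats[behavior]
        z = abs(value - mu) / sigma if sigma > 0 else 0.0
        scored.append((user, behavior, value, z))
    return scored

def build_event_graph(scored, threshold=0.5):
    """S104: scale scores into [0, 1], keep events at or above the
    threshold, and link them to behaviors and to each other."""
    top = max(s for *_rest, s in scored) or 1.0
    events = [(f"e{i}", user, behavior, s / top)
              for i, (user, behavior, _value, s) in enumerate(scored)]
    flagged = [e for e in events if e[3] >= threshold]
    edges = set()
    for event_id, _user, behavior, _s in flagged:
        edges.add((event_id, f"behavior:{behavior}"))  # event-to-behavior edge
    for i, a in enumerate(flagged):
        for b in flagged[i + 1:]:
            if a[1] == b[1]:                           # same user
                edges.add((a[0], b[0]))                # event-to-event edge
    return flagged, edges

def run_pipeline(records):
    dataset = preprocess(records)
    stats = group_stats(dataset)
    events, edges = build_event_graph(score(dataset, stats))
    # S105: serialize the result so it can be stored and later displayed.
    return json.dumps({"stats": stats, "events": events, "edges": sorted(edges)})

print(run_pipeline(RAW))
```

Each step is a separate function so that a step can be replaced or reordered without touching the others, in line with the remark that the step numbers do not fix an execution order.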
The data analysis method based on metadata and policy driving allows the data processing flow to be customized, and allows the data access source and the data output source to be customized, with different processing rules configured for different data sets. It supports dynamic adjustment of the data processing flow, combination and reprocessing of data processing results, layered output and reprocessing of data processing results, and dynamic expansion to accommodate exponential growth in data sets and data-set rules. This addresses the massive growth in the number of data analysis rules, the dynamic expansion of rule operators, and the integration of batch and stream data processing.
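As an illustration of rule configuration expressed as XML with dynamically extensible operators, the following Python sketch parses a small rule file and applies the rules in order. The XML schema (`rules`, `rule`, and the `operator`, `field`, `value` attributes), the operator names `filter_min` and `scale`, and the registry mechanism are all hypothetical; the disclosure does not specify a schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule file for one data set; the schema is an assumption.
RULES_XML = """
<rules dataset="login_events">
  <rule operator="filter_min" field="count" value="2"/>
  <rule operator="scale" field="count" value="10"/>
</rules>
"""

OPERATORS = {}  # operator name -> function; new operators register themselves

def operator(name):
    """Decorator that adds a rule operator to the registry."""
    def register(fn):
        OPERATORS[name] = fn
        return fn
    return register

@operator("filter_min")
def filter_min(records, field, value):
    """Keep records whose field is at least the configured value."""
    return [r for r in records if r[field] >= value]

@operator("scale")
def scale(records, field, value):
    """Multiply the field of every record by the configured value."""
    return [{**r, field: r[field] * value} for r in records]

def load_rules(xml_text):
    """Read (operator, field, value) triples in document order."""
    root = ET.fromstring(xml_text.strip())
    return [(r.get("operator"), r.get("field"), float(r.get("value")))
            for r in root.findall("rule")]

def apply_rules(records, rules):
    """Run each configured operator over the output of the previous one."""
    for name, field, value in rules:
        records = OPERATORS[name](records, field, value)
    return records

records = [{"user": "alice", "count": 1}, {"user": "bob", "count": 3}]
result = apply_rules(records, load_rules(RULES_XML))
print(result)
```

Under this design, changing the processing flow means editing the XML file, and adding an operator means registering one more function; the engine loop in `apply_rules` stays unchanged.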
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order. The execution order of each process is determined by its function and inherent logic, and the sequence numbers should not limit the implementation of the embodiments of the present invention in any way.
The elements and algorithm steps of the examples described in connection with the embodiments of the metadata and policy driven data analysis method disclosed herein may be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
A non-transitory computer readable storage medium of the present invention stores a program product capable of implementing the metadata and policy driven data analysis method. In some possible embodiments, various aspects of the disclosure may also be implemented in the form of a program product comprising program code which, when the program product is run on a terminal device, causes the terminal device to perform the steps according to various exemplary embodiments of the disclosure described in the "exemplary methods" section above of this specification.
In the data analysis method of the present invention, computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A data analysis system, comprising: a data preprocessing module, a user behavior analysis module, an event scoring and handling module, an objectification processing module and a system architecture module;
the data preprocessing module is used for acquiring user behavior data and preprocessing the user behavior data to form a user behavior data set;
the user behavior analysis module is used for grouping or aggregating the user behavior data sets and calculating the mean value and the standard deviation in the user behavior data sets;
the event scoring and handling module is used for scoring the user behavior data;
the objectification processing module is used for carrying out normalization processing on the user behavior data to form an event objectification map which is used for expressing the connection relation between events and behaviors and between events;
the system architecture module is used for storing the processing result for display.
2. The data analysis system of claim 1,
the data preprocessing module is used for preprocessing the user behavior data in the form of XML files.
3. The data analysis system of claim 1,
the main process of the data preprocessing module preprocesses the user behavior data in an XML configuration mode, and the processing rule of the user behavior data is realized based on XML configuration.
4. The data analysis system of claim 1,
the objectification processing module is also used for calculating and analyzing the existing object data based on deep learning and graph calculation combined with a large-scale graph characterization algorithm, and describing the corresponding relations by using vertices and edges.
5. The data analysis system of claim 4,
the objectification processing module divides the calculation process into a plurality of supersteps based on the synchronous scheduling mode of the BSP model; all vertex programs in each superstep are executed independently and in parallel, and global synchronization is carried out after the execution is finished.
6. The data analysis system of claim 1,
the system architecture module provides a browser, so that a user can access the system through the browser and perform related rule configuration operation in a page.
7. The data analysis system of claim 6,
the system architecture module is also used for issuing the rules to the engine, initializing the processing rules in the configuration items, loading the data, performing unified data processing, analysis, event handling and objectification processing operation on the user behavior data, and storing and displaying the processing results.
8. A data analysis method, characterized in that the method employs the data analysis system as claimed in any one of claims 1 to 7; the method comprises the following steps:
S101, acquiring user behavior data, and preprocessing the user behavior data to form a user behavior data set;
S102, grouping or aggregating the user behavior data sets, and calculating the mean value and the standard deviation in the user behavior data sets;
S103, scoring the user behavior data;
S104, normalizing the user behavior data to form an event objectification map which is used for expressing the connection relation between events and behaviors and between events;
and S105, storing the processing result for display.
9. A server comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the data analysis method steps of claim 8 when executing the program.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the data analysis method steps of claim 8.
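For illustration only, and not as part of the claims, the BSP scheduling of claim 5 can be sketched in Python. The sketch is an assumption throughout: it simulates the parallel vertex programs sequentially, and it uses minimum-label propagation (connected components) as the vertex program, which is a stand-in chosen for brevity rather than the algorithm of the disclosure. The function name `bsp_min_label` is hypothetical.

```python
def bsp_min_label(edges):
    """Label every vertex with the smallest vertex name in its component,
    using BSP-style supersteps separated by a global barrier."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    labels = {v: v for v in neighbors}

    # Superstep 0: every vertex sends its own label to its neighbors.
    inbox = {v: [] for v in neighbors}
    for v in neighbors:
        for n in neighbors[v]:
            inbox[n].append(labels[v])
    supersteps = 1

    # Continue until a superstep produces no messages.
    while any(inbox.values()):
        outbox = {v: [] for v in neighbors}
        # Each vertex program reads only its own inbox and writes only its
        # own label, so the iterations are independent of one another and
        # could run in parallel.
        for v, messages in inbox.items():
            if messages and min(messages) < labels[v]:
                labels[v] = min(messages)
                for n in neighbors[v]:
                    outbox[n].append(labels[v])
        # Global synchronization: messages become visible only here.
        inbox = outbox
        supersteps += 1
    return labels, supersteps

labels, steps = bsp_min_label([("a", "b"), ("b", "c"), ("x", "y")])
print(labels, steps)
```

Messages written during one superstep are delivered only after the barrier, which is what allows the vertex programs within a superstep to run without coordinating with each other.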
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211401559.6A CN115455236A (en) | 2022-11-10 | 2022-11-10 | Data analysis system, method, server and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211401559.6A CN115455236A (en) | 2022-11-10 | 2022-11-10 | Data analysis system, method, server and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115455236A (en) | 2022-12-09 |
Family
ID=84295692
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211401559.6A Pending CN115455236A (en) | 2022-11-10 | 2022-11-10 | Data analysis system, method, server and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115455236A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105959162A (en) * | 2016-07-06 | 2016-09-21 | 吴本刚 | Distributed electric power enterprise information network safety management system |
CN110264336A (en) * | 2019-05-28 | 2019-09-20 | 浙江邦盛科技有限公司 | A kind of anti-system of intelligent case based on big data |
WO2021208342A1 (en) * | 2020-04-14 | 2021-10-21 | 广东卓维网络有限公司 | Power system based on cooperative interaction between diverse users and power grid |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20221209 |