CN116266183A

CN116266183A - Data analysis method, device, equipment and computer storage medium

Info

Publication number: CN116266183A
Application number: CN202111543476.6A
Authority: CN
Inventors: 潘鹏
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Suzhou Software Technology Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Suzhou Software Technology Co Ltd
Priority date: 2021-12-16
Filing date: 2021-12-16
Publication date: 2023-06-20
Also published as: WO2023109964A1

Abstract

The application discloses a data analysis method, a device, equipment and a computer storage medium, wherein the method comprises the following steps: acquiring time window information corresponding to a current time window; determining target tag information and target rule information according to the time window information; and acquiring data to be processed corresponding to the target tag information from a preset database, and carrying out data analysis on the data to be processed according to the target rule information in a time period corresponding to the current time window to obtain an analysis result. In this way, the real-time data in the preset database is associated with the time window information, so that the data to be processed is selected based on the time window information and analyzed with the rule corresponding to that time window information. Data reception and data analysis can therefore be processed asynchronously, which effectively improves the data analysis performance of the system.

Description

Data analysis method, device, equipment and computer storage medium

Technical Field

The present disclosure relates to the field of security data analysis technologies, and in particular, to a data analysis method, apparatus, device, and computer storage medium.

Background

Data analysis is the process of analyzing a large amount of collected data with suitable statistical methods, extracting useful information, and forming conclusions so that the data can be studied and summarized in detail. In real-time analysis scenarios such as security device log analysis, as well as for other real-time data, data analysis can help people make decisions and take appropriate action.

In the related art, a number of open-source rule engines already exist. Each of them loads data into memory and runs scripts in memory to match the data. The whole matching process is linear and serial, which places very high performance demands on the host machine; such engines cannot handle large data volumes, do not scale horizontally well, and cannot adapt to flexible and changing usage scenarios. In addition, current big data stacks provide rich real-time analysis components, but these components depend on a big data system and require many supporting components to be deployed. They are too heavyweight and too costly, place high demands on the server cluster, and are therefore of limited practicality, which restricts the scenarios in which they can be used.

Disclosure of Invention

The data analysis method, device, equipment and computer storage medium provided by the application enable asynchronous processing of data reception and data analysis, which effectively improves the data analysis performance of the system, adapts to changing data analysis scenarios, and offers higher practicability.

In order to achieve the above purpose, the technical scheme of the application is realized as follows:

In a first aspect, an embodiment of the present application provides a data analysis method, where the method includes:

acquiring time window information corresponding to a current time window;

determining target tag information and target rule information according to the time window information;

and acquiring data to be processed corresponding to the target tag information from a preset database, and carrying out data analysis on the data to be processed according to the target rule information in a time period corresponding to the current time window to obtain an analysis result.

In a second aspect, embodiments of the present application provide a data analysis apparatus, including: an acquisition unit, a determination unit and an analysis unit, wherein,

the acquisition unit is configured to acquire time window information corresponding to a current time window;

The determining unit is configured to determine target tag information and target rule information according to the time window information;

the analysis unit is configured to acquire data to be processed corresponding to the target tag information from a preset database, and perform data analysis on the data to be processed according to the target rule information in a time period corresponding to the current time window to obtain an analysis result.

In a third aspect, embodiments of the present application provide a data analysis apparatus, including: a memory and a processor; wherein,

the memory is used for storing a computer program capable of running on the processor;

the processor is configured to perform the method according to the first aspect when the computer program is run.

In a fourth aspect, an embodiment of the present application provides a computer storage medium, where a data analysis program is stored, where the data analysis program is executed by at least one processor to implement the method according to the first aspect.

The data analysis method, the device, the equipment and the computer storage medium provided by the application acquire the time window information corresponding to the current time window; determining target label information and target rule information according to the time window information; and acquiring data to be processed corresponding to the target tag information from a preset database, and carrying out data analysis on the data to be processed according to the target rule information in a time period corresponding to the current time window to obtain an analysis result. In this way, the real-time data in the preset database and the time window information have an association relation, so that the data to be processed is selected based on the time window information and analyzed by adopting the rule corresponding to the time window information, asynchronous processing of data receiving and data analysis can be realized, and the method has good expansibility and is suitable for flexible and changeable scenes, thereby effectively improving the data analysis performance of the system and having higher practicability.

Drawings

FIG. 1 is a schematic flow chart of a data analysis method according to an embodiment of the present application;

FIG. 2 is a flow chart of another data analysis method according to an embodiment of the present disclosure;

FIG. 3 is a flow chart of another data analysis method according to an embodiment of the present disclosure;

FIG. 4 is a schematic system architecture diagram of a data analysis method according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a process for creating a data analysis rule according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a working process of a data receiving process according to an embodiment of the present application;

FIG. 7 is a schematic diagram of a working process of a data analysis flow provided in an embodiment of the present application;

FIG. 8 is a schematic diagram of a composition structure of a data analysis device according to an embodiment of the present application;

FIG. 9 is a schematic diagram of a specific hardware structure of a data analysis device according to an embodiment of the present application;

FIG. 10 is a schematic diagram of a composition structure of a data analysis device according to an embodiment of the present application.

Detailed Description

For a more complete understanding of the features and technical content of the embodiments of the present application, reference should be made to the following detailed description of the embodiments of the present application, taken in conjunction with the accompanying drawings, which are for purposes of illustration only and not intended to limit the embodiments of the present application.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the present application.

In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict. It should also be noted that the term "first/second/third" in reference to the embodiments of the present application is used merely to distinguish similar objects and does not represent a specific ordering for the objects, it being understood that the "first/second/third" may be interchanged with a specific order or sequence, if allowed, to enable the embodiments of the present application described herein to be implemented in an order other than that illustrated or described herein.

In practical application, existing data analysis engines fall mainly into two groups: lightweight script-based rule engines and big-data-based real-time analysis engines. Both have shortcomings that limit, to a certain extent, the scenarios in which they can be used. These drawbacks are as follows:

Script-based rule engines such as Drools have the following main drawbacks: 1) They are unsuited to processing large data volumes. In a mass-processing scenario it is unrealistic to hold all data in memory; highly concurrent data consumes a huge amount of memory, and once memory overflows the data is permanently lost and cannot be recovered, which is an irreparable disaster for a data-sensitive system. 2) They do not scale horizontally well. To guard against data loss, an upstream step must persist the data; but once the application is scaled horizontally, it is difficult to guarantee that each piece of data is consumed only once, so horizontal scaling loses its purpose and the cost outweighs the benefit of the analysis. 3) They cannot adapt to flexible and changing scenarios. A typical rule engine processes records one at a time, and its advantages are hard to realize when data must be processed in batches per time window.

Big-data-based real-time analysis engines have the following main drawbacks: 1) They are too heavyweight. The advantages of big data analysis are obvious, but the analysis engine needs many supporting components to run and a great deal of preparation before an analysis program can run, which puts it out of reach of applications that have not built a big data platform. 2) They are costly. Deploying a big data platform requires more machine resources and higher maintenance costs.

Based on this, the embodiment of the application provides a data analysis method, and the basic idea of the method is as follows: acquiring time window information corresponding to a current time window; determining target label information and target rule information according to the time window information; and acquiring data to be processed corresponding to the target tag information from a preset database, and carrying out data analysis on the data to be processed according to the target rule information in a time period corresponding to the current time window to obtain an analysis result. In this way, the real-time data in the preset database and the time window information have an association relation, so that the data to be processed is selected based on the time window information and analyzed by adopting the rule corresponding to the time window information, asynchronous processing of data receiving and data analysis can be realized, and the method has good expansibility and is suitable for flexible and changeable scenes, thereby effectively improving the data analysis performance of the system and having higher practicability.

Example 1

In an embodiment of the present application, referring to fig. 1, a flow chart of a data analysis method provided in an embodiment of the present application is shown. As shown in fig. 1, the method may include:

S101: acquiring time window information corresponding to the current time window.

It should be noted that, in the embodiment of the present application, the time window information may include information such as a window identifier (winId), a rule identifier (ruleId), a data type (dataType), and an analysis identifier (isAya). When a time window (timeWin) is created, its time window information is already determined, i.e. each time window corresponds to one set of time window information.

It should also be noted that, in the embodiment of the present application, the time window may be created by calling a task server (Task Service) to create and start a timing task, which then calls the time window creation interface so that time windows are created on a schedule. In practical application, the length of the time window may be set according to the actual situation, for example adjusted according to the characteristics of the data to be analyzed, the data processing capability of the device itself, and the like, which is not limited in any way.

In some embodiments, the acquiring the current time window information may include:

acquiring a current time window from a preset message queue;

and acquiring time window information corresponding to the current time window from a preset database based on the current time window.

It should be noted that the preset message queue may include a time window message queue and an initial data message queue. The time window message queue stores time window information in the order in which it was written. The initial data message queue may hold any data that needs to be analyzed in real time, for example externally input data to be analyzed, such as log data.

It should also be noted that, taking the time window message queue as an example of how the preset message queue is populated, in some embodiments, as shown in fig. 2, in the process of creating the time window, the method may further include:

S201: after the timing task is started, creating a first time window according to a preset time interval, and writing first time window information corresponding to the first time window into the preset database.

In a specific implementation, a task service may be called to create a timing task template, create and start the timing task, and supply parameters such as the time window creation interface and the time window (timeWin). The interface uniform resource locator (Uniform Resource Locator, URL) has the format /v1/timeWindow/add/1399357342040354818/1. When this URL is split on "/", the second-to-last segment is the rule identifier (ruleId) and the last segment is the data type (dataType). After the timing task starts, it calls the URL in the timing task template to create the time window. In addition, after the time window is created, the corresponding time window information is stored in the preset database for later use in the data analysis process.
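As an illustration only, the URL splitting described above might be sketched in Python as follows. The function name parse_time_window_url is hypothetical and does not appear in the application; only the URL format and the meaning of its last two segments are taken from the text.

```python
from typing import Tuple


def parse_time_window_url(path: str) -> Tuple[str, str]:
    """Split the interface URL on "/" and return (ruleId, dataType).

    Hypothetical helper: per the description, the second-to-last segment
    is the rule identifier and the last segment is the data type.
    """
    segments = path.rstrip("/").split("/")
    if len(segments) < 2:
        raise ValueError("URL has too few segments: %r" % path)
    rule_id, data_type = segments[-2], segments[-1]
    return rule_id, data_type


rule_id, data_type = parse_time_window_url(
    "/v1/timeWindow/add/1399357342040354818/1"
)
```

Both values are kept as strings here; whether they are later converted to numeric types is an implementation choice that the text leaves open.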

S202: acquiring second time window information from a preset cache region, and replacing the second time window information stored in the preset cache region with the first time window information.

S203: writing the second time window into the preset message queue.

It should be noted that, in the embodiment of the present application, the preset buffer area may be represented by a references buffer. The preset buffer area stores a plurality of time window information, and the data type corresponding to each time window information is completely from the data types existing in the original data, that is, all the data types in the original data can find the time window information with the same data type in the preset buffer area. Therefore, when the original data is successfully matched with the second time window information in the preset buffer area in the processing process, the second time window information used in the last use is replaced by the first time window information, and meanwhile, the second time window is written into the preset message queue.

It should be further noted that, in the embodiment of the present application, the first time window information may be time window information newly created according to the timing task, and the second time window information may be time window information in the last use. Specifically, the latest created time window information herein means time window information corresponding to the latest created time window among time windows created according to the timed task, and the last time window information in use means time window information in which the corresponding window identification was added as a tag to the intermediate data last time. In short, the above-described embodiment is to continuously update the time window information for which matching has been completed with the newly created time window information.

Thus, for the time window creation thread, the timing expression of the timing task may be generated automatically from the time window (timeWin), e.g., with timeWin set to 30, the expression is ?. When the timer fires, a call to the time window creation interface /v1/timeWindow/add/1399357342040354818/1 of the Data Analysis Service is initiated. After the Data Analysis Service receives the request, it first creates the time window information and writes it into the database. The time window information mainly includes a window identifier (winId), a rule identifier (ruleId), a data type (dataType), and an analysis flag (isAya), where the window identifier (winId) is a Long value that is generated automatically by the system and is unique in time order, the rule identifier (ruleId) is the second-to-last segment of the request URL after splitting on "/", and the data type (dataType) is the last segment. The service then accesses the preset buffer area with dataType#ruleId as the key to obtain the time window information last in use, writes the latest time window information (the first time window information) into the preset buffer area with dataType#ruleId as the key (Key) and the data carrying the window identifier (winId) as the value (Value), and finally writes the time window information last in use (the second time window information) into the preset message queue.
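The sequence handled by the time window creation thread can be sketched as below. This is a minimal Python sketch under stated assumptions: plain in-memory objects (database, cache, window_queue) stand in for the preset database, the preset buffer area and the preset message queue, and a simple counter stands in for the system-generated, time-ordered winId. None of these names come from the application.

```python
import itertools
from collections import deque

# Hypothetical in-memory stand-ins for the preset database, the preset
# buffer area (cache) and the time window message queue.
database = {}
cache = {}
window_queue = deque()
_win_ids = itertools.count(1)  # assumed source of time-ordered unique ids


def create_time_window(rule_id, data_type):
    """Create time window information and rotate it through the cache."""
    info = {
        "winId": next(_win_ids),
        "ruleId": rule_id,
        "dataType": data_type,
        "isAya": 0,  # assumption: 0 means "not yet analyzed"
    }
    database[info["winId"]] = info           # 1. persist the new window
    key = "%s#%s" % (data_type, rule_id)     # 2. key is dataType#ruleId
    previous = cache.get(key)                #    window last in use, if any
    cache[key] = info                        # 3. newest window replaces it
    if previous is not None:
        window_queue.append(previous)        # 4. closed window goes to queue
    return info


first = create_time_window("1399357342040354818", "1")
second = create_time_window("1399357342040354818", "1")
```

With this arrangement, a window only becomes eligible for analysis once its successor has replaced it in the cache, so incoming data is no longer being tagged with a window while that window is analyzed.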

Thus, after the time window information which can be subjected to data analysis currently is obtained from the preset message queue, corresponding data analysis can be performed according to the time window information.

S102: determining target tag information and target rule information according to the time window information.

Note that in the embodiment of the present application, the time window information may include tag information and rule information, where the tag information may include a data type (dataType) and a window identifier (winId), and the rule information may include a rule identifier (ruleId).

In this way, when the current time window is acquired as a carrier of data analysis, the corresponding time window information can be directly acquired, further, the data type and the window identifier in the time window information are acquired as target tag information, and the rule identifier in the time window information is acquired as target rule information.
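For illustration, the time window information and the derivation of the target tag information and target rule information might be modelled as follows. This is only a sketch: the class TimeWindowInfo, the helper functions, and the default value 0 for isAya are assumptions; only the four field meanings come from the description.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TimeWindowInfo:
    win_id: int      # winId: window identifier
    rule_id: int     # ruleId: rule identifier
    data_type: str   # dataType: data type
    is_aya: int = 0  # isAya: analysis identifier (assumed 0 until analyzed)


def target_tag_info(info: TimeWindowInfo) -> Tuple[str, int]:
    """Target tag information: the data type and the window identifier."""
    return info.data_type, info.win_id


def target_rule_info(info: TimeWindowInfo) -> int:
    """Target rule information: the rule identifier."""
    return info.rule_id


info = TimeWindowInfo(win_id=1001, rule_id=1399357342040354818, data_type="1")
tag = target_tag_info(info)
rule = target_rule_info(info)
```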

S103: acquiring data to be processed corresponding to the target tag information from a preset database, and carrying out data analysis on the data to be processed according to the target rule information in a time period corresponding to the current time window to obtain an analysis result.

It should be noted that, before data analysis is performed on the data to be processed, the data must first be received into the preset database. Specifically, data is acquired from the preset message queue, the latest time window information is acquired from the preset buffer area, the tag of that time window information is added to the data, and the data is stored in the preset database. In some embodiments, as shown in fig. 3, the method may further comprise:

S301: receiving initial data to be analyzed;

S302: performing array conversion on the initial data to obtain a node array.

S303: acquiring a data type from the node array, and performing deserialization operation according to the data type to obtain at least one data table; each data table comprises a plurality of intermediate data.

S304: updating the at least one data table according to the data type corresponding to each data table, and determining at least one candidate data table; each candidate data table comprises the plurality of intermediate data and window identifications corresponding to the intermediate data.

S305: storing the at least one candidate data table to the preset database.

It should be noted that, in practical application, the initial data may be obtained from the preset message queue; it originates from externally input data that needs to be analyzed. Taking security log data as an example, the security log data contains a data type field. The security log data is first converted into a node (JsonNode) array, and the value of the data type attribute is read from the node array. The security log data is then deserialized into a specific data object according to the data type. Using the data type as the condition, the time window information with the same data type is selected from the preset buffer area, and the window identifier (winId) in that time window information is added to the data object as a tag to obtain intermediate data. This completes the marking of the security log data, and the marked data is stored in the preset database.
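The receiving steps above can be sketched roughly as follows. This is a simplified Python sketch, not the implementation of the application: the JSON field name "dataType", the cache keyed by data type alone, and the dictionary of lists standing in for the data tables are all assumptions made for illustration.

```python
import json
from collections import defaultdict


def receive(raw_messages, window_cache, database):
    """Tag each incoming record with the current winId and store it.

    raw_messages: iterable of JSON strings (e.g. security log entries)
    window_cache: dict mapping data type -> winId currently in use
    database:     dict mapping data type -> list of stored records,
                  standing in for one data table per data type
    """
    for raw in raw_messages:
        node = json.loads(raw)              # parse the raw message
        data_type = node["dataType"]        # assumed name of the type field
        win_id = window_cache[data_type]    # window in use for this type
        record = dict(node, winId=win_id)   # add the window tag
        database[data_type].append(record)  # store in the matching table


db = defaultdict(list)
receive(
    ['{"dataType": "fw", "src": "10.0.0.1"}', '{"dataType": "ids", "sig": "x"}'],
    {"fw": 7, "ids": 9},
    db,
)
```

Because a record only needs the window identifier that is current at the moment it arrives, reception never has to wait for analysis of an earlier window to finish.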

When the data size is large, the initial data may be divided into a plurality of portions for marking, and the division of the initial data may be specifically set according to the actual situation, for example, may be determined together according to the length of the time window and the data processing capability of the device, which is not limited in any way.

Further, after data reception, data analysis may first select the target data table according to the data type and then select the data list according to the window identifier; the data selected in this way is the data to be processed that is analyzed subsequently. Here, the target tag information may include a target data type and a target window identification; therefore, in some embodiments, the obtaining the data to be processed corresponding to the target tag information from the preset database includes:

selecting a target data table corresponding to the target data type from the preset database;

and selecting the data to be processed corresponding to the target window identifier from the target data table.

It should be noted that the data table is a space in the preset database for storing the target data. In the data receiving process, the initial data can be stored in a corresponding data table after array conversion and deserialization operation by taking the data type as a condition; in this way, in the data analysis process, a corresponding target data table can be selected from a preset database through the data type; and then under the condition of determining the target window identifier, selecting target data with the target window identifier from a target data table, and determining the target data as data to be processed.
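The two-stage selection (table by data type, then rows by window identifier) might look like the following when the preset database is relational. This is a sketch using Python's built-in sqlite3 module; the table name firewall_log, its columns, and the TABLE_FOR_TYPE mapping are hypothetical.

```python
import sqlite3

# Hypothetical mapping from data type to table name. Using a fixed mapping
# also avoids building a table name from untrusted input.
TABLE_FOR_TYPE = {"1": "firewall_log"}


def select_pending(conn, data_type, win_id):
    """Return the rows of the target data table tagged with win_id."""
    table = TABLE_FOR_TYPE[data_type]            # step 1: pick the table
    cur = conn.execute(                          # step 2: filter by winId
        "SELECT id, payload FROM %s WHERE win_id = ? ORDER BY id" % table,
        (win_id,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE firewall_log "
    "(id INTEGER PRIMARY KEY, payload TEXT, win_id INTEGER)"
)
conn.executemany(
    "INSERT INTO firewall_log (payload, win_id) VALUES (?, ?)",
    [("a", 1), ("b", 2), ("c", 1)],
)
pending = select_pending(conn, "1", 1)
```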

Further, in an embodiment of the present application, the target rule information may include a rule identifier. Therefore, in some embodiments, the performing data analysis on the data to be processed according to the target rule information to obtain an analysis result may include:

loading first rule information corresponding to the rule identifier from a preset rule table according to the rule identifier;

and carrying out data analysis on the data to be processed according to the first rule information to obtain the analysis result.

That is, the time window information is obtained from the preset message queue, and then the first rule information is loaded from the preset rule table according to the rule identifier, so that data analysis (such as duplicate removal and matching processing) can be performed on the data to be processed according to the first rule information, thereby obtaining an analysis result.

In a specific embodiment, the performing data analysis on the data to be processed according to the first rule information to obtain the analysis result may include:

matching the data to be processed according to rules in the first rule information;

and if the first rule information contains rules conforming to the data to be processed, determining a matching result as the analysis result.

It should be noted that, rule matching processing is performed on the data to be processed according to the first rule information, which may specifically be: and matching the data type of the data to be processed with the rules in the first rule information in the data analysis process. In addition, when the matching is successful, the data to be processed can be analyzed according to the matching rule and the analysis result can be output; when the matching is unsuccessful, alarm information can be sent out to remind the manual intervention to check the label of the target data, determine whether the label is wrong, and acquire new data to be processed in a preset database again.

In some embodiments, after the data analysis is performed on the data to be processed according to the target rule information to obtain an analysis result, the method may further include:

displaying the analysis result in the form of a graph or a table; and/or,

and when the analysis result triggers a preset event, sending out alarm information.

After the analysis result is obtained, the analysis result can be displayed in a form of a graph or a table, and the analysis result can be queried through a query interface; in addition, when the analysis result triggers a preset event, alarm information can be sent out to remind the manual intervention or to perform automatic processing according to a preset processing mode.

Further, in order to provide rich extensibility, configurable auxiliary function plug-ins are provided, so that the data to be processed can be preprocessed first. Thus, in some embodiments, the method may further comprise: preprocessing the data to be processed.

In a specific embodiment, the preprocessing the data to be processed may include:

performing de-duplication processing on the data to be processed; and/or,

and carrying out real-time flow statistics processing on the data to be processed.

It should be noted that, in the embodiment of the present application, deduplication processing and real-time traffic statistics processing are two possible choices in the preprocessing process, and preprocessing the data to be processed is to adapt to rich service requirements, that is, provide rich expansibility, and also enable the data to be better matched with rule information, so as to further improve the accuracy of analysis.

In addition, after performing data analysis on the data to be processed according to the first rule information to obtain the analysis result, in some embodiments, the method may further include:

updating the analysis mark in the time window information to be 1, and deleting the data to be processed.

That is, for the selected data to be processed, operations such as duplication removal may be performed according to hash (hash), then the data to be processed is matched according to the first rule information, if the data accords with the rule, a matching result is output, finally the analysis identifier (isAya) in the time window information is updated to 1, and the data to be processed of the winId tag is deleted.
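One analysis pass over a closed window, as summarized above, can be sketched in Python as follows. This is an illustrative sketch only: the use of SHA-256 over the serialized record as the hash, the rule expressed as a Python predicate, and the in-memory list of records are assumptions rather than details given in the application.

```python
import hashlib
import json


def analyze_window(info, records, rule):
    """De-duplicate, match, mark the window analyzed, and clean up.

    info:    dict with at least "winId" and "isAya"
    records: list of dicts, each tagged with "winId" (modified in place)
    rule:    callable returning True when a record conforms to the rule
    """
    seen, matches = set(), []
    for rec in records:
        if rec.get("winId") != info["winId"]:
            continue                      # belongs to another window
        payload = {k: v for k, v in rec.items() if k != "winId"}
        digest = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode("utf-8")
        ).hexdigest()
        if digest in seen:
            continue                      # duplicate within this window
        seen.add(digest)
        if rule(payload):
            matches.append(payload)       # record conforms to the rule
    info["isAya"] = 1                     # mark the window as analyzed
    # delete the processed data carrying this winId tag
    records[:] = [r for r in records if r.get("winId") != info["winId"]]
    return matches


window = {"winId": 7, "isAya": 0}
stored = [
    {"winId": 7, "level": "high"},
    {"winId": 7, "level": "high"},
    {"winId": 7, "level": "low"},
    {"winId": 8, "level": "high"},
]
result = analyze_window(window, stored, lambda r: r["level"] == "high")
```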

It should be further noted that, in the embodiment of the present application, the preprocessing rule may be stored in a preset database in the form of time window information, and in some embodiments, may also be in the form of a data processing plug-in, and may be accessed to the data processing device; specifically, the data processing device needs to reserve a preprocessing interface, and develop corresponding data processing plug-ins for access according to interface rules.

In this way, the data to be processed corresponding to the target tag information (that is, to the target data type and the target window identifier) is obtained from the preset database, and data analysis is carried out on the data to be processed according to the target rule information to obtain an analysis result; the above process is then repeated as required, which completes real-time data analysis based on the time window.

That is, the embodiment of the application provides a data analysis engine suited to batch processing of continuous data. First, it offers a real-time analysis alternative for real-time data analysis scenarios that cannot use a big data analysis engine for cost reasons. Second, it addresses performance problems that a conventional lightweight open-source rule engine cannot solve in mass data analysis scenarios, such as the single-machine processing bottleneck, the lack of horizontal scalability, and limited, insufficiently rich rules. Third, it addresses the inability of a conventional lightweight open-source rule engine to adapt to changing data analysis scenarios. In this way, the embodiment of the application provides a design method for a real-time data analysis engine based on continuous time windows. When the analysis engine is used for security-layer analysis, attacks on the security layer can be handled according to the real-time analysis results, and because analysis runs over continuous time, real-time performance and therefore response speed are improved.

The embodiment provides a data analysis method, which comprises the steps of obtaining time window information corresponding to a current time window; determining target label information and target rule information according to the time window information; and acquiring data to be processed corresponding to the target tag information from a preset database, and carrying out data analysis on the data to be processed according to the target rule information in a time period corresponding to the current time window to obtain an analysis result. In this way, the real-time data in the preset database and the time window information have an association relation, so that the data to be processed is selected based on the time window information and analyzed by adopting the rule corresponding to the time window information, asynchronous processing of data receiving and data analysis can be realized, and the method has good expansibility and is suitable for flexible and changeable scenes, thereby effectively improving the data analysis performance of the system and having higher practicability.

Example two

Based on the same inventive concept as the previous embodiment, this embodiment of the present application provides a data analysis method for a real-time data analysis engine based on continuous time windows. By maintaining time windows, the method effectively decouples the persistence of real-time data from its analysis, converting serial data processing into asynchronous parallel data processing. Auxiliary function plug-ins that take effect through configuration, such as data deduplication and real-time data traffic statistics, allow the engine to adapt to diverse service requirements. In scenarios where K8S (Kubernetes) is supported, K8S can be used for horizontal scaling of services and for service governance; where K8S is not used, horizontal scaling of services can still be achieved simply through the micro-service architecture. Referring to fig. 4, a schematic diagram of the system architecture of a data analysis method according to an embodiment of the present application is shown. As shown in fig. 4, the system may include a data analysis device and a task server. The data analysis device is mainly divided into four parts:

401: Rule creation thread. A data analysis rule is created, a time window is defined, and a timing task template is created according to the time window. The form of the rule content depends on the selected data storage tool: for a relational database the rule is an SQL statement; for a search engine it is the JSON message passed to the query interface; and so on. Here, through the rule creation thread, a time window may be created in the task server (Task Service).

402: Time window creation thread. The timing task periodically calls the time window creation interface. Time window information is first created and written into the database; the cache is then accessed to obtain the time window information most recently in use; the stored information is overwritten with the latest time window information; and the previous time window information is written into the message queue.

Specifically, in the time window creation thread, first, a time window is created and saved to a database; secondly, the cache is read, original time window information is taken out, and new time window information is put into the cache; and finally, the original time window information is put into a message queue.

403: Data receiving thread group. Data is acquired from the message queue, the latest time window information is acquired from the cache, and the data is marked with a time window information tag (i.e., a window identifier) and stored in the database.

Specifically, in the data receiving thread, a plurality of data receiving threads can run simultaneously to form a data receiving thread group, so that the data can be rapidly marked by the time window information label, and the subsequent analysis and processing of the data are facilitated. For a data receiving thread, firstly, acquiring data from a message queue; secondly, obtaining time window information from the cache; and finally, marking the data with a time window label and storing the data in a database.

404: Data analysis thread group. Time window information is acquired from the message queue, the rule information created in the rule creation thread 401 is loaded from the database, the data is analyzed and processed according to the rule information, and an analysis result is output.

Specifically, in the data analysis thread, a plurality of data analysis threads can be operated simultaneously to form a data analysis thread group, and for one data analysis thread, firstly, time window information is obtained from a message queue; secondly, loading rule information from a database, and then performing duplication removal and matching processing on the data; and finally, outputting an analysis result.
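The idea of a thread group that drains time window information from a message queue can be sketched as follows. This is a minimal illustration only: Python's in-process `queue.Queue` stands in for the message queue, and the names `run_analysis_group`, `shutdown`, and the `analyze` callback are hypothetical and do not come from the application.

```python
import queue
import threading


def run_analysis_group(window_queue, analyze, num_threads=4):
    """Start a group of worker threads; each takes window info from the queue.

    `analyze` is a hypothetical callback that processes one window and
    returns its analysis result.
    """
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            window = window_queue.get()
            if window is None:  # sentinel: no more windows for this worker
                break
            outcome = analyze(window)
            with lock:  # results list is shared by all workers
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    return threads, results


def shutdown(window_queue, threads):
    """Send one sentinel per worker and wait for the group to finish."""
    for _ in threads:
        window_queue.put(None)
    for t in threads:
        t.join()
```

Because each worker blocks on the queue independently, adding threads (or, in a micro-service deployment, adding service instances that consume the same queue) raises analysis throughput without touching the receiving side.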

In some embodiments, the message queue may choose to employ a RabbitMQ message queue, the database may choose to employ a MySQL database, and the cache may choose to employ a Redis cache.

In a specific embodiment, the real-time data is asynchronously analyzed based on a continuous time window, which is implemented as follows:

1) Rule creation thread. A data analysis rule is defined, whose content includes a rule identifier, a data type, a time window and rule information. After the rule information is stored in the database, the timing-task template program of the task service (Task Service) is called to create and start a timing task. The parameters supply information such as the time window creation interface and the time window. The interface URL has the format /v1/timeWindow/add/1399357342040354818/1. When the URL is split on "/", the second-to-last segment is the rule identifier and the last segment is the data type. After the timing task is started, the URL in the timing task template is called to create time windows.
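Splitting the interface URL on "/" to recover the rule identifier and the data type can be sketched as below. The function name `parse_time_window_url` is hypothetical; the segment positions follow the description, in which the second-to-last segment is the rule identifier and the last segment is the data type.

```python
def parse_time_window_url(url):
    """Split the path on "/" and read the last two segments."""
    segments = [s for s in url.split("/") if s]
    if len(segments) < 2:
        raise ValueError("URL needs at least two path segments")
    rule_id = segments[-2]    # second-to-last segment: rule identifier
    data_type = segments[-1]  # last segment: data type
    return rule_id, data_type


rule_id, data_type = parse_time_window_url(
    "/v1/timeWindow/add/1399357342040354818/1"
)
print(rule_id, data_type)
```

Both values are kept as strings here, which avoids any assumption about how the service types them internally.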

2) Time window creation thread. The timing expression of the timing task is generated automatically from the time window; for example, with timeWin=30 the expression is … ?. The timing task periodically calls the time window creation interface /v1/timeWindow/add/1399357342040354818/1 provided by the Data Analysis Service. After receiving the request, the Data Analysis Service first creates the time window information and writes it into the database. The time window information mainly comprises a window identifier, a rule identifier, a data type, an analysis flag and other information. The window identifier is a unique, time-ordered value of type Long generated automatically by the system; the rule identifier is the second-to-last segment of the request URL when split on "/"; and the data type is the last segment. The Redis cache is then accessed with dataType#ruleId as the key to obtain the time window information most recently in use; the latest time window information, i.e., an entry with dataType#ruleId as the key and winId as the value, is written into the Redis cache; and the previous time window information (i.e., the old time window) is written into the message queue, as detailed in fig. 5.
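The create, swap, and enqueue sequence of the time window creation thread can be sketched as follows. This is an illustration under stated assumptions: a list stands in for the database, a dict for the Redis cache, and `queue.Queue` for the message queue. `WinIdGenerator` is a hypothetical ID scheme (the application only says the identifier is unique and time-ordered), and skipping the enqueue step when no earlier window exists is likewise an assumption.

```python
import queue
import time


class WinIdGenerator:
    """Hypothetical scheme: strictly increasing IDs based on the clock."""

    def __init__(self):
        self._last = 0

    def next(self):
        # Never go backwards, even if the clock returns the same value twice.
        self._last = max(time.time_ns(), self._last + 1)
        return self._last


def create_time_window(rule_id, data_type, db, cache, mq, id_gen):
    window = {
        "winId": id_gen.next(),
        "ruleId": rule_id,
        "dataType": data_type,
        "isAya": 0,  # analysis flag, not yet analyzed
    }
    db.append(window)                 # 1) persist the new window
    key = f"{data_type}#{rule_id}"    # cache key: dataType#ruleId
    previous_win_id = cache.get(key)  # 2) window most recently in use
    cache[key] = window["winId"]      #    overwrite it with the new one
    if previous_win_id is not None:   # 3) hand the old window to analysis
        mq.put({"winId": previous_win_id, "ruleId": rule_id,
                "dataType": data_type})
    return window
```

The effect is that receivers always tag incoming data with the window currently in the cache, while analysis only ever sees a window that has already been closed.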

3) Data receiving thread group. Data from external input that needs to be analyzed, exemplified here by log data, is obtained from the message queue. The data contains a data type field. The data is first converted into a JsonNode array, and the dataType attribute is read from it; the data is then explicitly deserialized into a specific data object according to the data type. Meanwhile, the dataType is used as the condition for a cache lookup through Jedis, which wraps the API of the Redis cache: all keys are fuzzy-matched against a pattern built from the dataType, returning a Set that holds the keys satisfying the condition, and the list of winIds is then obtained with the multi-get method. Finally, the data to be analyzed is marked with a winId field tag and stored in the database. The same piece of data is stored as many times as there are entries in the winId list: for each winId, one copy carrying that winId is stored. Extensible processing scenarios, such as data deduplication (computing a hash value of the data) and merging, can be plugged into this thread; see fig. 6 for details.
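The tagging step of a data receiving thread can be sketched as below. A plain dict stands in for the Redis cache and a list for the database, so the prefix match over keys only imitates the fuzzy key lookup. The function name `tag_and_store`, the `hash` field name, and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import json


def tag_and_store(raw_message, cache, db):
    """Tag one incoming record with every active winId for its data type."""
    record = json.loads(raw_message)
    data_type = str(record["dataType"])

    # Imitate the fuzzy key match: cache keys have the form dataType#ruleId.
    prefix = data_type + "#"
    win_ids = [win_id for key, win_id in cache.items()
               if key.startswith(prefix)]

    # Hash of the untagged record, usable later for deduplication.
    digest = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode("utf-8")
    ).hexdigest()

    # One stored copy per winId, each carrying that winId.
    rows = [dict(record, winId=win_id, hash=digest) for win_id in win_ids]
    db.extend(rows)
    return rows
```

Storing one copy per winId means each rule that applies to the data type gets its own independently deletable slice of the data.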

4) Data analysis thread group. Time window information is obtained from the message queue; rule information is loaded from the rule table according to the ruleId; a data table is selected according to the data type and a data list is selected according to the winId; operations such as hash-based deduplication are performed on the selected data; matching is then performed in the selected data list according to the selection in the rule information, and a matching result is output if the data conforms to the rule; finally, the analysis flag of the time window information is updated to 1 and the data for the corresponding winId is deleted from the data table, as detailed in fig. 7.

As shown in fig. 7, the specific implementation steps of the data analysis thread may include:

S701: Time window information is obtained from the message queue.

S702: Rule information is loaded from the rule table according to the rule identifier (ruleId).

S703: Data table 1 is selected according to the data type (dataType), and the data is selected according to the window identifier (winId).

S704: The selected data is deduplicated according to its hash.

S705: The selected data is queried according to the rule (selection).

S706: The deduplicated and queried results are converted into event objects.

S707: The analysis identifier (isAya) in the time window information is updated to 1, and the data corresponding to the winId is deleted from data table 1.
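Steps S701 to S707 can be sketched in a few lines. This is a simplified illustration, not the application's implementation: dicts and lists stand in for the rule table, data tables, and window table; the rule's `selection` is modeled as a Python predicate, whereas the description has it as an SQL statement or a JSON query depending on the storage tool; and the function names are hypothetical.

```python
import queue


def analyze_window(window, rules, tables, windows):
    rule = rules[window["ruleId"]]                    # S702: load rule
    table = tables[window["dataType"]]                # S703: pick table...
    selected = [row for row in table
                if row["winId"] == window["winId"]]   # ...and window data

    seen, unique = set(), []
    for row in selected:                              # S704: dedup by hash
        if row["hash"] not in seen:
            seen.add(row["hash"])
            unique.append(row)

    matched = [row for row in unique
               if rule["selection"](row)]             # S705: apply rule

    events = [{"ruleId": window["ruleId"],            # S706: build events
               "winId": window["winId"],
               "data": row} for row in matched]

    windows[window["winId"]]["isAya"] = 1             # S707: mark analyzed
    table[:] = [row for row in table
                if row["winId"] != window["winId"]]   # and delete its data
    return events


def run_once(mq, rules, tables, windows):
    window = mq.get()                                 # S701: take a window
    return analyze_window(window, rules, tables, windows)
```

Deleting the window's rows at the end keeps the data table bounded by the number of windows still open or awaiting analysis.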

In brief, the embodiments of the present application provide a mass data analysis scheme based on a micro-service architecture without changing the device architecture; specifically, a real-time data analysis engine based on continuous time windows. In real-time data analysis scenarios, data reception and data analysis are separated and run asynchronously, which improves the efficiency of real-time data analysis. In addition, the embodiments of the present application provide extensible data preprocessing: preprocessing can be extended through custom data processing plug-ins. Compared with open-source rule engines in the related art (such as Drools and Aviator), on the one hand, the demands on device resources are moderate and better performance can be obtained with the same resources; on the other hand, the application scenarios are richer, since the functions of an open-source rule engine are provided together with extensibility that allows data to be preprocessed. Compared with the real-time analysis components provided by big data platforms in the related art (such as Spark Streaming and Flink), the demands on device resources are relatively small, and software, hardware, and operation and maintenance costs are all relatively low.

The foregoing embodiment is described in detail above through its specific implementation. It can be seen that, because the real-time data in the preset database is associated with the time window information, the data to be processed is selected based on the time window information and analyzed using the rule corresponding to that time window information. Data reception and data analysis can therefore be processed asynchronously. The method also extends well and suits flexible, changing scenarios, which effectively improves the data analysis performance of the system and makes the method more practical.

Example three

Based on the same inventive concept as the previous embodiments, referring to fig. 8, a schematic diagram of the composition of a data analysis device 80 according to an embodiment of the present application is shown. As shown in fig. 8, the data analysis device 80 may include an acquisition unit 801, a determination unit 802, and an analysis unit 803, wherein:

the acquisition unit 801 is configured to obtain time window information corresponding to a current time window;

the determination unit 802 is configured to determine target tag information and target rule information according to the time window information;

the analysis unit 803 is configured to acquire the data to be processed corresponding to the target tag information from a preset database, and to perform data analysis on the data to be processed according to the target rule information within the time period corresponding to the current time window, so as to obtain an analysis result.

In some embodiments, the acquisition unit 801 is specifically configured to obtain the current time window from a preset message queue, and to acquire, based on the current time window, the time window information corresponding to the current time window from the preset database.

In some embodiments, referring to fig. 8, the data analysis device 80 further includes a creating unit 804 configured to: create a first time window at a preset time interval after the timing task is started, and write first time window information corresponding to the first time window into the preset database; acquire second time window information from a preset cache region, replace the second time window information with the first time window information, and store the first time window information in the preset cache region; and write the second time window into the preset message queue. The first time window information is the time window information newly created according to the timing task, and the second time window information is the time window information most recently in use.

In some embodiments, referring to fig. 8, the data analysis device 80 further includes an identification unit 805 configured to receive initial data to be analyzed; performing array conversion on the initial data to obtain a node array; obtaining a data type from the node array, and performing deserialization operation according to the data type to obtain at least one data table; each data table comprises a plurality of intermediate data; updating the at least one data table according to the data type corresponding to each data table, and determining at least one candidate data table; each candidate data table comprises the plurality of intermediate data and window identifications corresponding to the intermediate data; and storing the at least one candidate data table to the preset database.

In some embodiments, the target rule information includes a rule identification; correspondingly, the analyzing unit 803 is specifically configured to load, according to the rule identifier, first rule information corresponding to the rule identifier from a preset rule table; and carrying out data analysis on the data to be processed according to the first rule information to obtain the analysis result.

In some embodiments, the analysis unit 803 is specifically configured to perform rule matching processing on the data to be processed according to the rules in the first rule information; and, if the data to be processed conforms to a rule in the first rule information, to determine the matching result as the analysis result.

In some embodiments, the analysis unit 803 is further configured to present the analysis results in the form of a graph or table; and/or sending out alarm information when the analysis result triggers a preset event.

In some embodiments, the analysis unit 803 is further configured to perform deduplication processing on the data to be processed; and/or carrying out real-time flow statistics processing on the data to be processed.
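Optional deduplication and traffic statistics can be pictured as a chain of small preprocessing plug-ins applied before analysis. The sketch below is illustrative only: the class names, the `process` method, and the use of SHA-256 over the JSON form of a record are assumptions, not details given in the application.

```python
import hashlib
import json
from collections import Counter


class DedupPlugin:
    """Drops records whose content hash has already been seen."""

    def __init__(self):
        self.seen = set()

    def process(self, records):
        kept = []
        for record in records:
            digest = hashlib.sha256(
                json.dumps(record, sort_keys=True).encode("utf-8")
            ).hexdigest()
            if digest not in self.seen:
                self.seen.add(digest)
                kept.append(record)
        return kept


class TrafficStatsPlugin:
    """Counts records per data type and passes them through unchanged."""

    def __init__(self):
        self.counts = Counter()

    def process(self, records):
        for record in records:
            self.counts[record.get("dataType")] += 1
        return records


def preprocess(records, plugins):
    """Run the records through each configured plug-in in order."""
    for plugin in plugins:
        records = plugin.process(records)
    return records
```

Placing the statistics plug-in before deduplication counts raw traffic; placing it after counts only distinct records, so plug-in order is itself a configuration choice.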

It will be appreciated that in this embodiment, the "unit" may be a part of a circuit, a part of a processor, a part of a program or software, etc., and may of course be a module, or may be non-modular. Furthermore, the components in the present embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional modules.

If the integrated units are implemented in the form of software functional modules and are not sold or used as stand-alone products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present embodiment may be embodied, in whole or in part, in the form of a software product that is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the method described in the present embodiment. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Accordingly, the present embodiment provides a computer storage medium storing a data analysis program which, when executed by at least one processor, implements the steps of the method of any of the preceding embodiments.

Based on the composition of the data analysis device 80 and the computer storage medium described above, referring to fig. 9, a schematic diagram of a specific hardware structure of the data analysis equipment 90 provided in an embodiment of the present application is shown. As shown in fig. 9, the data analysis equipment 90 may include a communication interface 901, a memory 902, and a processor 903, with the components coupled together by a bus system 904. It will be appreciated that the bus system 904 is used to enable connection and communication between these components. In addition to a data bus, the bus system 904 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled as bus system 904 in fig. 9. The communication interface 901 is configured to receive and send signals in the course of exchanging information with other external network elements;

a memory 902 for storing a computer program capable of running on the processor 903;

the processor 903 is configured, when running the computer program, to perform:

acquiring time window information corresponding to a current time window;

It will be appreciated that the memory 902 in the embodiments of the present application may be volatile memory or nonvolatile memory, or may include both. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 902 of the systems and methods described herein is intended to include, without being limited to, these and any other suitable types of memory.

The processor 903 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 903 or by instructions in the form of software. The processor 903 may be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly as being performed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 902; the processor 903 reads the information in the memory 902 and, in combination with its hardware, performs the steps of the method described above.

It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASIC), Digital Signal Processors (DSP), Digital Signal Processing Devices (DSPD), Programmable Logic Devices (PLD), Field-Programmable Gate Arrays (FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.

For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.

Optionally, as another embodiment, the processor 903 is further configured to perform the steps of the method of any of the preceding embodiments when the computer program is run.

Based on the composition of the data analysis device 80 and the computer storage medium described above, referring to fig. 10, a schematic diagram of the composition of the data analysis equipment 90 provided in an embodiment of the present application is shown. As shown in fig. 10, the data analysis equipment 90 may include the data analysis device 80 according to any of the foregoing embodiments.

In the embodiments of the present application, the data analysis equipment 90 obtains time window information corresponding to a current time window; determines target tag information and target rule information according to the time window information; and acquires data to be processed corresponding to the target tag information from a preset database and performs data analysis on the data to be processed according to the target rule information within the time period corresponding to the current time window to obtain an analysis result. Because the real-time data in the preset database is associated with the time window information, the data to be processed is selected based on the time window information and analyzed using the rule corresponding to that time window information. Data reception and data analysis can therefore be processed asynchronously. The method also extends well and suits flexible, changing scenarios, which effectively improves the data analysis performance of the system and makes the method more practical.

It should be noted that, in this application, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

The foregoing embodiment numbers of the present application are merely for description and do not represent the advantages or disadvantages of the embodiments.

The methods disclosed in the several method embodiments provided in the present application may be arbitrarily combined without conflict to obtain new method embodiments.

The features disclosed in the several product embodiments provided in the present application may be combined arbitrarily without conflict to obtain new product embodiments.

The features disclosed in the several method or apparatus embodiments provided in the present application may be arbitrarily combined without conflict to obtain new method embodiments or apparatus embodiments.

The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method of data analysis, the method comprising:

acquiring time window information corresponding to a current time window;

2. The method of claim 1, wherein the obtaining the time window information corresponding to the current time window includes:

acquiring a current time window from a preset message queue;

and acquiring time window information corresponding to the current time window from the preset database based on the current time window.

3. The method according to claim 2, wherein the method further comprises:

after a timing task is started, a first time window is created according to a preset time interval, and first time window information corresponding to the first time window is written into the preset database;

acquiring second time window information from a preset cache region, replacing the second time window information with the first time window information, and storing the first time window information in the preset cache region;

writing the second time window into the preset message queue;

the first time window information is the time window information newly created according to the timing task, and the second time window information is the time window information most recently in use.

4. The method of claim 1, wherein the target tag information includes a target data type and a target window identification;

the obtaining the data to be processed corresponding to the target tag information from a preset database comprises the following steps:

5. The method according to any one of claims 1-4, further comprising:

receiving initial data to be analyzed;

performing array conversion on the initial data to obtain a node array;

acquiring a data type from the node array, and performing deserialization operation according to the data type to obtain at least one data table; each data table comprises a plurality of intermediate data;

updating the at least one data table according to the data type corresponding to each data table, and determining at least one candidate data table; each candidate data table comprises the plurality of intermediate data and window identifications corresponding to the intermediate data;

storing the at least one candidate data table to the preset database.

6. The method of claim 1, wherein the target rule information comprises a rule identification;

the data analysis is carried out on the data to be processed according to the target rule information to obtain an analysis result, and the method comprises the following steps:

7. The method according to claim 6, wherein the performing data analysis on the data to be processed according to the first rule information to obtain the analysis result includes:

performing rule matching processing on the data to be processed according to the first rule information;

8. The method according to any one of claims 1 to 7, wherein after performing data analysis on the data to be processed according to the target rule information to obtain an analysis result, the method further comprises:

9. The method of claim 1, wherein prior to said data analysis of said data to be processed according to said target rule information, said method further comprises:

10. A data analysis device, characterized in that the data analysis device comprises: an acquisition unit, a determination unit and an analysis unit, wherein,

11. Data analysis equipment, characterized in that the data analysis equipment comprises: a memory and a processor; wherein,

the processor being adapted to perform the method of any of claims 1 to 9 when the computer program is run.

12. A computer storage medium storing a data analysis program which, when executed by at least one processor, implements the method of any one of claims 1 to 9.