CN116069900A - Data source identification method, device, system, electronic device and storage medium

Info

Publication number: CN116069900A
Application number: CN202211480352.2A
Authority: CN
Inventors: 秦向阳; 查超
Original assignee: DBAPPSecurity Co Ltd
Current assignee: DBAPPSecurity Co Ltd
Priority date: 2022-11-24
Filing date: 2022-11-24
Publication date: 2023-05-05

Abstract

The application relates to a data source identification method, a device, a system, an electronic device and a storage medium, wherein the data source identification method comprises the following steps: acquiring log data, and transmitting the log data to a message cluster to obtain log messages; acquiring a data source rule, and performing data source identification matching on the log message at least according to the data source rule to obtain a log identification result; and carrying out the same data source aggregation on the log identification result according to the data source of the log identification result, obtaining and storing the data source identification result. By the method and the device, the problems of low execution speed and low efficiency of data source division are solved, and efficient and accurate identification of the data source is realized.

Description

Data source identification method, device, system, electronic device and storage medium

Technical Field

The present application relates to the field of data processing, and in particular, to a data source identification method, device, system, electronic device, and storage medium.

Background

With the rapid development of technologies such as big data, cloud platforms and the Internet of Things, more and more data is being generated, much of it in the form of logs, so users need to analyze large volumes of log data to realize its engineering value. However, because the volume of log data is so large, extracting and analyzing it has become heavy and complicated work.

In the related art, to better extract and analyze log data, the log data is generally classified offline by type so that it can be managed by category. However, in the context of big data and the Internet of Things, log data is real-time streaming data: data sources change frequently and the data volume grows exponentially. The sheer volume of data makes category boundaries more ambiguous, and offline data source division involves a heavy workload, slow execution and low efficiency, falling far short of current engineering requirements.

No effective solution has yet been proposed for the problems of slow execution and low efficiency of data source division in the related art.

Disclosure of Invention

The embodiment provides a data source identification method, a device, a system, an electronic device and a storage medium, so as to solve the problems of low execution speed and low efficiency of data source division in the related technology.

In a first aspect, in this embodiment, there is provided a data source identification method, including:

acquiring log data, and transmitting the log data to a message cluster to obtain log messages;

acquiring a data source rule, and performing data source identification matching on the log message at least according to the data source rule to obtain a log identification result;

and carrying out the same data source aggregation on the log identification result according to the data source of the log identification result, obtaining and storing the data source identification result.

In some embodiments, the performing data source identification matching on the log message at least according to the data source rule to obtain a log identification result includes:

acquiring a sliding time window, acquiring a current window log message from the message cluster according to the sliding time window, and performing identification matching on the current window log message and the data source rule to obtain a current log identification result;

acquiring a next window log message according to the sliding time window, and identifying and matching the next window log message with the data source rule to obtain a next log identification result;

and obtaining the log recognition result according to the current log recognition result and the next log recognition result.

In some embodiments, the aggregating the log recognition result according to the same data source of the log recognition result to obtain and store the data source recognition result includes:

according to the sliding time window and the data source of the current log identification result, the same data source aggregation is carried out on the current log identification result, and the current data source identification result is obtained;

according to the sliding time window and the data source of the next log identification result, the same data source aggregation is carried out on the next log identification result, and the next data source identification result is obtained;

and obtaining the data source identification result according to the current data source identification result and the next data source identification result.

In some embodiments, the transmitting the log data to a message cluster to obtain a log message includes:

acquiring a log regular rule, and carrying out standardized processing on the log data according to the log regular rule to obtain standardized log data;

and transmitting the standardized log data to the message cluster to obtain the log message.

In some embodiments, the acquiring a data source rule, and performing data source identification matching on the log message at least according to the data source rule to obtain a log identification result includes:

acquiring a current data source rule, and storing the current data source rule into a data source rule catalog;

and monitoring the data source rule catalog, and under the condition of monitoring the next data source rule, carrying out data source identification matching on the log message at least according to the current data source rule and the next data source rule to obtain a log identification result.

In some of these embodiments, before the acquiring log data, the method further includes:

acquiring initialization configuration information of the message cluster, and starting a distributed coordination management service;

and synchronizing the initialization configuration information of the message cluster to the distributed coordination management service.

In a second aspect, in this embodiment, there is provided a data source identification apparatus, the apparatus including: an acquisition module, an identification module and an aggregation module;

the acquisition module is used for acquiring log data and transmitting the log data to a message cluster to obtain log messages;

the identification module is used for acquiring a data source rule, and carrying out data source identification matching on the log message at least according to the data source rule to obtain a log identification result;

and the aggregation module is used for carrying out the same data source aggregation on the log identification result according to the data source of the log identification result, obtaining and storing the data source identification result.

In a third aspect, in this embodiment, there is provided a data source identification system, the system comprising: terminal equipment, transmission equipment and server equipment; the terminal equipment is connected with the server equipment through the transmission equipment;

the server device is configured to implement the data source identification method described in the first aspect;

the transmission device is used for sending the log data and the data source rule to the server device;

the terminal equipment is used for acquiring log data and data source rules.

In a fourth aspect, in this embodiment, there is provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the data source identification method described in the first aspect.

In a fifth aspect, in this embodiment, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the data source identification method of the first aspect described above.

Compared with the related art, the data source identification method, the device, the system, the electronic device and the storage medium provided in the embodiment transmit the log data to the message cluster to obtain the log message by acquiring the log data; acquiring a data source rule, and performing data source identification matching on the log message at least according to the data source rule to obtain a log identification result; the same data source aggregation is carried out on the log identification result according to the data source of the log identification result, the data source identification result is obtained and stored, the problems of low execution speed and low efficiency of data source division are solved, and the rapid and accurate identification of the log data source is realized.

The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the other features, objects, and advantages of the application.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:

FIG. 1 is an application scenario diagram of a data source identification method in one embodiment;

FIG. 2 is a flow chart of a method of data source identification in one embodiment;

FIG. 3 is a flow chart of a method of identifying data sources in another embodiment;

FIG. 4 is a flow chart of log collection and normalization steps in one embodiment;

FIG. 5 is a flow chart of a data source synchronization step in one embodiment;

FIG. 6 is a block diagram of a data source identification device in one embodiment;

FIG. 7 is an internal structural diagram of a computer device in one embodiment.

Detailed Description

For a clearer understanding of the objects, technical solutions and advantages of the present application, the present application is described and illustrated below with reference to the accompanying drawings and examples.

Unless defined otherwise, technical or scientific terms used herein shall have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," "these," and the like in this application do not limit quantity and may refer to the singular or the plural. The terms "comprising," "including," "having," and any variations thereof, as used in the present application, are intended to cover a non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (units) is not limited to the listed steps or modules (units), but may include other steps or modules (units) not listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. Reference to "a plurality" in this application means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist together, or B exists alone. Typically, the character "/" indicates that the associated objects are in an "or" relationship. The terms "first," "second," "third," and the like, as referred to in this application, merely distinguish similar objects and do not represent a particular ordering of objects.

The data source identification method provided by the application can be applied to the application environment shown in fig. 1, in which the terminal device 102 communicates with the server device 104 via a network. The server device 104 obtains log data and transmits the log data to the message cluster to obtain a log message; the server device 104 obtains the data source rule, and performs data source identification matching on the log message at least according to the data source rule to obtain a log identification result; the server device 104 performs the same data source aggregation on the log identification result according to the data source of the log identification result, and obtains and stores the data source identification result. The terminal device 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device, and the server device 104 may be implemented by a stand-alone server or a server cluster composed of a plurality of servers.

In this embodiment, a data source identification method is provided. The method may be deployed on a big data stream computing engine of the server device 104; the engine may be Apex, Ballista, Flink, Spark Streaming or Storm, and is preferably Flink. Fig. 2 is a flowchart of the data source identification method of the present embodiment. As shown in fig. 2, the flow includes the following steps:

Step S202, obtaining log data, and transmitting the log data to a message cluster to obtain a log message. The log data may be a terminal log, or a task or event log. The message cluster refers to a publish/subscribe-based messaging system for big data, which may be, for example, Kafka, Pulsar or Pravega. Before the log data is transmitted to the message cluster, the message cluster needs to be started and initialized: the replicas (Replica) and partitions (Partition) of the message cluster are configured according to the traffic of the log data and the attribute information of the server device 104, which improves the consumption speed of the message cluster (for example, Kafka).

Specifically, taking Kafka as the message cluster used for data source identification, the server device 104 acquires log data, starts the Kafka service, opens the Kafka message cluster, and configures the replicas and partitions of Kafka; once the configuration of Kafka is complete, the log data is transmitted to the Kafka message cluster to obtain log messages.

Step S204, obtaining a data source rule, and performing data source identification matching on the log message at least according to the data source rule to obtain a log identification result. The data source rule is a preset data source rule or a data source rule input by a user; it may be a rule associating related fields, or any other rule that identifies the data source. Data source identification matching means identifying and matching the source of the log message according to the identification logic of the data source rule and attaching a data source field. The data source field may be Linux, Windows, a terminal, or another extensible device or system. The log message may be matched line by line.

Specifically, taking the big data stream computing engine Flink as an example, the complex event processing (Complex Event Processing, CEP for short) module on Flink acquires the data source rule and acquires log messages from the Kafka message cluster. Through its pattern application programming interface (Pattern Application Programming Interface, Pattern API for short), the CEP module matches the log messages line by line at least according to the data source rule and attaches a data source field; once all lines have been matched, the log identification result is obtained.
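The line-by-line matching described above can be illustrated with a minimal Python sketch. It does not use the Flink CEP Pattern API; it only shows the idea of testing each log line against a list of rules and attaching a data source field. The `SourceRule` class, the two example regular expressions and the `"unknown"` fallback are assumptions made for illustration and do not come from the application.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRule:
    """Hypothetical rule: a line matching `pattern` is attributed to `source`."""
    source: str
    pattern: str


def identify_line(line: str, rules: list[SourceRule]) -> dict:
    """Attach a data source field to one log line (first matching rule wins)."""
    for rule in rules:
        if re.search(rule.pattern, line):
            return {"message": line, "data_source": rule.source}
    # Assumption: lines that match no rule are labeled "unknown".
    return {"message": line, "data_source": "unknown"}


def identify_lines(lines: list[str], rules: list[SourceRule]) -> list[dict]:
    """Match every line; the returned list plays the role of the log identification result."""
    return [identify_line(line, rules) for line in lines]


# Illustrative rules only; real rules would be preset or supplied by the user.
RULES = [
    SourceRule("Linux", r"\bsshd\[\d+\]"),
    SourceRule("Windows", r"\bEventID=\d+"),
]
```

Calling `identify_lines(lines, RULES)` returns one dictionary per input line, each carrying the original message and the attached `data_source` field.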

Step S206, performing the same data source aggregation on the log identification result according to the data source of the log identification result, and obtaining and storing the data source identification result. The data source identification result may be stored in a columnar storage database, for example, HBase, ClickHouse, Druid or HP Vertica. Before the same data source aggregation is performed, the log identification result may be partitioned according to the sliding time window, and the same data source aggregation is then performed within each sliding time window, so that the sliding time window is used to filter data and the accuracy of data source identification is improved.

Specifically, fig. 3 is a flowchart of another data source identification method according to the present embodiment, and as shown in fig. 3, the flowchart includes the following steps:

Step S302, initialize Kafka. The server device 104 starts the message cluster, initializes it, and configures the replicas (Replica) and partitions (Partition) of the message cluster according to the traffic of the log data and the attribute information of the server device 104, which improves the consumption speed of Kafka.

Step S304, the message cluster acquires log data. The server device 104 obtains log data and transmits the log data to the message cluster to obtain a log message.

Step S306, acquire a data source rule. The server device 104 obtains the data source rule.

Step S308, data source identification matching. The server device 104 performs data source identification matching on the log message at least according to the data source rule, so as to obtain a log identification result.

Step S310, the same data sources are aggregated. The server device 104 performs the same data source aggregation on the log identification result according to the data source of the log identification result, and obtains the data source identification result.

Step S312, store to ClickHouse. Server device 104 stores the data source identification result in the ClickHouse.

Through the steps, the message clusters are utilized to carry out streaming processing on the real-time log data, so that the operation efficiency of a large amount of log data can be improved; according to the data source identification method, log data can be rapidly and efficiently partitioned, so that a user is helped to efficiently execute user behavior analysis, equipment abnormality analysis and early warning, single or multi-time sequence abnormality analysis and the like according to different log data sources, the data is driven to create greater value, and the problems of low data source partition execution speed and low efficiency are solved.
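As a rough illustration of how the stages of fig. 3 fit together, the following Python sketch wires them up in a single process. It is only a toy model under stated assumptions: a `deque` stands in for the message cluster, keyword containment stands in for rule matching, and a plain dictionary stands in for ClickHouse. The function name `run_pipeline` and the rule format are hypothetical.

```python
from collections import Counter, deque


def run_pipeline(raw_logs, rules):
    """Toy end-to-end flow: queue -> identify -> aggregate -> store.

    `rules` is a list of (source, keyword) pairs; this format is an assumption.
    """
    # Stand-in for transmitting log data to the message cluster (S304).
    queue = deque(raw_logs)

    # Data source identification matching, one message at a time (S308).
    identified = []
    while queue:
        message = queue.popleft()
        source = next((src for src, keyword in rules if keyword in message), "unknown")
        identified.append((source, message))

    # Same data source aggregation: count messages per data source (S310).
    counts = Counter(source for source, _ in identified)

    # Stand-in for storing the data source identification result (S312).
    return dict(counts)
```

A real deployment would replace each stand-in with the corresponding component (message cluster, stream engine, columnar database); the sketch only shows the order of the stages.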

In some embodiments, performing data source identification matching on the log message at least according to the data source rule to obtain a log identification result, including:

acquiring a sliding time window, acquiring a current window log message from the message cluster according to the sliding time window, and carrying out data source identification matching on the current window log message and the data source rule to obtain a current log identification result;

acquiring a next window log message according to the sliding time window, and carrying out data source identification matching on the next window log message and the data source rule to obtain a next log identification result;

The sliding time window is defined by a fixed window length and a sliding interval; when the interval is shorter than the window length, consecutive windows overlap. A window cuts the unbounded stream into finite-size buckets for further processing of the stream data. In this embodiment, a bucket refers to a set held in memory.

Specifically, the CEP module of Flink on the server device 104 acquires a preset sliding time window, for example, with a window length of 60 s and an interval of 10 s. According to the sliding time window, the CEP module obtains the current window log messages for the current 60 s from the message cluster, matches them line by line against the data source rule, and obtains the current log identification result from the matching result of each line. After an interval of 10 s, the CEP module obtains the next window log messages for the next 60 s according to the sliding time window, and performs data source identification matching on them against the data source rule to obtain the next log identification result. The server device 104 obtains the log identification result according to the current log identification result and the next log identification result.

Through the above steps, acquiring a sliding time window limits both the amount of log messages fetched at a time and the amount of identified data written to the database, which improves the efficiency of log data analysis and of batch insertion into the database. Because the sliding time windows overlap, each window is identified together with part of the historical data from the previous window, so the same data undergoes data source identification multiple times, which improves identification precision. This solves the problems of slow execution, low efficiency and low precision of data source division.
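The overlap between windows can be made concrete with a small Python sketch using the 60 s length and 10 s interval from the example above. It assumes integer timestamps in seconds and windows aligned to multiples of the interval, which is a common convention for sliding windows but is not stated in the application; the function names are hypothetical.

```python
from collections import defaultdict


def window_starts(ts: int, length: int = 60, slide: int = 10) -> list:
    """Return the start times of every window [start, start + length) containing ts."""
    last_start = ts - (ts % slide)  # latest window that already includes ts
    return list(range(last_start, ts - length, -slide))


def assign_to_windows(events, length: int = 60, slide: int = 10) -> dict:
    """Group (timestamp, message) pairs into in-memory buckets keyed by window start."""
    buckets = defaultdict(list)
    for ts, message in events:
        for start in window_starts(ts, length, slide):
            buckets[start].append(message)
    return dict(buckets)
```

With these parameters every message falls into 60 / 10 = 6 overlapping windows, which is why a message is evaluated together with part of the data from earlier windows.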

In some embodiments, the same data source aggregation is performed on the log identification result according to the data source of the log identification result, so as to obtain and store the data source identification result, including:

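according to the sliding time window and the data source of the current log identification result, carrying out the same data source aggregation on the current log identification result to obtain the current data source identification result;

according to the sliding time window and the data source of the next log identification result, carrying out the same data source aggregation on the next log identification result to obtain the next data source identification result;

and obtaining the data source identification result according to the current data source identification result and the next data source identification result.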

Through the steps, the log data of the same data source in a period of time can be further clarified by carrying out data source aggregation on the log identification result according to the sliding time window, and the data streams of different log data sources can be identified and distinguished secondarily, so that the accuracy of data source identification is improved, the data dimension is reduced, the processing efficiency is improved, and the problems of low data source division execution speed, low efficiency and low precision are solved.
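A minimal Python sketch of same-data-source aggregation within one window follows. It assumes each identified record is a dictionary with `message` and `data_source` keys and that the aggregated row keeps a count and the grouped messages; these field names and the output shape are illustrative assumptions rather than the schema used by the application.

```python
from collections import defaultdict


def aggregate_by_source(window_start, identified):
    """Group the identified records of one window by their data source field."""
    groups = defaultdict(list)
    for record in identified:
        groups[record["data_source"]].append(record["message"])
    # One output row per data source, sorted by source name for stable output.
    return [
        {
            "window_start": window_start,
            "data_source": source,
            "count": len(messages),
            "messages": messages,
        }
        for source, messages in sorted(groups.items())
    ]
```

Running the function once per window yields the current and next data source identification results, which can then be written to the database in batches.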

In some of these embodiments, transmitting the log data into a message cluster to obtain a log message includes:

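acquiring a log regular rule, and carrying out standardized processing on the log data according to the log regular rule to obtain standardized log data;

transmitting the standardized log data into the message cluster to obtain the log message.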

The standardized log data refers to data in a format that can be identified and loaded across platforms, for example, JSON; the log regular rule is a regular expression that maps the log data to the JSON-format data.

Specifically, fig. 4 is a schematic diagram of the log collection and normalization steps in the present embodiment. As shown in fig. 4, the flow includes the following steps:

Step S402, parse the log data. The server device 104 obtains log data from the terminal and/or the device, and parses the log data to obtain machine language data that can be directly read by the server device 104.

Step S404, acquire a log regular rule. The server device 104 acquires the log regular rule.

Step S406, log standardization. The server device 104 performs JSON standardization processing on the machine language data according to the log regular rule to obtain standardized log data.

Step S408, transmit to the message cluster. The server device 104 transmits the standardized log data into the Kafka message cluster to obtain the log message.

Through the steps, the log data are converted into machine language data which can be directly read, the machine language data are converted into standardized data which can be cross-platform identified and loaded, the message clusters are utilized to conduct streaming processing on the real-time standardized log data, the operation efficiency of a large amount of log data can be improved, and the problems of low data source division execution speed and low efficiency are solved.
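The normalization step can be sketched in Python as a regular expression with named groups whose captures are serialized to JSON. The syslog-like pattern, its field names and the `{"raw": ...}` fallback for unmatched lines are assumptions chosen for illustration; the application does not specify a concrete log regular rule.

```python
import json
import re

# Hypothetical log regular rule for a syslog-like line.
SYSLOG_RULE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<process>[\w\-/]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def normalize(line: str, rule: "re.Pattern" = SYSLOG_RULE) -> str:
    """Convert one raw log line to a JSON string using the named groups of the rule."""
    match = rule.match(line)
    if match is None:
        # Assumption: lines that do not fit the rule are kept verbatim.
        return json.dumps({"raw": line})
    return json.dumps(match.groupdict())
```

The resulting JSON strings are what would be published to the message cluster as log messages.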

In some embodiments, obtaining a data source rule, performing data source identification matching on the log message at least according to the data source rule, to obtain a log identification result, including:

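acquiring a current data source rule and storing the current data source rule into a data source rule catalog;

and monitoring the data source rule catalog, and when a next data source rule is detected, carrying out data source identification matching on the log message at least according to the current data source rule and the next data source rule to obtain a log identification result.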

The data source rule catalog can be established in the distributed coordination management service, and the current data source rule and the next data source rule are stored in the data source rule catalog of the distributed coordination management service. The distributed coordination management service is deployed on the server device 104 and may be ZooKeeper. The data source rule catalog may be monitored by the CEP module in Flink on the server device 104, so that when the CEP module detects a newly added next data source rule, it obtains the next data source rule and performs data source matching on the log message according to the next data source rule.

Specifically, fig. 5 is a schematic diagram of the data source synchronization steps in this embodiment. As shown in fig. 5, the flow includes the following steps:

Step S502, acquire the current data source rule. The server device 104 obtains the current data source rule.

Step S504, create a current child node under the data source rule catalog of ZooKeeper. According to the current data source rule, a current child node is created under the data source rule catalog of ZooKeeper in the server device 104, and the child node stores the current data source rule.

Step S506, acquire the next data source rule. ZooKeeper acquires the next data source rule.

Step S508, create a next child node under the data source rule catalog of ZooKeeper. A next child node is created under the data source rule catalog, and the next child node stores the next data source rule.

Step S510, the CEP module monitors the data source rule catalog of ZooKeeper. The CEP module in Flink on the server device 104 listens to the data source rule catalog of ZooKeeper.

Step S512, synchronize the next data source rule to the memory of the CEP module when the next data source rule or the next child node is detected. The server device 104 synchronizes the next data source rule into the memory of the CEP module.

Step S514, obtain the log identification result. The server device 104 performs data source identification matching on the log message according to the current data source rule, the next data source rule and the sliding time window, so as to obtain a log identification result.

Through the steps, the synchronization of the updated data source rule can be realized through the storage management of the data source rule by the distributed coordination management service and the monitoring mechanism of updating the data source rule by the Flink, the real-time synchronous tracking of the new data source rule is realized, the accuracy of data source matching identification is improved, and the problem of low data source division accuracy is solved.
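The catalog-and-listener mechanism can be modeled in a few lines of Python. This sketch does not talk to ZooKeeper or Flink; an in-memory `RuleCatalog` class with a callback list stands in for the rule catalog and its watch mechanism, and a `Matcher` class stands in for the CEP module's in-memory rule set. All class names, node names and the (source, keyword) rule format are hypothetical.

```python
class RuleCatalog:
    """In-memory stand-in for the data source rule catalog of the coordination service."""

    def __init__(self):
        self._nodes = {}
        self._watchers = []

    def watch(self, callback):
        """Register a listener that is called whenever a child node is created."""
        self._watchers.append(callback)

    def create_child(self, name, rule):
        """Store a rule in a new child node and notify every listener."""
        self._nodes[name] = rule
        for callback in self._watchers:
            callback(name, rule)

    def children(self):
        return dict(self._nodes)


class Matcher:
    """Stand-in for the matching module: keeps rules in memory and follows the catalog."""

    def __init__(self, catalog):
        self.rules = catalog.children()      # load the current rules
        catalog.watch(self._on_rule_added)   # listen for next rules

    def _on_rule_added(self, name, rule):
        self.rules[name] = rule              # synchronize the new rule into memory

    def identify(self, line):
        for source, keyword in self.rules.values():
            if keyword in line:
                return source
        return "unknown"
```

A rule added to the catalog after the matcher has started takes effect on the next call to `identify`, without restarting the matcher, which mirrors the real-time rule synchronization described above.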

In some of these embodiments, prior to the obtaining the log data, further comprising:

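acquiring the message cluster initialization configuration information and starting a distributed coordination management service;

and synchronizing the initialization configuration information of the message cluster to the distributed coordination management service.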

Through the above steps, the initialization configuration information of the message cluster is synchronized to the distributed coordination management service, so the distributed coordination management service can act as a registration center that manages the message cluster. Because it also stores the data source rules, it can additionally act as the controller for synchronizing the data source rules. The message cluster and the stream processing engine Flink are thus coordinated and managed in a unified way, which makes data source identification more orderly.

It should be understood that, although the steps in the flowcharts of fig. 2-5 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in fig. 2-5 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are not necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.

The embodiment also provides a data source identification device, which is used for implementing the above embodiment and the preferred implementation manner, and is not described in detail. The terms "module," "unit," "sub-unit," and the like as used below may refer to a combination of software and/or hardware that performs a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementations in hardware, or a combination of software and hardware, are also possible and contemplated.

Fig. 6 is a block diagram showing the structure of the data source identification apparatus of the present embodiment. As shown in fig. 6, the apparatus comprises: an acquisition module 10, an identification module 20 and an aggregation module 30;

the acquiring module 10 is configured to acquire log data, and transmit the log data to a message cluster to obtain a log message;

the identification module 20 is configured to obtain a data source rule, and perform data source identification matching on the log message at least according to the data source rule, so as to obtain a log identification result;

the aggregation module 30 is configured to aggregate the same data sources of the log identification result according to the data sources of the log identification result, obtain the data source identification result, and store the data source identification result.

For specific limitations of the data source identification apparatus, reference may be made to the above limitations of the data source identification method, which are not repeated here. The modules in the above data source identification apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in the form of hardware, or may be stored in a memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.

There is also provided in this embodiment a data source identification system including: a terminal device 102, a transmission device, and a server device 104; wherein the terminal device 102 is connected to the server device 104 through a transmission device;

the server device 104 is configured to perform the steps of any of the method embodiments described above;

the terminal device 102 is configured to obtain log data and data source rules.

There is also provided in this embodiment an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.

Optionally, the electronic apparatus may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.

Optionally, in the present embodiment, the above-described processor may be configured to execute the following steps by means of a computer program:

S1, acquiring log data, and transmitting the log data to a message cluster to obtain log messages.

S2, acquiring a data source rule, and carrying out data source identification and matching on the log message at least according to the data source rule to obtain a log identification result.

S3, aggregating the log identification results that share the same data source according to the data source of each log identification result, and obtaining and storing the data source identification result.

It should be noted that, specific examples in this embodiment may refer to examples described in the foregoing embodiments and alternative implementations, and are not described in detail in this embodiment.

In addition, in combination with the data source identification method provided in the above embodiment, a storage medium may be provided in this embodiment. The storage medium has a computer program stored thereon; the computer program, when executed by a processor, implements any of the data source identification methods of the above embodiments.

It should be noted that, user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.

In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing data source identification result data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a data source identification method.
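As an illustration of how the database might hold data source identification result data, the following Python sketch uses the standard-library `sqlite3` module. The table name `identification_result`, its two columns, and the helper names `store_results` and `count_by_source` are hypothetical; the present application does not specify a database engine or schema.

```python
import sqlite3


def store_results(conn, results):
    """Persist aggregated results as one row per (data source, log message)."""
    # Hypothetical schema; the application does not define one.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS identification_result ("
        "data_source TEXT NOT NULL, log_message TEXT NOT NULL)"
    )
    rows = [(source, msg) for source, msgs in results.items() for msg in msgs]
    conn.executemany(
        "INSERT INTO identification_result (data_source, log_message) VALUES (?, ?)",
        rows,
    )
    conn.commit()


def count_by_source(conn):
    """Return the number of stored log messages for each data source."""
    cur = conn.execute(
        "SELECT data_source, COUNT(*) FROM identification_result "
        "GROUP BY data_source ORDER BY data_source"
    )
    return dict(cur.fetchall())
```

For example, after `conn = sqlite3.connect(":memory:")` and `store_results(conn, {"firewall": ["a", "b"], "nginx": ["c"]})`, calling `count_by_source(conn)` reports two rows for `firewall` and one for `nginx`.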

It will be appreciated by those skilled in the art that the structure shown in fig. 7 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.

Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-transitory computer readable storage medium, and that the program, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM), among others.

It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to be limiting. All other embodiments obtained by one of ordinary skill in the art on the basis of the embodiments provided herein, without inventive effort, fall within the scope of protection of the present application.

It is evident that the drawings are only some examples or embodiments of the present application, and that a person skilled in the art can apply the present application to other similar situations on the basis of these drawings without inventive effort. In addition, it should be appreciated that while such a development effort might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication, or manufacture for those of ordinary skill having the benefit of this disclosure, and thus the disclosure should not be construed as lacking sufficient detail.

The term "embodiment" in this application means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive. It will be clear or implicitly understood by those of ordinary skill in the art that the embodiments described in this application can be combined with other embodiments without conflict.

The above examples represent only a few embodiments of the present application. They are described specifically and in detail, but are not to be construed as limiting the scope of the patent. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, all of which fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims

1. A method of data source identification, comprising:

2. The method for identifying a data source according to claim 1, wherein the performing data source identification matching on the log message at least according to the data source rule to obtain a log identification result includes:

3. The method for identifying data sources according to claim 2, wherein the step of performing the same data source aggregation on the log identification result according to the data source of the log identification result to obtain and store the data source identification result includes:

4. The method for identifying a data source according to claim 1, wherein said transmitting the log data to a message cluster to obtain a log message comprises:

5. The method for identifying a data source according to claim 1, wherein the obtaining the data source rule, performing data source identification matching on the log message at least according to the data source rule, and obtaining a log identification result, includes:

6. The data source identification method according to any one of claims 1 to 5, characterized by further comprising, before the acquisition of log data:

7. A data source identification device, comprising: the device comprises an acquisition module, an identification module and an aggregation module;

8. A data source identification system, comprising: terminal equipment, transmission equipment and server equipment; the terminal equipment is connected with the server equipment through the transmission equipment;

the server device being configured to perform the data source identification method of any one of claims 1 to 6;

the terminal equipment is used for acquiring log data and data source rules.

9. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, the processor being arranged to run the computer program to perform the data source identification method of any of claims 1 to 6.

10. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor realizes the steps of the data source identification method of any of claims 1 to 6.